Software Engineering

    Enterprise API Design: REST vs GraphQL vs gRPC, Versioning and KVKK

    Seven practical decisions for production-grade enterprise API design: protocol choice (REST/GraphQL/gRPC), OpenAPI contract, versioning strategy, rate limiting, audit logs, KVKK-compliant personal data flow. Lessons from enterprise integration projects. An eCloud Tech engineering note.

    Published: May 25, 2026 · 14 min read
    api-design, rest, graphql, grpc

    APIs are no longer just data bridges between applications; in modern enterprise architecture they are the living contract of business processes. Customer portals, mobile apps, B2B integrations, microservice meshes, AI agents — all communicate via APIs. A badly designed API produces a migration project in 2-3 years; a well-designed API lasts 5-10 years and adapts to new use cases. Enterprise API engineering requires deliberate decisions at every layer — from protocol selection to versioning discipline, from rate limiting to audit logs.

    Within our API integration engineering and SaaS platform engineering services we have delivered 19 enterprise API design/migration projects over the last 30 months — in finance, e-commerce, healthcare and logistics. The public and internal APIs of our own AIGENCY v4 platform also operate under this discipline. In this article we walk through seven critical decisions for enterprise API design in order: protocol choice, contract design, versioning, rate limiting, authentication, audit logs + KVKK flow, and monitoring + observability.

    1. Protocol selection — REST, GraphQL, gRPC decision matrix

    API protocol selection is the foundational decision: it is hard to migrate away from, it defines the ecosystem and it shapes the team's skills. Three main options (plus niches):

    REST (Representational State Transfer) — resource-oriented API on HTTP semantics. Most mature ecosystem. Strengths: cache (HTTP cache, CDN), debug (curl, Postman), documentation (OpenAPI 3.x is the industry standard), tooling (Insomnia, Swagger UI, Stoplight). Weaknesses: over-fetching (server returns fields the client doesn't use), under-fetching (N+1 requests for a single screen), versioning is complicated.

    GraphQL — released by Facebook in 2015, a schema-first query language. Clients ask for exactly the fields they need and the server returns just that. Strengths: data minimisation (a KVKK advantage), client-team productivity (new views without backend changes), introspection + tooling (Apollo Studio, GraphiQL). Weaknesses: caching is hard (POST request, URL-based CDN cache doesn't work, Apollo Cache or Persisted Queries needed), N+1 problem (with naive resolvers), monitoring is complex (each query has its own shape), rate limiting is non-trivial (depth limit, complexity limit as a separate layer).

    gRPC (open-sourced by Google in 2015, 1.0 in 2016): HTTP/2 + Protocol Buffers (Protobuf) binary serialisation. Optimised for service-to-service traffic. Strengths: high throughput + low latency (binary, multiplexing), native bidirectional streaming, strong typing (Protobuf IDL), polyglot (official code generation for around a dozen languages, more through community plugins). Weaknesses: not browser-native (requires a gRPC-Web proxy), hard to debug (binary), rare in public APIs (third-party developers are not used to it).

    Decision matrix:

    Scenario | Recommended | Why
    Public API (third-party developers) | REST + OpenAPI | Ecosystem, docs, OAuth, Postman
    Mobile app → backend (high over-fetch risk) | GraphQL | Client fetches its own fields, mobile data savings
    Web SPA → backend (simple CRUD) | REST | Cache, debug, low learning curve
    Microservice-microservice (high volume) | gRPC | Performance, typing, streaming
    Browser → backend (modern web) | REST or tRPC | gRPC is not browser-native
    AI agent tool ecosystem | REST + OpenAPI | LLM function-calling ecosystem tuned to REST
    Bank core ↔ channel integration | gRPC + mTLS | Internal, performance + security
    B2B enterprise integration | REST + OpenAPI | Common standard, SDK generation

    Hybrid pattern: public REST gateway + internal gRPC. The customer sees REST; the backend microservice mesh speaks gRPC. On our AIGENCY platform the public surface is REST with token-based authentication (docs: aigency.dev/docs); the choice is reinforced by the LLM/agent function-calling ecosystem being tuned for REST.

    Common mistake: picking GraphQL because it's trending. GraphQL is enormously valuable in the right scenario and unnecessary complexity in the wrong one (for example, a two-page app with simple CRUD). The decision should be driven by technical fit, not popularity.

    2. Contract design — OpenAPI 3.x, contract-first vs code-first

    An API's contract is the structured spec that defines endpoint URLs + HTTP methods + request shape + response shape + status codes + auth requirements + error format. OpenAPI 3.x (formerly Swagger) is the industry standard for REST; SDL (Schema Definition Language) is mandatory for GraphQL; Protobuf .proto files are mandatory for gRPC.

    Contract-first:

    1. The designer writes OpenAPI YAML/JSON (openapi.yaml).
    2. Stakeholder review — frontend, mobile and third-party developers review.
    3. A mock server (Prism, Mockoon, Stoplight) lets the frontend work without the backend.
    4. Backend implementation is written against the contract.
    5. Automated tests (Schemathesis, Dredd) are generated from the contract and catch drift in CI.

    Code-first:

    1. The backend developer writes endpoint code (decorator + type annotation).
    2. The framework (FastAPI, NestJS, tRPC) auto-generates OpenAPI/SDL.
    3. The frontend generates a client once the backend is ready.

    Which is right? From our experience:

    Scenario | Approach | Reason
    Public/external API | Contract-first | Stakeholder review is critical, the contract is a commitment, breaking-change discipline
    B2B partner integration | Contract-first | Contract = legal commitment, agree with the partner upfront
    Internal microservice (single team) | Code-first | Fast iteration, the framework automates the spec
    Quick PoC / hackathon | Code-first | Contract writing slows you down
    Multi-language team (TS + Python + Go) | Contract-first | Single source of truth for polyglot teams
    Single-language team (TypeScript only) | tRPC or code-first | End-to-end type safety, code generation unnecessary

    Discipline for the OpenAPI doc to be actually used:

    • openapi-validator runs in the CI pipeline — reject pushes with invalid YAML.
    • openapi.yaml diff is a mandatory review item in every PR (catches breaking changes).
    • The openapi-diff tool compares old vs new schema and auto-suggests semantic version bumps.
    • Spectral lint rules (field names camelCase, error response mandatory, security scheme defined, etc).
    • Auto-generated SDKs (TypeScript, Python, Go) publish to npm/PyPI/GitHub Releases on every release.
    • Public documentation via Swagger UI or Redoc — docs.your-api.com subdomain.

    Our AI governance framework details the mandatory content of API contracts from an audit and KVKK perspective (security schemes, pii_field tags, data_retention headers).

    3. Versioning strategy — URL, header, lifecycle policy

    Versioning is the single most critical factor for the sustainability of a public API. The wrong strategy either annoys customers (constant breaking changes) or rots the backend (the old version never dies and the codebase bloats).

    URL versioning (/v1/users, /v2/users):

    • Pro: easy to debug, works directly in Postman/curl, cache-friendly (URL = key), clients can pin a version explicitly in the URL.
    • Con: same resource lives in two places, migration is manual.
    • Our recommendation: default for public REST APIs.

    Header versioning (Accept: application/vnd.company.v2+json):

    • Pro: clean URLs, RESTful-purist friendly.
    • Con: hard to debug (manual curl headers), CDN cache keys must include the header (Vary header), client mistakes are easy.
    • Our recommendation: only for internal APIs or niche cases with very symmetric versioning.

    Query parameter (/users?version=2):

    • OK for fast prototypes; in production it muddies cache keys and analytics.
    • Not recommended.
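
    The first two options can be combined in one resolver. The following is a minimal sketch, assuming the URL path takes precedence and the Accept header is only a fallback; the function name, the supported-version set and the default are hypothetical, while the vendor media type follows the example above.

```python
import re
from typing import Optional

# Hypothetical set of versions this service still serves.
SUPPORTED_VERSIONS = {1, 2}

_PATH_RE = re.compile(r"^/v(\d+)(/|$)")
_ACCEPT_RE = re.compile(r"application/vnd\.company\.v(\d+)\+json")


def resolve_version(path: str, accept: Optional[str] = None, default: int = 1) -> int:
    """Return the API version from the URL path or, failing that, the Accept header."""
    match = _PATH_RE.match(path)
    if match is None and accept:
        match = _ACCEPT_RE.search(accept)
    version = int(match.group(1)) if match else default
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version: v{version}")
    return version
```

    With this sketch, resolve_version("/v2/users") and resolve_version("/users", "application/vnd.company.v2+json") both resolve to version 2, and an unknown version is rejected rather than silently served by the default.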

    Version lifecycle policy (committed in writing):

    Phase | Duration | Action
    Active | Indefinite | New features, bug fixes, non-breaking changes → MINOR bump
    Deprecated | 12 months | Deprecation + Sunset HTTP headers, warning in docs
    Sunset announcement | 3 months before | Email + dashboard banner to all customers
    Sunset | On date | Endpoint returns HTTP 410 Gone
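
    The Deprecated and Sunset phases translate into response headers and a status code. The sketch below is one possible shape, not a definitive implementation: the function name is hypothetical, the Sunset value is an HTTP-date, and the Deprecation value is written as an "@epoch" date, which is an assumption about the header format your gateway and clients expect.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Tuple


def lifecycle_response(now: datetime, deprecated_at: datetime,
                       sunset_at: datetime) -> Tuple[int, Dict[str, str]]:
    """Return (status, extra headers) for a version in its deprecation lifecycle."""
    if now >= sunset_at:
        # After the sunset date the endpoint answers 410 Gone.
        return 410, {}
    headers: Dict[str, str] = {}
    if now >= deprecated_at:
        # Assumed format: "@" followed by the deprecation time as epoch seconds.
        headers["Deprecation"] = f"@{int(deprecated_at.timestamp())}"
        # HTTP-date; both datetimes must be timezone-aware UTC values.
        headers["Sunset"] = format_datetime(sunset_at, usegmt=True)
    return 200, headers
```

    A gateway or middleware would merge these headers into every response of the deprecated version, so clients see the sunset date long before the 410 arrives.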

    Breaking-change rules (MUST):

    Change | Type | Action
    Add new field (response) | Non-breaking | MINOR bump
    Add new endpoint | Non-breaking | MINOR bump
    Change type of existing field | Breaking | MAJOR bump, new version path
    Remove existing field | Breaking | MAJOR bump
    Add required field (request) | Breaking | MAJOR bump
    Add optional field (request) | Non-breaking | MINOR bump
    Change error code | Breaking | MAJOR bump
    Change authentication | Breaking | MAJOR bump + separate announcement
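
    A subset of these rules can be checked mechanically. The sketch below is illustrative only: it compares a simplified view of two contract versions (response fields with their types, plus the set of required request fields) and covers removed fields, retyped fields, newly required request fields and added response fields. The function name and the input shape are hypothetical and much simpler than a real OpenAPI diff.

```python
from typing import Dict, Set


def required_bump(old_response: Dict[str, str], new_response: Dict[str, str],
                  old_required: Set[str], new_required: Set[str]) -> str:
    """Classify a contract change as 'MAJOR', 'MINOR' or 'NONE'."""
    removed = old_response.keys() - new_response.keys()
    retyped = {name for name in old_response.keys() & new_response.keys()
               if old_response[name] != new_response[name]}
    newly_required = new_required - old_required
    if removed or retyped or newly_required:
        # Any of these breaks existing clients.
        return "MAJOR"
    added = new_response.keys() - old_response.keys()
    return "MINOR" if added else "NONE"
```

    Run in CI against the previous release, a check like this turns the table into a gate: a MAJOR result without a new version path fails the build.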

    Major versions should ship at most once every two years, and only in extreme cases. An API that ships a new major version every month is untrustworthy, and customers won't commit to it.

    4. Rate limiting + throttling — fair use + DDoS protection

    Rate limiting protects good citizens and stops bad ones. Three-layer strategy:

    Layer 1 — Global / DDoS protection: at the CDN/WAF layer (Cloudflare, AWS Shield, Akamai). Broad threshold like 10,000+ req/s, IP-based. Purpose: stop volumetric attacks.

    Layer 2 — Per-API-key (fair use): by customer tier:

    Tier | Rate limit | Burst | Monthly quota
    Free | 60 req/min | 10 req | 100K requests
    Pro | 1,000 req/min | 50 req | 1M requests
    Enterprise | 10,000+ req/min | 200 req | Unlimited (SLA-bound)

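    Sliding window is smoother than fixed window. The simplest Redis-based counter is a fixed one-minute window; INCR already returns the new count, so no separate GET is needed, and the TTL only has to be set when the key is first created:

    INCR rate:user:{api_key}:{current_minute}         (returns the new count; answer 429 once it exceeds the limit)
    EXPIRE rate:user:{api_key}:{current_minute} 60    (set once, when INCR returned 1)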
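
    The Redis keys above count requests per calendar minute. For comparison, a sliding-window log can be sketched in memory as follows. This is an illustration under stated assumptions, not the production design: the class name is hypothetical, state lives in a Python dict rather than Redis, and the clock is passed in explicitly so the behaviour is easy to test.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window log: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, api_key: str, now: float) -> bool:
        hits = self._hits[api_key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

    Because the window moves with each request, a client cannot double its effective rate by straddling a minute boundary, which is the weakness of the fixed-window counter. In Redis, a sorted set of timestamps per key is one common way to hold the same log.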
    

    A token bucket algorithm provides burst capacity (allow an average of 100 req/min but a momentary 50-req burst).
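
    A minimal token-bucket sketch with the numbers from the sentence above (100 req/min average, 50-request burst). The class and parameter names are hypothetical, and the clock is injected as a plain float of seconds.

```python
class TokenBucket:
    """Refills at `rate_per_min` tokens per minute, holding at most `burst` tokens."""

    def __init__(self, rate_per_min: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate_per_min / 60.0   # tokens per second
        self.capacity = float(burst)
        self.tokens = float(burst)        # start full, so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

    A bucket created as TokenBucket(rate_per_min=100, burst=50) accepts 50 requests at once, rejects the 51st, and accepts again after a few seconds of refill. The cost parameter lets expensive endpoints consume more than one token per call.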

    Layer 3 — Per-endpoint limit: separate limits for expensive endpoints. Example:

    Endpoint | Cost | Limit
    GET /users/{id} | Low (DB read) | General limit
    POST /pdf/generate | High (PDF render 2s) | 10 req/min
    POST /ai/inference | Very high (LLM) | 5 req/min
    POST /export/csv | High (DB scan) | 3 req/min

    A well-defined HTTP response is mandatory:

    • When the limit is exceeded: 429 Too Many Requests
    • Headers:
      • X-RateLimit-Limit: 1000
      • X-RateLimit-Remaining: 0
      • X-RateLimit-Reset: 1716638400 (epoch)
      • Retry-After: 30 (seconds)
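
    The response contract above can be produced by one small helper. This is a sketch under assumptions: the function name and its parameters are hypothetical, `used` counts the current request, and the reset time is an epoch timestamp as in the example header.

```python
import math
from typing import Dict, Tuple


def rate_limit_response(limit: int, used: int, window_reset_epoch: int,
                        now_epoch: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request that is the `used`-th in the window."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(window_reset_epoch),
    }
    if used > limit:
        # Tell the client how many whole seconds to wait (at least 1).
        wait = max(1, math.ceil(window_reset_epoch - now_epoch))
        headers["Retry-After"] = str(wait)
        return 429, headers
    return 200, headers
```

    Sending the X-RateLimit headers on successful responses as well lets well-behaved clients slow down before they ever see a 429.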

    A customer usage dashboard is mandatory: users must be able to see their own quota, receive warnings as they approach it (email / webhook) and get monthly reports.

    KVKK perspective: an additional layer for brute-force protection on the POST /login endpoint — exponential backoff per failed attempt (1s, 2s, 4s, 8s...), 15-minute lockout after 10 attempts. CAPTCHA + MFA challenge integration.
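
    The backoff schedule described above fits in a few lines. This is a minimal sketch: the function and constant names are hypothetical, and persisting the failed-attempt counter per account or IP is left out.

```python
MAX_ATTEMPTS = 10            # failed attempts before lockout
LOCKOUT_SECONDS = 15 * 60    # 15-minute lockout


def login_delay(failed_attempts: int) -> int:
    """Seconds a client must wait before its next login attempt."""
    if failed_attempts <= 0:
        return 0
    if failed_attempts >= MAX_ATTEMPTS:
        return LOCKOUT_SECONDS
    # 1s, 2s, 4s, 8s, ... doubling with each failed attempt.
    return 2 ** (failed_attempts - 1)
```

    The first four failures yield delays of 1, 2, 4 and 8 seconds, and the tenth triggers the 900-second lockout; a successful login would reset the counter to zero.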

    5. Authentication & authorisation — OAuth 2.1, JWT, mTLS

    Authentication = who are you? Authorisation = what can you do? Four main patterns in enterprise APIs:

    API key (simple):

    • X-API-Key: ak_live_xxxxx header.
    • Suited to server-to-server use; keys are long-lived and scope support is weak.
    • Very common in public APIs (Stripe, OpenAI); can be enough internally.
    • Rotation discipline is critical — revoke immediately on leak and issue a new key.

    OAuth 2.1 + JWT (modern standard):

    • Authorisation server (Keycloak, Auth0, Okta, Cognito, Authentik) with flows.
    • Authorisation Code Flow + PKCE (mobile + SPA), Client Credentials (server-to-server).
    • Access token (short-lived, 15min-1hr) + Refresh token (long-lived, 7-30 days).
    • JWT contains sub, scope, exp, custom claims. Don't use symmetric HS256; use asymmetric RS256/ES256 (easier key rotation).
    • Dominant choice for public and B2B enterprise APIs.
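
    For the Authorisation Code Flow + PKCE mentioned above, the client-side part is a random code verifier and its SHA-256 challenge (the "S256" method). The sketch below uses only the standard library; the function names are hypothetical, and the redirect and token exchange with the authorisation server are not shown.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(code_verifier: str) -> str:
    """Base64url (unpadded) SHA-256 digest of the verifier."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

    The client sends the challenge with the authorisation request and the verifier with the token request, so an intercepted authorisation code is useless without the verifier.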

    mTLS (mutual TLS):

    • Both sides (client + server) present certificates.
    • For bank core ↔ channel, healthcare system integration, critical infrastructure.
    • mTLS + OAuth combined (defence in depth) in high-security environments.

    RBAC + ABAC + Scope:

    Model | Definition | Example
    RBAC (Role-Based) | User → role → permissions | "admin", "editor", "viewer"
    ABAC (Attribute-Based) | Attribute (department, region, time) checks | "Marketing dept + Istanbul office + 09-18 hours"
    Scope (OAuth) | Permission scope on the token | users:read, orders:write
    ReBAC (Relationship-Based) | Relationship graph | "this document's author can edit"

    Practical recommendation: JWT scopes (coarse-grained) + database-level RBAC (fine-grained). Keep broad scopes like read:users and write:users in the JWT; enforce details such as "this user_id can only see rows of their own org" at the database layer.

    Token storage — browser localStorage is vulnerable to XSS; httpOnly secure cookies are preferred. Mobile: Keychain (iOS) / Keystore (Android).

    KVKK practical: every API endpoint that touches personal data must require a scope — a client without read:user_profile scope cannot reach personal data, the request must be rejected (403 Forbidden).
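
    The scope rule above can be expressed as a small check. This sketch assumes the JWT signature and expiry have already been verified upstream and that the scope claim is a space-delimited string; the function name is hypothetical.

```python
from typing import Dict, Tuple


def authorize(claims: Dict[str, object], required_scope: str) -> Tuple[int, str]:
    """Return (status, reason) for already-verified token claims."""
    granted = set(str(claims.get("scope", "")).split())
    if required_scope in granted:
        return 200, "ok"
    # No scope, no personal data: reject before any handler code runs.
    return 403, f"missing scope: {required_scope}"
```

    Calling authorize(claims, "read:user_profile") at the top of every endpoint that touches personal data keeps the decision in one place and makes the 403 path easy to audit.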

    6. Audit logs + KVKK-compliant personal data flow

    The most frequently skipped step in production APIs is audit logging. In a KVKK breach investigation, BDDK audit or SOC 2 compliance, this is the only defence — documentation of who touched which PII when.

    Minimum audit log fields:

    Field | Example | Why
    timestamp | 2026-05-25T14:32:17Z | Temporal order
    request_id | uuid v4 | Distributed trace
    api_key_id | ak_live_xxxxx | Who (technical identity)
    user_id | u_98234 | Who (business identity)
    client_ip | 203.0.113.42 | From where
    user_agent | Mozilla/5.0... | Which tool
    http_method | GET | Action type
    endpoint | /v2/users/{id} | Resource
    query_params | ?fields=name,email | Detail
    response_status | 200 | Outcome
    response_time_ms | 142 | Performance
    pii_fields_accessed | [name, email, phone] | KVKK critical
    auth_scope | read:users | Authorisation
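
    One way to emit these fields is a single JSON line per request. The sketch below is illustrative: the function name is hypothetical, the field names follow the table, and shipping the line to immutable storage is out of scope.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Iterable, Optional


def audit_record(*, api_key_id: str, user_id: str, client_ip: str, user_agent: str,
                 http_method: str, endpoint: str, query_params: str,
                 response_status: int, response_time_ms: int,
                 pii_fields_accessed: Iterable[str], auth_scope: str,
                 now: Optional[datetime] = None,
                 request_id: Optional[str] = None) -> str:
    """Serialise one audit event as a JSON line with a UTC timestamp."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_id": request_id or str(uuid.uuid4()),
        "api_key_id": api_key_id,
        "user_id": user_id,
        "client_ip": client_ip,
        "user_agent": user_agent,
        "http_method": http_method,
        "endpoint": endpoint,          # route template, not the concrete URL
        "query_params": query_params,
        "response_status": response_status,
        "response_time_ms": response_time_ms,
        "pii_fields_accessed": sorted(pii_fields_accessed),
        "auth_scope": auth_scope,
    }
    return json.dumps(record, ensure_ascii=False)
```

    Logging the route template and the names of the PII fields, rather than their values, keeps the audit trail useful for investigations without turning the log itself into a second copy of the personal data.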

    Log storage: immutable, append-only — Elasticsearch + ILM (Index Lifecycle Management), 90 days hot + 2 years warm + 5 years cold (for KVKK retention compliance). Cloud: AWS CloudWatch Logs + S3 Glacier, Azure Log Analytics + Blob archive. Access is restricted to the compliance team (RBAC + MFA).

    Field-level masking pattern (at the API gateway):

    Original response:
    {
      "user_id": "u_98234",
      "name": "Mehmet Yılmaz",
      "tc_kimlik": "12345678901",
      "iban": "TR33 0006 1005 1978 6457 8413 26",
      "phone": "+90 532 123 45 67"
    }
    
    Customer-service role:
    {
      "user_id": "u_98234",
      "name": "Mehmet Yılmaz",
      "tc_kimlik": "***********",
      "iban": "TR33 **** **** **** **** **** 26",
      "phone": "+90 532 *** ** **"
    }
    
    Finance role:
    {
      "user_id": "u_98234",
      "name": "Mehmet Yılmaz",
      "tc_kimlik": "12345678901",
      "iban": "TR33 0006 1005 1978 6457 8413 26",
      "phone": "+90 532 *** ** **"
    }
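
    The three responses above can be produced by one masking function. This is a minimal sketch that mirrors the example: the role names, the policy table and the helper names are hypothetical, and unknown roles fall back to masking every sensitive field.

```python
from typing import Callable, Dict, Set


def mask_all(value: str) -> str:
    return "*" * len(value)


def mask_iban(value: str) -> str:
    """Keep the first and last space-separated groups, mask the rest."""
    groups = value.split(" ")
    if len(groups) < 3:
        return mask_all(value)
    middle = ["*" * len(group) for group in groups[1:-1]]
    return " ".join([groups[0]] + middle + [groups[-1]])


def mask_phone(value: str) -> str:
    """Keep country and operator codes, mask the subscriber digits."""
    groups = value.split(" ")
    return " ".join(groups[:2] + ["*" * len(group) for group in groups[2:]])


MASKERS: Dict[str, Callable[[str], str]] = {
    "tc_kimlik": mask_all,
    "iban": mask_iban,
    "phone": mask_phone,
}

# Fields each role may see in clear text (hypothetical policy).
CLEAR_FIELDS: Dict[str, Set[str]] = {
    "customer_service": set(),
    "finance": {"tc_kimlik", "iban"},
}


def mask_response(payload: Dict[str, str], role: str) -> Dict[str, str]:
    clear = CLEAR_FIELDS.get(role, set())  # unknown role: mask everything
    return {key: (MASKERS[key](value) if key in MASKERS and key not in clear else value)
            for key, value in payload.items()}
```

    Applying the policy at the gateway means individual services never have to remember which role may see which field, and a new role starts fully masked until someone grants it access.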
    

    Sparse fieldset (?fields=... in REST, native in GraphQL) — clients fetch only the fields they need, no unnecessary PII transits. The technical expression of KVKK's data-minimisation principle.
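
    On the REST side, a sparse fieldset is a filter over the resource before serialisation. A minimal sketch follows; the function name and the default field list are hypothetical.

```python
from typing import Dict, Iterable, Optional


def select_fields(resource: Dict[str, object], fields_param: Optional[str],
                  default_fields: Iterable[str]) -> Dict[str, object]:
    """Return only the fields named in ?fields=..., or a minimal default set."""
    if fields_param:
        requested = [name.strip() for name in fields_param.split(",") if name.strip()]
    else:
        requested = list(default_fields)
    unknown = [name for name in requested if name not in resource]
    if unknown:
        raise ValueError("unknown fields: " + ", ".join(unknown))
    return {name: resource[name] for name in requested}
```

    Choosing a default set without PII means a client that forgets the parameter receives less data, not more. The list of requested names is also a natural input for the pii_fields_accessed audit field.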

    Right to erasure (KVKK Article 11) — when an erasure request arrives:

    1. Identity verification (is the person really this user?).
    2. API endpoint: DELETE /users/{id} or internal admin tool.
    3. Hard delete (not soft delete — not just a deleted_at column, actually delete).
    4. Cascade delete (related orders, sessions, PII fields in audit logs).
    5. Audit-log note of the deletion event (who deleted, when — not the identity of the deleted person, only the deletion event).
    6. Must be completed within 30 days.
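
    Steps 3 to 5 can be sketched against an in-memory store. Everything here is hypothetical (the function name, the table layout, the event shape); a real pipeline would run inside a database transaction and also cover backups and downstream systems.

```python
from datetime import datetime, timezone
from typing import Dict, List


def erase_user(user_id: str, tables: Dict[str, List[dict]], audit_log: List[dict],
               operator_id: str, now: datetime) -> int:
    """Hard-delete every row referencing user_id; return the number of rows removed."""
    removed = 0
    for name in list(tables):
        kept = [row for row in tables[name] if row.get("user_id") != user_id]
        removed += len(tables[name]) - len(kept)
        tables[name] = kept  # rows are dropped, not flagged with deleted_at
    # Record the deletion event only: who deleted and when, not whose data it was.
    audit_log.append({
        "event": "user_erased",
        "deleted_by": operator_id,
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rows_removed": removed,
    })
    return removed
```

    Keeping the erased person's identifier out of the audit event is what lets the log stay immutable without retaining the personal data the request asked to remove.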

    Cross-border transfer: log the country of the API client; transfers of PII abroad require Standard Contractual Clauses added to the contract, plus a Turkish translation.

    7. Monitoring + observability — not just running, but healthy

    A production API needs to be more than running; it must be healthy. That requires three layers of observability:

    Metrics (numerical indicators):

    Metric | Target | Meaning
    Request rate | Track trends | Volume
    Error rate (4xx + 5xx) | <1% 5xx, <5% 4xx | Health
    Latency p50 | <100ms | Fast path
    Latency p95 | <500ms | Most users
    Latency p99 | <2s | Worst case
    Throughput (req/s) | Capacity planning | Scaling signal
    Active API keys | Customer adoption | Business metric
    Rate limit hit rate | <2% | Tier sized right?

    Prometheus + Grafana (open source), Datadog/New Relic (managed), AWS CloudWatch/Azure Monitor (cloud-native).
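
    To make the latency targets concrete, the sketch below computes percentiles from raw samples with the nearest-rank method and compares them with the targets in the table. Monitoring backends typically estimate percentiles from histograms instead; the function names and the report shape are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Targets in milliseconds, taken from the table above.
TARGETS_MS = {50: 100.0, 95: 500.0, 99: 2000.0}


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_report(samples_ms: Sequence[float]) -> Dict[str, Tuple[float, bool]]:
    """Map 'p50'/'p95'/'p99' to (value, whether it is under its target)."""
    report = {}
    for p, target in TARGETS_MS.items():
        value = percentile(samples_ms, p)
        report[f"p{p}"] = (value, value < target)
    return report
```

    The gap between p50 and p99 is usually more informative than either number alone: a healthy median with a slow tail points at a specific dependency or endpoint rather than at general load.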

    Tracing (distributed): if a request travels through 5 microservices, follow it end-to-end via a trace ID. OpenTelemetry is the standard; Jaeger/Tempo/AWS X-Ray as backends. Critical for finding latency bottlenecks (which service is slow?).

    Logging (structured): JSON format, log level (DEBUG, INFO, WARN, ERROR), context (request_id, user_id, trace_id). Elasticsearch + Kibana, Loki + Grafana, Splunk. Audit log on a separate pipeline (immutable, compliance).

    SLO + error budget:

    • SLO: 99.9% availability (43 min/month downtime acceptable).
    • 99.95% (22 min/month) — financial/health tier.
    • 99.99% (4 min/month) — critical infrastructure tier (requires multi-region active-active).
    • Error budget exhausted → feature freeze, reliability work in focus.
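
    The downtime figures above follow directly from the SLO percentage. A small sketch of the arithmetic, assuming a 30-day month; the function names are hypothetical.

```python
def downtime_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Allowed downtime in minutes for the period at the given availability SLO."""
    return (1 - slo_percent / 100.0) * days * 24 * 60


def budget_remaining_minutes(slo_percent: float, downtime_minutes: float,
                             days: int = 30) -> float:
    return downtime_budget_minutes(slo_percent, days) - downtime_minutes


def feature_freeze(slo_percent: float, downtime_minutes: float, days: int = 30) -> bool:
    """True once the error budget for the period is used up."""
    return budget_remaining_minutes(slo_percent, downtime_minutes, days) <= 0
```

    For 99.9% this gives about 43.2 minutes, for 99.95% about 21.6 and for 99.99% about 4.3, which matches the rounded figures in the list.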

    Alerting (PagerDuty, Opsgenie, Grafana OnCall):

    • P1: 5xx rate >5% (5 min sustained) → on-call SMS + phone.
    • P2: latency p99 >5s (10 min sustained) → on-call SMS.
    • P3: rate limit hit rate >10% sustained → email.
    • P4: daily summary → Slack channel.

    Status page (Statuspage.io, Instatus, self-hosted): transparency to the user. Incident → public update → resolution.

    Data engineering infrastructure is often critical for the API metric + log pipeline — Kafka, ETL, time-series database stack.

    Practical summary — starting checklist

    The correct order for your first production API:

    1. Protocol selection: public = REST, mobile-heavy = GraphQL, internal microservice = gRPC. Hybrid OK.
    2. Contract design: contract-first (public/B2B), code-first (internal solo team). Validate OpenAPI 3.x in CI.
    3. Versioning: URL versioning by default, 12-month deprecation period, semantic version discipline.
    4. Auth: OAuth 2.1 + JWT (RS256), scope-based authorisation, mTLS for high sensitivity.
    5. Rate limiting: 3 layers (DDoS + per-key + per-endpoint), sliding window + Redis, mandatory response headers.
    6. Audit logs: timestamp + identity + endpoint + PII fields accessed. Immutable storage, 5+ year retention.
    7. KVKK: scope + masking + sparse fieldset + hard-delete pipeline. Cross-border transfer logs.
    8. OpenAPI docs: contract-first, auto-generated SDKs, public Swagger UI.
    9. Monitoring: three layers (metrics + traces + logs), SLO + error budget, alerting + status page.
    10. Continuous improvement: client feedback, error analytics, deprecation roadmap, annual security review (pentest).

    This list is minimum discipline. On top come sector-specific additions (PSD2 OAuth flow, FHIR/HL7 healthcare standards, ISO 27001 access control matrix). The value of an enterprise API is not in working today but in still being sustainable, documented and auditable three years from now.

    Our team in Şanlıurfa Karaköprü has delivered 19 enterprise API projects in finance, e-commerce, healthcare and logistics through API integration engineering and SaaS platform engineering, and operates our AIGENCY v4 platform API architecture on the same discipline. For enterprise API design, migration or a maturity assessment of your existing API, you can reach us through the contact form — the first assessment call is free of charge.


    eCloud Tech — A team based in Şanlıurfa, Türkiye, working on enterprise software, AI, blockchain and cybersecurity. Building Tomorrow.

    Frequently Asked Questions

    The choice between REST, GraphQL and gRPC rests on three variables.
    (1) Client type: if web and mobile clients have different data needs, use GraphQL (each client fetches only its fields, no over-fetching); if one contract fits everyone, use REST (mature cache, debug and documentation ecosystems). For service-to-service internal traffic that needs low latency and high throughput, use gRPC (HTTP/2, Protobuf, native streaming).
    (2) Developer ecosystem: every team knows REST and needs no training. GraphQL has a 2-4 week learning curve with Apollo/Relay and benefits from senior engineers. gRPC requires the team to embrace Protobuf + IDL discipline, which is a natural fit for backend-heavy teams.
    (3) External vs internal: APIs exposed to outside customers should usually be REST + OpenAPI (third-party tooling, Postman, OAuth ecosystem); for in-house microservice networks gRPC is dominant.
    A hybrid pattern is also common: public REST + internal gRPC (a REST gateway proxies into the gRPC backend). Don't fall for the "GraphQL is good for everything" myth; for simple CRUD, REST is still the right call.
