GraphQL is extremely attractive for commerce platforms. It gives frontend teams flexibility, reduces over-fetching, and allows rapid feature iteration. But that flexibility comes with architectural responsibility.
Without discipline, a GraphQL backend can become the primary bottleneck in your system. Query flexibility can translate into unpredictable database load, cache pressure, and rising infrastructure costs.
In this post, I will walk through practical strategies for designing a scalable GraphQL backend in Go, focusing on batching, caching, schema evolution, and architectural boundaries.
Architecture Boundaries First
The most important design rule is simple:
GraphQL is a transport layer, not a business logic layer.
Resolvers should orchestrate domain services. They should not contain core business rules, persistence logic, or infrastructure concerns.
A clean Go structure typically looks like this:
- graph: schema and resolvers
- domain: business logic
- repository: database access
- infrastructure: cache, messaging, external integrations
This separation ensures:
- Business logic is independently testable.
- Transport concerns do not leak into core services.
- The system can evolve without coupling everything to the GraphQL schema.
When GraphQL becomes the center of your architecture instead of an interface, scalability problems follow.
Eliminating the N+1 Problem
The most common scalability issue in GraphQL systems is the N+1 query problem.
Imagine querying 200 products and resolving inventory and pricing for each product individually. A naive resolver might trigger a separate database query per product.
Instead of executing

```sql
SELECT * FROM inventory WHERE product_id = ?
```

200 times, we batch.
In Go, a request-scoped DataLoader pattern works well.
Example loader:
```go
type InventoryLoader struct {
	repo *repository.InventoryRepo
}

func (l *InventoryLoader) LoadMany(ctx context.Context, productIDs []int) (map[int]Inventory, error) {
	return l.repo.GetByProductIDs(ctx, productIDs)
}
```
Repository implementation:
```go
func (r *InventoryRepo) GetByProductIDs(ctx context.Context, ids []int) (map[int]Inventory, error) {
	query := `SELECT product_id, quantity FROM inventory WHERE product_id = ANY($1)`
	rows, err := r.db.QueryContext(ctx, query, pq.Array(ids))
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	result := make(map[int]Inventory)
	for rows.Next() {
		var inv Inventory
		if err := rows.Scan(&inv.ProductID, &inv.Quantity); err != nil {
			return nil, err
		}
		result[inv.ProductID] = inv
	}
	return result, rows.Err() // surface iteration errors, not just scan errors
}
```
This reduces database round trips dramatically and stabilizes latency under load.
Batching is not optional in a high-traffic commerce backend. It is foundational.
Caching Strategy
In commerce systems, reads dominate writes.
A layered caching strategy typically includes:
- Request-level in-memory caching
- Redis for cross-request caching
- CDN for public assets
The critical part is defining cache boundaries explicitly.
Cache keys must reflect business semantics. For example:
- product:{id}
- category:{id}:products
- pricing:{region}:{product_id}
Invalidation must be designed before caching is implemented. Product updates must invalidate both product-level and category-level entries. Otherwise, you trade performance for consistency bugs.
Blind caching increases system fragility. Structured caching increases system predictability.
Schema Evolution Without Breaking Clients
GraphQL promises flexibility, but schema evolution is still a real operational concern.
In a fast-moving commerce environment, frontend and backend deployments are not always synchronized. Breaking changes in the schema can disrupt multiple teams.
Our approach was explicit versioning at the field level.
When introducing a new shape or behavior, instead of modifying an existing field, we introduced:
- productPrice_v2
- productPrice_v3
The older field was marked as deprecated:
```graphql
type Product {
  productPrice: Money @deprecated(reason: "Use productPrice_v2")
  productPrice_v2: Money
}
```
We then observed usage of deprecated fields through schema analytics and resolver-level logging. Only after confirming that no active clients depended on the deprecated field did we remove it.
This approach provided:
- Backward compatibility
- Safe iteration
- Clear migration paths
- No surprise breakages
It does increase schema verbosity temporarily. But in a commerce platform where uptime and stability directly affect revenue, predictability outweighs aesthetic purity.
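The resolver-level logging mentioned above can be as simple as a counter on the legacy field's resolver. In production this would be a metrics counter labelled by client, feeding schema analytics; a package-level atomic keeps this sketch self-contained, and the resolver names and price value are placeholders.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// deprecatedHits counts resolutions of the legacy field so we can confirm
// it is unused before deleting it from the schema.
var deprecatedHits atomic.Int64

// resolveProductPrice backs the deprecated field: it records the hit and
// delegates to the v2 logic so both fields stay consistent during migration.
func resolveProductPrice() int {
	deprecatedHits.Add(1)
	return resolveProductPriceV2()
}

func resolveProductPriceV2() int {
	return 1999 // placeholder for the real pricing logic
}

func main() {
	resolveProductPrice()
	fmt.Println(deprecatedHits.Load())
}
```

Delegating the old field to the new implementation also guarantees clients on either field see the same value during the migration window.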
Concurrency and Context Discipline
Go makes concurrency simple. That does not mean it is safe by default.
Every resolver must respect context cancellation and timeouts.
Example:
```go
func (r *Resolver) Product(ctx context.Context, id int) (*Product, error) {
	select {
	case <-ctx.Done():
		return nil, ctx.Err()
	default:
		return r.productService.GetByID(ctx, id)
	}
}
```
Downstream calls must propagate context properly:
```go
func (r *ProductRepo) GetByID(ctx context.Context, id int) (*Product, error) {
	var p Product
	// QueryRowContext (not QueryContext, which returns *sql.Rows) fits a
	// single-row lookup and lets us return *Product after scanning.
	row := r.db.QueryRowContext(ctx, "SELECT ... WHERE id = $1", id)
	if err := row.Scan( /* product columns */ ); err != nil {
		return nil, err
	}
	return &p, nil
}
```
Ignoring context propagation leads to:
- Wasted database queries
- Goroutine leaks
- Increased CPU under partial failures
Under load, these inefficiencies accumulate quickly.
Query Cost Control
GraphQL allows deeply nested queries. Without safeguards, a single request can become computationally expensive.
Defensive measures include:
- Maximum query depth limits
- Complexity scoring per field
- Mandatory pagination for lists
- Resolver-level timeouts
These constraints protect the database and cache layers from accidental or malicious misuse.
Flexibility without limits is instability.
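A depth limit is the simplest of these safeguards. The sketch below walks a toy selection tree and rejects queries nested beyond a configured depth; `Field` is a stand-in for a parsed GraphQL selection, since a real server would enforce this on the parsed query document (gqlgen, for instance, ships complexity-limit support as a server extension).

```go
package main

import "fmt"

// Field is a minimal stand-in for one node of a parsed selection set.
type Field struct {
	Name     string
	Children []Field
}

// depth returns the deepest nesting level of a selection set.
func depth(fields []Field) int {
	max := 0
	for _, f := range fields {
		if d := 1 + depth(f.Children); d > max {
			max = d
		}
	}
	return max
}

// checkDepth rejects over-nested queries before any resolver runs.
func checkDepth(query []Field, maxDepth int) error {
	if d := depth(query); d > maxDepth {
		return fmt.Errorf("query depth %d exceeds limit %d", d, maxDepth)
	}
	return nil
}

func main() {
	// product { inventory { warehouse { address } } } -> depth 4
	q := []Field{{Name: "product", Children: []Field{
		{Name: "inventory", Children: []Field{
			{Name: "warehouse", Children: []Field{{Name: "address"}}},
		}},
	}}}
	fmt.Println(checkDepth(q, 3)) // rejected: depth 4 > limit 3
}
```

Complexity scoring follows the same shape, except each field contributes a weighted cost (often multiplied by requested page size) rather than a flat level.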
Federation vs Modular Monolith
GraphQL Federation is supported in Go through tools like gqlgen, which implements Apollo-compatible federation.
However, federation introduces:
- Cross-service latency
- Schema coordination complexity
- Operational overhead
- Harder local development environments
For smaller teams or tightly coupled domains, a well-structured modular monolith in Go often provides:
- Better performance
- Lower cognitive load
- Faster iteration
Federation should be driven by domain boundaries and team scale, not architectural fashion.
Observability is Not Optional
GraphQL can hide expensive work behind nested fields.
We instrumented:
- Resolver latency
- Database query duration
- Cache hit ratios
- Error rates per field
- Query complexity distribution
Without field-level observability, performance regressions go unnoticed until customers feel them.
Scalability requires visibility.
Final Thoughts
GraphQL is neither inherently scalable nor inherently dangerous. It depends entirely on how it is designed.
In commerce systems, scalability is about:
- Clear architectural boundaries
- Batching and caching discipline
- Safe schema evolution
- Concurrency correctness
- Controlled query complexity
- Strong observability
The goal is not only high throughput.
The goal is predictable behavior as traffic grows and features evolve.
In backend engineering, scalability is less about speed and more about controlled complexity.
Victor Springer
5 min read·Jan 29, 2026