Blog

AWS MediaTailor SSAI in Production: The Hidden Complexity of Server-Side Ad Insertion

7 min read·Mar 10, 2026

 
Server-side ad insertion sounds cleaner than client-side ad insertion. In some ways, it is. But in production, it also moves monetization directly into the playback path.
 
From working on streaming platforms that used AWS MediaTailor, the biggest lesson was simple: a managed SSAI solution reduces implementation effort, but it does not remove system complexity.
 
It shifts complexity into manifests, cache behavior, tracking, latency budgets, failure handling, and observability. Those are the parts that determine whether an ad-supported streaming platform feels stable or fragile.
 
In this post, I will walk through the main engineering lessons I learned from using AWS MediaTailor for SSAI, focusing on what it improves, what it complicates, and what teams usually underestimate before going live.

 

 

Why MediaTailor Is Attractive

 
The appeal of AWS MediaTailor is obvious.
 
It centralizes ad insertion logic, reduces player-side complexity, and gives teams a more consistent ad experience across devices.
 
Instead of pushing ad orchestration into every client application, the stitching happens server-side, closer to the manifest layer.
 
That gives you a few immediate advantages:
  • Less ad-specific logic inside players
  • More consistent behavior across platforms
  • Lower exposure to client-side ad blocking than CSAI
  • A more controlled monetization flow
 
That part is real. MediaTailor solves meaningful problems.
 
But it also creates a dangerous illusion: that ad insertion is now "handled" and no longer a core architectural concern.
 
That is the mistake.

 

 

SSAI Moves Ads Into the Critical Playback Path

 
The most important operational change with SSAI is this:
 
Ads are no longer adjacent to playback. They are part of playback.
 
Once manifests are being personalized and stitched with ad decisions, monetization latency starts affecting viewer experience directly.
 
That means:
  • Ad decision latency matters
  • Manifest stitching latency matters
  • Playback session setup reliability matters
  • Upstream dependency failures can degrade playback
 
On architecture diagrams, managed SSAI often looks neat. In production, it turns ad infrastructure into part of the serving path.
 
If you do not treat it with the same discipline as any other critical dependency, you create a fragile system very quickly.

 

Typical production questions look like this:

 

  • If ad session setup becomes slow, do you wait or degrade?
  • If monetization services are unhealthy, do you preserve revenue or playback continuity?
  • If the content path is healthy but the ad path is not, what fails first?
 
Those decisions define the system more than the product documentation does.

 

 

Personalized Manifests Change Cache Behavior

 
One of the easiest mistakes in SSAI systems is assuming that caching will behave the same way it did before ad stitching.
 
It will not.
 
With MediaTailor, manifests become more personalized and session-aware. That changes cache efficiency and makes CDN behavior less predictable if you designed the system with generic streaming assumptions.
 
The core issue is simple:
 
Personalization and cacheability are in constant tension.
 
Once two viewers can receive different ad experiences for the same content, manifest-level caching becomes more complicated.
 
That leads to consequences such as:
  • Lower cache hit ratios on manifest traffic
  • More origin pressure during spikes
  • Harder reasoning about edge behavior
  • More difficult debugging when playback differs across users
 
In non-SSAI pipelines, many cache assumptions are stable. With SSAI, some of those assumptions quietly stop being true.
 
That does not mean the model is bad. It means cache design must be revisited explicitly, not inherited blindly from a simpler delivery architecture.

 

The practical consequence:

 

A streaming platform that behaves well under shared manifests can behave very differently once manifests become individualized around ad sessions.
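A toy illustration of that tension, assuming hypothetical key shapes (not MediaTailor's actual cache keys): session-scoped manifest keys diverge per viewer, while segment keys stay shared and cacheable.

```go
package main

import "fmt"

// With SSAI, the manifest key becomes session-scoped while the media
// segments stay shared across viewers. Key formats here are illustrative.
func manifestKey(channel, sessionID string) string {
	return fmt.Sprintf("manifest:%s:session:%s", channel, sessionID)
}

func segmentKey(channel, segment string) string {
	return fmt.Sprintf("segment:%s:%s", channel, segment)
}

func main() {
	// Two viewers, same content: manifest keys diverge (cache miss),
	// segment keys stay identical (cache hit).
	fmt.Println(manifestKey("news", "a1") == manifestKey("news", "b2"))
	fmt.Println(segmentKey("news", "00042.ts") == segmentKey("news", "00042.ts"))
}
```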

 

 

Tracking Is Harder Than It Looks

 
A common misunderstanding around SSAI is that server-side insertion automatically makes ad measurement clean.
 
It does not.
 
It makes some parts more controlled, but measurement is still full of edge cases.
 
The reason is straightforward:
 
An inserted ad is not the same thing as a confirmed ad view.
 
A manifest can contain an ad break. A session can be created correctly. Tracking beacons can be emitted. And the viewer can still abandon playback, hit buffering, switch devices, or fail to actually consume the ad as expected.
 
That is where teams start discovering that monetization truth and playback truth are not identical.
 
In practice, you need to reason separately about:
  • Ads inserted
  • Ads requested
  • Ads started
  • Ads actually watched
  • Tracking successfully emitted
  • Tracking successfully received downstream
 
If those layers are not clearly instrumented, the system becomes difficult to trust.
 
You may think you have a playback problem when you actually have a tracking problem. Or the opposite.
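One way to keep those layers honest is to count them separately and watch the drop-off between adjacent stages. A minimal sketch, with a hypothetical AdFunnel type and made-up numbers:

```go
package main

import "fmt"

// AdFunnel tracks each measurement layer separately so that
// "inserted" is never conflated with "watched". The stage names
// mirror the list above; the values below are illustrative.
type AdFunnel struct {
	Inserted, Requested, Started, Watched, Emitted, Received int
}

// DropOff returns the loss between two adjacent stages as a fraction.
func DropOff(from, to int) float64 {
	if from == 0 {
		return 0
	}
	return 1 - float64(to)/float64(from)
}

func main() {
	f := AdFunnel{Inserted: 1000, Requested: 980, Started: 900, Watched: 700, Emitted: 690, Received: 650}
	fmt.Printf("insert→watch drop-off: %.1f%%\n", 100*DropOff(f.Inserted, f.Watched))
	fmt.Printf("emit→receive drop-off: %.1f%%\n", 100*DropOff(f.Emitted, f.Received))
}
```

A large gap between "inserted" and "watched" points at a playback problem; a gap between "emitted" and "received" points at a tracking problem.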

 

 

Failure Modes Matter More Than Happy Paths

 
SSAI demos focus on the happy path.
 
Production systems are defined by failure paths.
 
When using MediaTailor, the real engineering work starts when some part of the ad-serving chain is slow, unavailable, or inconsistent.
 
Typical questions appear immediately:
  • What happens if ad decisioning is slow?
  • What happens if session setup fails?
  • What happens if manifests are healthy but tracking is degraded?
  • What happens if content delivery is fine but monetization dependencies are not?
 
Those are not implementation details. They are product decisions expressed through architecture.
 
For example, if the ad path becomes unhealthy during a live event, do you block playback, delay playback, or choose a content-only fallback to preserve continuity?
 
There is no universally correct answer. But there must be an answer.
 
Teams that avoid these decisions early usually end up making them in the middle of an incident, which is the worst possible moment.

 

The real architectural decision is choosing among options like:

 

  • Protect monetization first
  • Protect viewer continuity first
  • Use different fallback behavior for live and VOD
 
The specific answer matters less than the fact that it is explicit, tested, and observable.

 

 

Observability Is the Real Work

 
The operational difference between a manageable SSAI platform and a chaotic one is not the vendor.
 
It is observability.
 
Without clear visibility, teams end up debugging monetization, manifests, player behavior, and CDN issues through guesswork.
 
That does not scale.
 
At minimum, I would want visibility into:
  • Manifest stitching latency
  • SSAI session setup latency
  • Ad decision latency
  • Ad fill rate
  • Playback errors around ad breaks
  • Fallback activation rate
  • CDN hit ratio changes
  • Mismatch between inserted ads and confirmed views
 
That is the difference between "SSAI is enabled" and "SSAI is operable."
 
In ad-supported streaming, visibility is not a nice-to-have. It is what lets you separate playback incidents from monetization incidents and respond correctly to both.
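For the latency metrics in that list, percentiles matter more than averages, because the ad path tends to dominate the tail. A self-contained sketch of the arithmetic (a real deployment would use histograms from a metrics library, and the sample values are made up):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile of latency samples (ms),
// using nearest-rank rounding on a sorted copy of the input.
func percentile(samples []float64, p float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	idx := int(p/100*float64(len(s)-1) + 0.5)
	return s[idx]
}

func main() {
	stitchMs := []float64{42, 45, 48, 51, 55, 60, 70, 90, 150, 400}
	fmt.Println(percentile(stitchMs, 50)) // median looks healthy
	fmt.Println(percentile(stitchMs, 95)) // the tail tells the real story
}
```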

 

Where debugging gets much harder:

 

One of the hardest parts of debugging SSAI playback is that server-side signals are not enough.
 
If player-side telemetry is limited, expensive to retain, or simply unavailable, teams end up trying to explain viewer behavior from manifests, backend timings, and CDN signals alone.
 
That creates a serious blind spot.
 
You may know that an ad break was stitched correctly and that delivery looked healthy from the server side, while still being unable to explain what the viewer actually experienced during playback.
 
That gap makes ambiguous playback issues much slower to diagnose and much harder to resolve with confidence.

 

 

Managed SSAI Does Not Remove Architectural Responsibility

 
The biggest mistake teams make with products like MediaTailor is assuming that outsourcing part of the implementation also outsources the architectural burden.
 
It does not.
 
Managed SSAI can reduce the amount of custom machinery you need to build. That is valuable.
 
But the hard questions remain yours:
  • How much latency can the playback path tolerate?
  • What degrades first when dependencies fail?
  • How do you preserve viewer continuity?
  • How do you validate monetization correctness?
  • How do you debug individualized manifest behavior under load?
 
AWS MediaTailor helps. It does not absolve.

 

 

Final Thoughts

 
From my experience, MediaTailor is useful for exactly the reason many teams adopt it: it reduces the amount of client-side ad complexity and gives you a more centralized SSAI model.
 
But that is only the beginning.
 
The real engineering challenge is not enabling ad insertion.
 
The real challenge is making ad-supported playback reliable, debuggable, and predictable under real production conditions.
 
That means treating SSAI as a distributed systems problem, not just a monetization feature.
 
Once ads become part of the delivery path, the standard changes.
 
You are no longer just serving video.
 
You are serving video, monetization, timing, and system resilience in the same request flow.
 
That is where the real complexity starts.

Designing a Scalable GraphQL Backend in Go for a Commerce Platform

5 min read·Jan 29, 2026

 
GraphQL is extremely attractive for commerce platforms. It gives frontend teams flexibility, reduces over-fetching, and allows rapid feature iteration. But that flexibility comes with architectural responsibility.
 
Without discipline, a GraphQL backend can become the primary bottleneck in your system. Query flexibility can translate into unpredictable database load, cache pressure, and rising infrastructure costs.
 
In this post, I will walk through practical strategies for designing a scalable GraphQL backend in Go, focusing on batching, caching, schema evolution, and architectural boundaries.

 

 

Architecture Boundaries First

 
The most important design rule is simple:
 
GraphQL is a transport layer, not a business logic layer.
 
Resolvers should orchestrate domain services. They should not contain core business rules, persistence logic, or infrastructure concerns.
 
A clean Go structure typically looks like this:
  • graph: schema and resolvers
  • domain: business logic
  • repository: database access
  • infrastructure: cache, messaging, external integrations
 
This separation ensures:
  • Business logic is independently testable.
  • Transport concerns do not leak into core services.
  • The system can evolve without coupling everything to the GraphQL schema.
 
When GraphQL becomes the center of your architecture instead of an interface, scalability problems follow.

 

 

Eliminating the N+1 Problem

 
The most common scalability issue in GraphQL systems is the N+1 query problem.
 
Imagine querying 200 products and resolving inventory and pricing for each product individually. A naive resolver might trigger a separate database query per product.
 
Instead of executing:
 
SELECT * FROM inventory WHERE product_id = $1
 
200 times, we batch.
 
In Go, a request-scoped DataLoader pattern works well.

 

Example loader:

 

type InventoryLoader struct {
    repo *repository.InventoryRepo
}

func (l *InventoryLoader) LoadMany(ctx context.Context, productIDs []int) (map[int]Inventory, error) {
    return l.repo.GetByProductIDs(ctx, productIDs)
}

 

Repository implementation:

 

func (r *InventoryRepo) GetByProductIDs(ctx context.Context, ids []int) (map[int]Inventory, error) {
    // pq.Array requires the github.com/lib/pq driver package.
    query := `SELECT product_id, quantity FROM inventory WHERE product_id = ANY($1)`
    rows, err := r.db.QueryContext(ctx, query, pq.Array(ids))
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    result := make(map[int]Inventory)
    for rows.Next() {
        var inv Inventory
        if err := rows.Scan(&inv.ProductID, &inv.Quantity); err != nil {
            return nil, err
        }
        result[inv.ProductID] = inv
    }
    // Surface any error encountered during iteration.
    return result, rows.Err()
}
 
This reduces database round trips dramatically and stabilizes latency under load.
 
Batching is not optional in a high-traffic commerce backend. It is foundational.

 

 

Caching Strategy

 
In commerce systems, reads dominate writes.
 
A layered caching strategy typically includes:
  • Request-level in-memory caching
  • Redis for cross-request caching
  • CDN for public assets
 
The critical part is defining cache boundaries explicitly.
 
Cache keys must reflect business semantics. For example:
  • product:{id}
  • category:{id}:products
  • pricing:{region}:{product_id}
 
Invalidation must be designed before caching is implemented. Product updates must invalidate both product-level and category-level entries. Otherwise, you trade performance for consistency bugs.
 
Blind caching increases system fragility. Structured caching increases system predictability.
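A minimal sketch of that invalidation rule, using an in-memory map standing in for Redis and the key formats from the list above:

```go
package main

import "fmt"

// Cache keys mirror the business semantics above. The in-memory map
// stands in for Redis; only the invalidation logic is the point here.
type Cache struct{ data map[string]string }

func productKey(id int) string          { return fmt.Sprintf("product:%d", id) }
func categoryKey(categoryID int) string { return fmt.Sprintf("category:%d:products", categoryID) }

// InvalidateProduct removes both the product entry and the category
// listing that contains it, so a stale list can never be served.
func (c *Cache) InvalidateProduct(id, categoryID int) {
	delete(c.data, productKey(id))
	delete(c.data, categoryKey(categoryID))
}

func main() {
	c := &Cache{data: map[string]string{
		productKey(42): `{"name":"Mug"}`,
		categoryKey(7): `[42,43]`,
	}}
	c.InvalidateProduct(42, 7)
	fmt.Println(len(c.data)) // both entries gone
}
```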

 

 

Schema Evolution Without Breaking Clients

 
GraphQL promises flexibility, but schema evolution is still a real operational concern.
 
In a fast-moving commerce environment, frontend and backend deployments are not always synchronized. Breaking changes in the schema can disrupt multiple teams.
 
Our approach was explicit versioning at the field level.
 
When introducing a new shape or behavior, instead of modifying an existing field, we introduced:
  • productPrice_v2
  • productPrice_v3
 
The older field was marked as deprecated:

 

type Product {
  productPrice: Money @deprecated(reason: "Use productPrice_v2")
  productPrice_v2: Money
}
 
We then observed usage of deprecated fields through schema analytics and resolver-level logging. Only after confirming that no active clients depended on the deprecated field did we remove it.
 
This approach provided:
  • Backward compatibility
  • Safe iteration
  • Clear migration paths
  • No surprise breakages
 
It does increase schema verbosity temporarily. But in a commerce platform where uptime and stability directly affect revenue, predictability outweighs aesthetic purity.

 

 

Concurrency and Context Discipline

 
Go makes concurrency simple. That does not mean it is safe by default.
 
Every resolver must respect context cancellation and timeouts.

 

Example:

 

func (r *Resolver) Product(ctx context.Context, id int) (*Product, error) {
    select {
    case <-ctx.Done():
        return nil, ctx.Err()
    default:
        return r.productService.GetByID(ctx, id)
    }
}

 

Downstream calls must propagate context properly:

 

func (r *ProductRepo) GetByID(ctx context.Context, id int) (*Product, error) {
    var p Product
    err := r.db.QueryRowContext(ctx, "SELECT ... WHERE id = $1", id).Scan( /* &p.… fields */ )
    if err != nil {
        return nil, err
    }
    return &p, nil
}

 

Ignoring context propagation leads to:
  • Wasted database queries
  • Goroutine leaks
  • Increased CPU under partial failures
 
Under load, these inefficiencies accumulate quickly.

 

 

Query Cost Control

 
GraphQL allows deeply nested queries. Without safeguards, a single request can become computationally expensive.
 
Defensive measures include:
  • Maximum query depth limits
  • Complexity scoring per field
  • Mandatory pagination for lists
  • Resolver-level timeouts
 
These constraints protect the database and cache layers from accidental or malicious misuse.
 
Flexibility without limits is instability.
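A self-contained sketch of a depth limit, using a hypothetical Field type standing in for the parsed query AST (real servers such as gqlgen expose this through their parser, and also ship built-in complexity limits):

```go
package main

import "fmt"

// Field is a minimal, hypothetical representation of a parsed GraphQL
// selection set, defined here only to keep the sketch self-contained.
type Field struct {
	Name     string
	Children []Field
}

// depth returns the maximum nesting depth of a selection set.
func depth(fields []Field) int {
	max := 0
	for _, f := range fields {
		if d := depth(f.Children) + 1; d > max {
			max = d
		}
	}
	return max
}

// allow rejects queries nested deeper than the limit before any
// resolver runs, protecting the database from pathological requests.
func allow(query []Field, maxDepth int) bool {
	return depth(query) <= maxDepth
}

func main() {
	// products { category { products } }  → depth 3
	q := []Field{{Name: "products", Children: []Field{{Name: "category", Children: []Field{{Name: "products"}}}}}}
	fmt.Println(allow(q, 5), allow(q, 2))
}
```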

 

 

Federation vs Modular Monolith

 
GraphQL Federation is supported in Go through tools like gqlgen and Apollo-compatible federation approaches.
 
However, federation introduces:
  • Cross-service latency
  • Schema coordination complexity
  • Operational overhead
  • Harder local development environments
 
For smaller teams or tightly coupled domains, a well-structured modular monolith in Go often provides:
  • Better performance
  • Lower cognitive load
  • Faster iteration
 
Federation should be driven by domain boundaries and team scale, not architectural fashion.

 

 

Observability is Not Optional

 
GraphQL can hide expensive work behind nested fields.
 
We instrumented:
  • Resolver latency
  • Database query duration
  • Cache hit ratios
  • Error rates per field
  • Query complexity distribution
 
Without field-level observability, performance regressions go unnoticed until customers feel them.
 
Scalability requires visibility.

 

 

Final Thoughts

 
GraphQL is neither inherently scalable nor inherently dangerous. It depends entirely on how it is designed.
 
In commerce systems, scalability is about:
  • Clear architectural boundaries
  • Batching and caching discipline
  • Safe schema evolution
  • Concurrency correctness
  • Controlled query complexity
  • Strong observability
 
The goal is not only high throughput.
 
The goal is predictable behavior as traffic grows and features evolve.
 
In backend engineering, scalability is less about speed and more about controlled complexity.

Best Practices for Writing Scalable and Maintainable Code

8 min read·Mar 22, 2025

 
Writing scalable and maintainable code isn't just about solving the immediate problem - it's about designing software that stands the test of time. We've all been there - coding a feature quickly and realizing later that the code becomes unmanageable. In this post, we'll explore how to avoid that fate by following best practices for scalable and maintainable code. Along the way, we'll also see how the SOLID principles can be applied to make your code more modular, flexible, and easier to maintain.

 

 

Modular Code

 
One of the core principles in software development is keeping your code modular and simple. By creating small, reusable functions and breaking down your code into manageable components, you make it easier to maintain and extend.

 

In Go, modularity can be achieved by dividing code into packages. This allows you to separate concerns and promote reuse. Let's look at an example from an e-commerce system, where we separate the logic for processing payments into its own package.

 

 

Example: Modular Code in Go (Processing Payments in an E-Commerce System)

 

// main.go
package main

import (
    "fmt"

    "myapp/payment"
)

func main() {
    transactionID := payment.ProcessPayment(100.50, "credit_card")
    fmt.Println("Transaction successful, ID:", transactionID)
}

 

// payment/payment.go
package payment

import "fmt"

// ProcessPayment handles payment logic and returns a transaction ID.
func ProcessPayment(amount float64, method string) string {
    fmt.Printf("Processing %.2f payment using %s...\n", amount, method)
    return "TXN123456" // Mock transaction ID
}

 

This example follows the Single Responsibility Principle (SRP) since the payment package only deals with payments. The Open/Closed Principle (OCP) is also supported: new payment methods (like PayPal) can be added by extending the payment package without modifying existing logic.

 

 

Design Patterns

 

Now that we've explored the importance of modularity in code and how it contributes to maintainability, it's time to turn our attention to design patterns. These patterns are reusable solutions to common problems that arise when building scalable and maintainable systems. By understanding and applying design patterns, you can structure your code more effectively, improve its flexibility, and keep it modular.

 

Let’s dive into some of the most widely used design patterns, such as Singleton, Factory, and Repository, and see how they can help your code follow SOLID principles while solving complex architectural problems.

 

 

Example: Singleton Pattern for Database Connection

 

A real-world use case for the Singleton pattern is a database connection manager, which ensures that only one connection instance exists.

 

package database

import (
    "database/sql"
    "sync"

    _ "github.com/lib/pq" // PostgreSQL driver
)

type Database struct {
    connection *sql.DB
}

var (
    instance *Database
    once     sync.Once
)

// GetInstance returns the single database connection instance.
func GetInstance() *Database {
    once.Do(func() {
        db, err := sql.Open("postgres", "user=admin dbname=mydb sslmode=disable")
        if err != nil {
            panic(err) // fail fast on misconfiguration; a real app would log and retry
        }
        instance = &Database{connection: db}
    })
    return instance
}

 

This follows SRP because the database package handles only DB connections. To also honor the Dependency Inversion Principle (DIP), high-level modules should depend on an interface around this connection rather than on the concrete *sql.DB.

 

 

Example: Factory Pattern for User Authentication

 

The Factory Pattern can be used in user authentication, where multiple authentication methods (email, OAuth, etc.) are supported without modifying existing logic.

 

package auth

type Authenticator interface {
    Authenticate(credentials string) bool
}

type EmailAuth struct{}

func (e EmailAuth) Authenticate(credentials string) bool {
    return credentials == "valid-email"
}

type OAuthAuth struct{}

func (o OAuthAuth) Authenticate(credentials string) bool {
    return credentials == "valid-oauth-token"
}

func AuthFactory(method string) Authenticator {
    switch method {
    case "email":
        return EmailAuth{}
    case "oauth":
        return OAuthAuth{}
    default:
        return nil
    }
}

 

This supports OCP (new authentication types can be added without modifying existing logic) and Liskov Substitution Principle (LSP) (any Authenticator implementation can replace another without altering the system's behavior).

 

 

Example: Repository Pattern for User Data Management

 

A use case for the Repository Pattern is handling user data management in a microservices architecture. This example also follows the Interface Segregation Principle (ISP) by ensuring that different services only depend on the specific methods they need.

 

package repository

import "errors"

// User represents a user in the system.
type User struct {
    ID       int
    Name     string
    Email    string
    Password string
}

// Segregated interfaces following ISP
type UserReader interface {
    GetUserByID(id int) (User, error)
    GetAllUsers() ([]User, error)
}

type UserWriter interface {
    CreateUser(user User) error
    DeleteUser(id int) error
}

type UserUpdater interface {
    UpdateUserEmail(id int, email string) error
    UpdateUserPassword(id int, password string) error
}

// Concrete implementation using an in-memory store
type InMemoryUserRepo struct {
    users map[int]User
}

// NewInMemoryUserRepo initializes the repository.
func NewInMemoryUserRepo() *InMemoryUserRepo {
    return &InMemoryUserRepo{users: make(map[int]User)}
}

// Implementing UserReader
func (repo *InMemoryUserRepo) GetUserByID(id int) (User, error) {
    user, exists := repo.users[id]
    if !exists {
        return User{}, errors.New("user not found")
    }
    return user, nil
}

func (repo *InMemoryUserRepo) GetAllUsers() ([]User, error) {
    var userList []User
    for _, user := range repo.users {
        userList = append(userList, user)
    }
    return userList, nil
}

// Implementing UserWriter
func (repo *InMemoryUserRepo) CreateUser(user User) error {
    if _, exists := repo.users[user.ID]; exists {
        return errors.New("user already exists")
    }
    repo.users[user.ID] = user
    return nil
}

func (repo *InMemoryUserRepo) DeleteUser(id int) error {
    if _, exists := repo.users[id]; !exists {
        return errors.New("user not found")
    }
    delete(repo.users, id)
    return nil
}

// Implementing UserUpdater
func (repo *InMemoryUserRepo) UpdateUserEmail(id int, email string) error {
    user, exists := repo.users[id]
    if !exists {
        return errors.New("user not found")
    }
    user.Email = email
    repo.users[id] = user
    return nil
}

func (repo *InMemoryUserRepo) UpdateUserPassword(id int, password string) error {
    user, exists := repo.users[id]
    if !exists {
        return errors.New("user not found")
    }
    // For simplicity, storing a plain-text password (not recommended in real apps)
    user.Password = password
    repo.users[id] = user
    return nil
}

 

This design adheres to ISP by ensuring that different services (e.g., authentication, user management, reporting) only depend on the necessary interfaces. It also promotes Separation of Concerns, making the code more maintainable and flexible.

 

 

Code Reviews

 

Regular code reviews are an essential practice for ensuring that code is both high-quality and maintainable. Not only do they help catch errors early, but they also provide opportunities for learning and improving. Code reviews promote knowledge sharing within a team and encourage adherence to best practices. I always appreciate feedback and enjoy helping others improve their code.

 

 

Naming Conventions

 

Clear and consistent naming conventions play a crucial role in making code more readable and understandable. Variables, functions, and class names should reflect their intended purpose, avoiding vague or overly generic terms. For instance, instead of naming a variable temp, opt for something descriptive like customerSessionData or articleList. This not only aids current developers but also benefits future developers who will work with the code.

 

In Go, the language encourages simplicity, and certain short names are widely accepted - such as err for an error value and ctx for a context.Context. However, consistency is key. Once you establish a naming convention, it’s important to stick with it across the codebase. This ensures that other developers can easily understand the intent behind the code, without needing to delve too deeply into documentation.

 

By following consistent naming conventions, your code becomes more approachable and understandable, improving collaboration and speeding up future development.

 

 

Test-Driven Development (TDD)

 

Test-Driven Development (TDD) is a practice I strongly believe in. Writing tests before the code ensures that your solution meets the required functionality from the outset. TDD also provides immediate feedback on whether your code is working as expected. It helps you catch bugs early, which ultimately saves time and resources in the long run.

 

 

Refactoring

 

Even after code is written, it's important to keep refactoring to improve its readability and efficiency. As a project grows, it's easy for code to become messy or redundant. Periodically revisiting the code and simplifying it helps maintain its quality and reduces the likelihood of technical debt.

 

 

Documentation

 

Clear documentation is essential for any project, especially for large teams or long-term projects. Writing concise comments for complex sections of code or creating external documentation for the overall project ensures that everyone is on the same page. Documentation serves as a helpful guide for future developers, saving them time in understanding the codebase.

 

 

Conclusion

 

Writing maintainable and scalable code is not only important for the short-term success of a project but also for long-term growth. It improves collaboration, ensures fewer bugs, and makes it easier to add new features or modify existing ones without breaking the system. By following these practices, you'll not only improve the quality of your code but also set your project up for long-term success, keeping it clean, flexible, and adaptable to future changes.

 

 

Recommended Reading

 

  • "Clean Code: A Handbook of Agile Software Craftsmanship" – Robert C. Martin
    A must-read for writing readable, maintainable, and efficient code. Covers naming, functions, comments, and structuring code.

 

  • "Design Patterns: Elements of Reusable Object-Oriented Software" - Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides
    The bible of design patterns. Covers Factory, Singleton, Observer, Strategy, and more. Essential for software architecture.

 

  • "Agile Software Development, Principles, Patterns, and Practices" – Robert C. Martin
    This is where SOLID principles were first introduced in depth.

 

  • "The Pragmatic Programmer: Your Journey to Mastery" – Andy Hunt & Dave Thomas
    Covers a broad range of best practices, including modular design, automation, and debugging.

 

  • "Domain-Driven Design: Tackling Complexity in the Heart of Software" – Eric Evans
    Essential for designing modular, scalable, and maintainable applications with DDD principles.