Seamlessly Swapping the API backend of the Netflix Android app

How we migrated our Android endpoints out of a monolith into a new microservice

Overview

The article discusses the migration of the Netflix Android app's API backend from a monolithic service to a microservice architecture, detailing the strategies, tools, and challenges encountered during this year-long project. It highlights the adoption of the Backend for Frontend (BFF) pattern and the use of the Falcor data model to improve data fetching and user experience.

What You'll Learn

1. How to implement a microservice architecture for an API backend
2. Why using the Backend for Frontend (BFF) pattern enhances API design
3. How to conduct migration testing using functional and replay testing
4. When to use canary deployments for new service features
5. How to improve observability in microservices using distributed tracing

Prerequisites & Requirements

  • Understanding of microservices and API design principles
  • Familiarity with Node.js and JavaScript (optional)
  • Experience with backend development and testing methodologies

Key Questions Answered

What is the Backend for Frontend (BFF) pattern and how is it used?
The Backend for Frontend (BFF) pattern involves creating a dedicated backend for each client type, such as Android, iOS, or web. This approach allows teams to tailor APIs specifically for their client needs, improving performance and user experience by reducing unnecessary data fetching and processing.
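As a rough illustration of the pattern, the sketch below shapes one shared record differently for two clients. The record, field names, and function names are all hypothetical and are not taken from Netflix's code; the point is only that each backend returns what its own client renders.

```python
# Hypothetical full record, as a shared domain service might return it.
CATALOG_RECORD = {
    "id": 42,
    "title": "Example Show",
    "synopsis": "A long description that only detail screens need.",
    "artwork": {"small": "art/42_s.jpg", "large": "art/42_l.jpg"},
    "cast": ["A. Actor", "B. Actor"],
}


def android_row_item(record: dict) -> dict:
    """Android BFF (assumed): return only what a home-screen row renders."""
    return {
        "id": record["id"],
        "title": record["title"],
        "artwork": record["artwork"]["small"],
    }


def web_detail_item(record: dict) -> dict:
    """Web BFF (assumed): a detail page wants more fields and larger artwork."""
    return {
        "id": record["id"],
        "title": record["title"],
        "synopsis": record["synopsis"],
        "artwork": record["artwork"]["large"],
        "cast": record["cast"],
    }
```

Because each shaping function belongs to one client team, a change to the Android payload does not have to be coordinated with the web payload.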
How did Netflix migrate its API backend without affecting user experience?
Netflix migrated its API backend by gradually decoupling the existing monolithic service into a microservice architecture. They employed careful planning, testing, and monitoring strategies, including canary deployments and performance metrics tracking, to ensure that users experienced no disruptions during the transition.
What testing strategies were implemented during the migration?
The migration involved functional testing to ensure data accuracy, replay testing to compare responses from old and new services, and canary deployments to monitor performance and user experience before full rollout. This comprehensive testing approach helped identify and mitigate potential issues early.
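Replay testing, as described above, can be sketched as sending the same recorded request to both services and reporting where the responses differ. This is a minimal sketch, not Netflix's tooling: `call_old` and `call_new` are assumed callables standing in for the two backends, requests are assumed to be hashable (for example, strings), and responses are assumed to be JSON-like values with string keys.

```python
from typing import Any


def diff_responses(old: Any, new: Any, path: str = "") -> list:
    """Return human-readable differences between two JSON-like values."""
    if isinstance(old, dict) and isinstance(new, dict):
        diffs = []
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else str(key)
            if key not in old:
                diffs.append(f"{child}: only in new response")
            elif key not in new:
                diffs.append(f"{child}: only in old response")
            else:
                diffs.extend(diff_responses(old[key], new[key], child))
        return diffs
    if isinstance(old, list) and isinstance(new, list):
        if len(old) != len(new):
            return [f"{path}: list length {len(old)} != {len(new)}"]
        diffs = []
        for i, (o, n) in enumerate(zip(old, new)):
            diffs.extend(diff_responses(o, n, f"{path}[{i}]"))
        return diffs
    if old != new:
        return [f"{path}: {old!r} != {new!r}"]
    return []


def replay(requests, call_old, call_new):
    """Send each recorded request to both services and collect mismatches."""
    report = {}
    for request in requests:
        diffs = diff_responses(call_old(request), call_new(request))
        if diffs:
            report[request] = diffs
    return report
```

An empty report means the new service returned the same data as the old one for every replayed request. Fields that legitimately vary between calls, such as timestamps, would need to be excluded before comparing.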
What challenges did Netflix face after migrating to a microservice architecture?
Post-migration, Netflix encountered increased latencies due to the need for network calls to fetch cached data that was previously accessible within the monolith. They also faced issues with partial query errors, which required improved error handling and retry logic to enhance resilience.
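One way to handle partial query errors is to retry only the paths that failed instead of the whole query. The sketch below assumes a hypothetical `fetch` callable that returns the values it resolved together with the list of paths that failed; the source does not describe Netflix's actual retry logic, so treat the signature and the attempt limit as illustrative.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fetcher signature: takes query paths, returns the values it
# resolved plus the list of paths that failed.
Fetcher = Callable[[List[str]], Tuple[Dict[str, object], List[str]]]


def fetch_with_partial_retry(fetch: Fetcher, paths: List[str], max_attempts: int = 3):
    """Fetch paths, re-requesting only those that failed on earlier attempts."""
    results: Dict[str, object] = {}
    pending = list(paths)
    for _ in range(max_attempts):
        if not pending:
            break
        data, failed = fetch(pending)
        results.update(data)   # keep whatever succeeded on this attempt
        pending = failed       # only the failures go into the next attempt
    return results, pending    # pending now holds paths that never succeeded
```

The caller receives both the resolved data and the paths that stayed unresolved, so it can render what it has and decide how to degrade for the rest.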

Key Statistics & Figures

Number of query paths migrated
170
The initial number of query paths that needed to be transitioned to the new microservice architecture.
Latency regression identified during canary testing
4-5%
The latency regression observed on high-traffic UI screens during the canary deployment phase.
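A regression figure like the one above is a relative change between a baseline group and a canary group. The following sketch shows that arithmetic on median latency; the choice of the median, the function names, and the 2% threshold are assumptions for illustration, not values from the article.

```python
from statistics import median


def latency_regression_pct(baseline_ms, canary_ms):
    """Percent change in median latency of the canary relative to baseline."""
    base = median(baseline_ms)
    return (median(canary_ms) - base) / base * 100.0


def canary_passes(baseline_ms, canary_ms, threshold_pct=2.0):
    """True when the canary's median latency stays within the threshold.

    The default threshold is a hypothetical value, not one from the source.
    """
    return latency_regression_pct(baseline_ms, canary_ms) <= threshold_pct
```

With a baseline median of 200 ms and a canary median of 209 ms, the regression is (209 - 200) / 200 = 4.5%, which would fail a 2% gate.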

Key Actionable Insights

1. Implement a phased migration strategy for transitioning to microservices to minimize disruption.
   By gradually migrating components and ensuring thorough testing at each stage, teams can avoid significant user experience issues and maintain operational stability.
2. Utilize canary deployments to test new features with a small subset of users before full rollout.
   This approach allows for real-world testing of new services, helping to identify performance regressions and user experience issues early in the deployment process.
3. Incorporate distributed tracing to enhance observability across microservices.
   Using tools like Zipkin can provide insights into request flows and help diagnose performance bottlenecks, ultimately improving the reliability of the service.
4. Prioritize testing during migration to ensure data integrity and performance.
   Implementing functional and replay testing can help confirm that new services return the same data as the old ones, preventing regressions and ensuring a smooth transition.
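For the distributed tracing insight above, the core mechanism is that every hop carries the same trace ID and records which span called it. The sketch below shows only that ID propagation, using the header names from Zipkin's B3 format; the helper functions are hypothetical, nothing is reported to a Zipkin collector, and a real service would normally rely on a tracing library rather than hand-written code like this.

```python
import secrets


def new_trace_headers(sampled=True):
    """Start a trace: fresh trace ID and root span ID, no parent."""
    return {
        "X-B3-TraceId": secrets.token_hex(8),   # 16 lower-hex characters
        "X-B3-SpanId": secrets.token_hex(8),
        "X-B3-Sampled": "1" if sampled else "0",
    }


def child_headers(incoming):
    """Headers for a downstream call made while handling `incoming`."""
    headers = {
        "X-B3-TraceId": incoming["X-B3-TraceId"],      # same trace end to end
        "X-B3-ParentSpanId": incoming["X-B3-SpanId"],  # caller becomes parent
        "X-B3-SpanId": secrets.token_hex(8),           # new span for this hop
    }
    if "X-B3-Sampled" in incoming:
        # Pass the sampling decision along unchanged.
        headers["X-B3-Sampled"] = incoming["X-B3-Sampled"]
    return headers
```

Because the trace ID never changes and each span names its parent, a tracing backend can reassemble the full request tree and show which hop added latency.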

Common Pitfalls

1. Underestimating the impact of network latency when transitioning from a monolithic to a microservice architecture.
   This can lead to increased response times for data that was previously cached locally, requiring careful optimization and monitoring to mitigate performance issues.
2. Neglecting to implement robust error handling for partial query responses.
   As microservices introduce network boundaries, the likelihood of partial data responses increases, necessitating improved error handling strategies to prevent application crashes.

Related Concepts

Microservices Architecture
Backend for Frontend (BFF) Pattern
Distributed Tracing
API Design Principles