Overview
Netflix engineered a real-time recommendation delivery system for live events that can update over 100 million devices in under a minute. The system uses a two-phase approach—prefetching data ahead of time and broadcasting low-cardinality messages at critical moments—to solve the thundering herd problem while keeping millions of viewers in sync during events like the Jake Paul vs. Mike Tyson fight and NFL Christmas games.
What You'll Learn
How to solve the thundering herd problem when broadcasting real-time updates to hundreds of millions of devices
How to design a two-phase prefetch-and-broadcast system that eliminates traffic spikes during live events
Why adding jitter to cache TTLs prevents synchronized cache expiration storms
How to implement adaptive traffic prioritization using event-driven signals to manage burst traffic
How to architect a two-tier pub/sub system with WebSocket proxies for low-latency message fanout at massive scale
Prerequisites & Requirements
- Understanding of distributed systems concepts including caching, pub/sub messaging, and load balancing
- Familiarity with the thundering herd problem and its implications for high-traffic systems
- Basic understanding of GraphQL schemas and query interfaces (optional)
- Experience with event-driven architectures and message queuing systems like Apache Kafka (optional)
- Experience designing or operating systems that handle high-concurrency traffic patterns
Key Questions Answered
How does Netflix deliver real-time recommendations to over 100 million devices during live events?
What is the thundering herd problem and how did Netflix solve it for live streaming events?
Why do live events create different infrastructure challenges than video on demand at Netflix?
How does Netflix prevent cache stampede during live events?
What architecture does Netflix use for broadcasting messages to millions of devices in real time?
How does Netflix handle devices that miss a broadcast due to network issues during live events?
How does Netflix manage traffic prioritization during live event traffic spikes?
What three constraints did Netflix balance when designing real-time recommendation delivery?
Key Actionable Insights
1. Split real-time update delivery into prefetch and broadcast phases to solve the thundering herd problem. Prefetch high-cardinality, compute-intensive data ahead of time by distributing requests naturally over a longer period, then trigger a low-cardinality broadcast at the critical moment to activate the pre-cached data on devices. This approach is applicable whenever you need to update millions of clients simultaneously at a specific point in time, such as flash sales, live sports, or coordinated feature launches.
2. Add jitter to cache TTLs on both server and client sides to prevent synchronized cache expiration storms. Fixed TTLs cause all caches populated at similar times to expire simultaneously, creating unexpected traffic spikes that may occur hours before or after the actual peak event. This pattern is especially important when user traffic is concentrated around specific times rather than evenly distributed, as synchronized cache refreshes can create mini thundering herds even outside peak windows.
3. Implement adaptive traffic prioritization with event-driven signals rather than relying solely on static traffic management rules. Route event-related traffic to dedicated clusters with more aggressive auto-scaling policies, and dynamically deprioritize non-critical server-driven updates during high RPS periods. Be aware that deprioritizing traffic to non-critical services can cause unexpected call patterns and traffic spikes elsewhere in your system, requiring careful monitoring of downstream effects.
4. Design broadcast payloads to include both a state key and a timestamp to support at-least-once delivery semantics. The state key allows devices to look up pre-cached data locally without additional server requests, while the timestamp enables devices to catch up on missed broadcasts upon reconnection. This is critical for unreliable network environments where devices may temporarily lose connectivity during live events, ensuring no viewer misses an update.
5. Use synthetic traffic generation to simulate game-day scenarios and identify potential traffic hotspots before live events. High-watermark traffic projections revealed issues such as cache synchrony and unexpected traffic patterns that were not visible in normal VOD traffic. Load testing with realistic burst patterns is essential because live event traffic behaves fundamentally differently from on-demand traffic, and issues may surface hours before or after the actual event window.
6. Keep business logic off client devices by using a map of stage keys in API responses that devices can look up locally. This allows the server to control the experience through broadcast state changes while devices simply render the appropriate pre-fetched content for each stage. This pattern simplifies client-side logic and enables rapid iteration on the server side without requiring device updates, which is critical when supporting hundreds of millions of diverse devices.
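
The insights on jittered TTLs, broadcast payloads, and stage-key lookups can be illustrated with a small device-side sketch. This is not Netflix's implementation: the names `jittered_ttl`, `Broadcast`, `Device`, and `stage_map`, the 20% jitter fraction, and the one-hour base TTL are all hypothetical choices made for illustration. The sketch also assumes that a reconnecting device is handed the broadcasts it missed, which the summary above does not spell out.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


def jittered_ttl(base_seconds: float, jitter_fraction: float = 0.2, rng=random) -> float:
    """Spread a TTL uniformly within +/- jitter_fraction of its base value,
    so caches filled at the same moment do not all expire together."""
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


@dataclass
class Broadcast:
    state_key: str    # hypothetical stage name, e.g. "event_live"
    timestamp: float  # time at which this state became active


@dataclass
class Device:
    # Prefetched ahead of the event: stage key -> content to render.
    stage_map: dict = field(default_factory=dict)
    cache_ttl: float = 0.0
    last_applied: float = 0.0
    current: Optional[str] = None

    def prefetch(self, stage_map: dict, base_ttl: float = 3600.0) -> None:
        """Store the stage map and give the local cache a jittered lifetime."""
        self.stage_map = dict(stage_map)
        self.cache_ttl = jittered_ttl(base_ttl)

    def on_broadcast(self, msg: Broadcast) -> bool:
        """Apply a broadcast using only local data; return True if applied."""
        # At-least-once delivery means duplicates can arrive: skip anything
        # that is not newer than the last state already applied.
        if msg.timestamp <= self.last_applied:
            return False
        content = self.stage_map.get(msg.state_key)
        if content is None:
            return False  # unknown key; a real client might refetch here
        self.current = content
        self.last_applied = msg.timestamp
        return True

    def catch_up(self, missed: list) -> None:
        """After reconnecting, replay missed broadcasts in timestamp order."""
        for msg in sorted(missed, key=lambda m: m.timestamp):
            self.on_broadcast(msg)


device = Device()
device.prefetch({"pre_event": "Starting soon", "event_live": "Watch live"})
device.on_broadcast(Broadcast("pre_event", 10.0))
device.catch_up([Broadcast("event_live", 30.0), Broadcast("pre_event", 20.0)])
```

Because the broadcast carries only a key and a timestamp, the message stays small and identical for every device, while all per-device content comes from the data prefetched earlier.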