Behind the Streams: Real-Time Recommendations for Live Events Part 3

Netflix Technology Blog
9 min readadvanced
--
View Original

Overview

Netflix engineered a real-time recommendation delivery system for live events that can update over 100 million devices in under a minute. The system uses a two-phase approach—prefetching data ahead of time and broadcasting low-cardinality messages at critical moments—to solve the thundering herd problem while keeping millions of viewers in sync during events like the Jake Paul vs. Mike Tyson fight and NFL Christmas games.

What You'll Learn

1

How to solve the thundering herd problem when broadcasting real-time updates to hundreds of millions of devices

2

How to design a two-phase prefetch-and-broadcast system that eliminates traffic spikes during live events

3

Why adding jitter to cache TTLs prevents synchronized cache expiration storms

4

How to implement adaptive traffic prioritization using event-driven signals to manage burst traffic

5

How to architect a two-tier pub/sub system with WebSocket proxies for low-latency message fanout at massive scale

Prerequisites & Requirements

  • Understanding of distributed systems concepts including caching, pub/sub messaging, and load balancing
  • Familiarity with the thundering herd problem and its implications for high-traffic systems
  • Basic understanding of GraphQL schemas and query interfaces(optional)
  • Experience with event-driven architectures and message queuing systems like Apache Kafka(optional)
  • Experience designing or operating systems that handle high-concurrency traffic patterns

Key Questions Answered

How does Netflix deliver real-time recommendations to over 100 million devices during live events?
Netflix uses a two-phase approach: first, it prefetches materialized recommendations, title metadata, and artwork onto devices as members browse before the event, distributing load naturally over time. Then, at critical moments, it broadcasts a low-cardinality message containing a state key and timestamp to all connected devices via WebSocket, prompting them to use pre-cached data locally without additional server requests.
What is the thundering herd problem and how did Netflix solve it for live streaming events?
The thundering herd problem occurs when millions of devices simultaneously request updates, overwhelming cloud services. Netflix solved it by splitting updates into prefetching (spreading data requests naturally over time before events) and real-time broadcasting (sending small, low-cardinality messages at event moments). This eliminates traffic spikes since devices already have the data cached and only need a lightweight trigger to update.
Why do live events create different infrastructure challenges than video on demand at Netflix?
Unlike VOD where members choose their own viewing time creating smooth traffic patterns, live events cause millions of viewers to converge simultaneously. This creates a narrow window where recommendations must be timely—too early and excitement fades, too late and the moment is missed. The concentrated traffic patterns also cause synchronized cache expirations and unexpected traffic spikes hours before and after events.
How does Netflix prevent cache stampede during live events?
Netflix discovered that fixed TTLs caused cache expirations to synchronize when many members joined at the same time for live events, creating traffic spikes hours before and after events. They solved this by adding jitter to both server-side and client-side cache expirations, which spreads out cache refresh requests over time and smooths out traffic spikes that weren't a problem for VOD content.
What architecture does Netflix use for broadcasting messages to millions of devices in real time?
Netflix uses a two-tier pub/sub architecture built on Pushy (their WebSocket proxy), Apache Kafka, and Netflix's KV key-value store. A Message Producer microservice centralizes business logic, monitors events, and schedules broadcasts. It hands messages to a Message Router that tracks subscriptions at Pushy node granularity, while Pushy nodes map subscriptions to individual connections, creating low-latency fanout.
How does Netflix handle devices that miss a broadcast due to network issues during live events?
Each broadcast payload includes a state key indicating the current event stage and a timestamp. If a device misses a broadcast due to network instability, it can catch up by replaying missed updates upon reconnecting using the timestamp to identify what was missed. Netflix uses 'at least once' broadcast delivery semantics to maximize device reach and ensure every device gets the latest updates.
How does Netflix manage traffic prioritization during live event traffic spikes?
Netflix improved traffic sharding strategies using event-based signals to route live event traffic to dedicated clusters with aggressive scaling policies. They added dynamic traffic prioritization rulesets that activate during high RPS periods, aggressively deprioritizing non-critical server-driven updates so systems can devote resources to the most time-sensitive computations during peak demand.
What three constraints did Netflix balance when designing real-time recommendation delivery?
Netflix identified three key constraints: Time (the duration required to coordinate an update), Request Throughput (the capacity of cloud services to handle requests), and Compute Cardinality (the variety of requests necessary to serve a unique update). The prefetch phase optimizes throughput and cardinality, while the broadcast phase optimizes throughput and time, solving the constraint optimization problem.

Key Statistics & Figures

Devices updated during peak load
Over 100 million devices in under a minute
During peak load at multiple stages of live events
Jake Paul vs. Mike Tyson viewership
60 million households
Households that tuned in live for the fight
NFL Christmas Gameday viewership
65 million US viewers
Netflix NFL Christmas Gameday US viewers
Crawford vs. Canelo viewership
Over 41 million global viewers
Global viewers who watched Terence Crawford defeat on Netflix

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

API
Graphql
Domain Graph Service (DGS) providing query interfaces for prefetching and broadcast schemas for device communication
Messaging
Apache Kafka
Part of the two-tier pub/sub architecture for message routing and delivery
Protocol
Websocket
Real-time bidirectional communication between servers and devices via Pushy proxy for broadcast delivery
Infrastructure
Pushy
Netflix's WebSocket proxy that maps subscriptions to individual device connections for low-latency fanout
Database
Netflix Kv Store
Key-value data abstraction layer used as part of the pub/sub architecture for subscription and message management

Key Actionable Insights

1
Split real-time update delivery into prefetch and broadcast phases to solve the thundering herd problem. Prefetch high-cardinality, compute-intensive data ahead of time by distributing requests naturally over a longer period, then trigger a low-cardinality broadcast at the critical moment to activate the pre-cached data on devices.
This approach is applicable whenever you need to update millions of clients simultaneously at a specific point in time, such as flash sales, live sports, or coordinated feature launches.
2
Add jitter to cache TTLs on both server and client sides to prevent synchronized cache expiration storms. Fixed TTLs cause all caches populated at similar times to expire simultaneously, creating unexpected traffic spikes that may occur hours before or after the actual peak event.
This pattern is especially important when user traffic is concentrated around specific times rather than evenly distributed, as synchronized cache refreshes can create mini thundering herds even outside peak windows.
3
Implement adaptive traffic prioritization with event-driven signals rather than relying solely on static traffic management rules. Route event-related traffic to dedicated clusters with more aggressive auto-scaling policies, and dynamically deprioritize non-critical server-driven updates during high RPS periods.
Be aware that deprioritizing traffic to non-critical services can cause unexpected call patterns and traffic spikes elsewhere in your system, requiring careful monitoring of downstream effects.
4
Design broadcast payloads to include both a state key and a timestamp to support at-least-once delivery semantics. The state key allows devices to look up pre-cached data locally without additional server requests, while the timestamp enables devices to catch up on missed broadcasts upon reconnection.
This is critical for unreliable network environments where devices may temporarily lose connectivity during live events, ensuring no viewer misses an update.
5
Use synthetic traffic generation to simulate game-day scenarios and identify potential traffic hotspots before live events. High-watermark traffic projections revealed issues like cache synchrony and unexpected traffic patterns that weren't visible in normal VOD traffic patterns.
Load testing with realistic burst patterns is essential because live event traffic behaves fundamentally differently from on-demand traffic, and issues may surface hours before or after the actual event window.
6
Keep business logic off client devices by using a map of stage keys in API responses that devices can look up locally. This allows the server to control the experience through broadcast state changes while devices simply render the appropriate pre-fetched content for each stage.
This pattern simplifies client-side logic and enables rapid iteration on the server side without requiring device updates, which is critical when supporting hundreds of millions of diverse devices.

Common Pitfalls

1
Using fixed cache TTLs when traffic patterns are concentrated around specific times. For live events, members joining simultaneously causes caches to be populated at the same time, leading to synchronized mass expirations and recomputations that create traffic spikes hours before and after events—well outside the anticipated peak window.
This issue doesn't surface with VOD traffic because viewing patterns are naturally distributed. Add jitter to both server and client cache TTLs to spread out refresh requests over time.
2
Attempting to scale linearly to handle live event traffic spikes. Simply adding more resources isn't efficient or reliable for handling concentrated bursts from millions of simultaneous viewers, and it could divert resources from other critical services during peak demand.
Instead of scaling up, architect a smarter solution that reduces the work needed at peak moments, such as prefetching data ahead of time and broadcasting lightweight triggers.
3
Deprioritizing non-critical traffic during live events without monitoring downstream effects. Netflix found that aggressively deprioritizing traffic to non-critical services caused unexpected call patterns and traffic spikes in other parts of the system, creating secondary problems.
Always monitor the cascading effects of traffic prioritization changes across the entire service mesh, not just the primary services handling event traffic.
4
Excessive non-essential logging during high-traffic events. Netflix discovered that the high traffic volume from popular live events caused excessive non-essential logging that put unnecessary pressure on their log ingestion pipelines, potentially impacting observability when it was needed most.
Implement dynamic log level adjustment or log sampling strategies that activate during peak traffic periods to reduce logging pressure while maintaining critical observability.

Related Concepts

Thundering Herd Problem
Pub/Sub Messaging Architecture
Cache Invalidation And Ttl Jitter
Websocket Real-time Communication
Traffic Sharding And Prioritization
At-least-once Delivery Semantics
Constraint Optimization
Event-driven Architecture
Graphql Domain Graph Service
Synthetic Load Testing
Device-side Caching And Prefetching
Low-latency Message Fanout