Applying Quality of Service techniques at the application level
Overview
The article discusses Netflix's implementation of service-level prioritized load shedding to enhance system reliability and user experience during high traffic conditions. It details the evolution of load shedding strategies from the API gateway to individual services, emphasizing the importance of prioritizing user-initiated requests over pre-fetch requests.
What You'll Learn
1. How to implement prioritized load shedding in microservices
2. Why prioritizing user-initiated requests improves user experience
3. When to apply concurrency limits to manage system load
Prerequisites & Requirements
- Understanding of microservices architecture
- Familiarity with Java and the Netflix/concurrency-limits library
Key Questions Answered
How does Netflix prioritize requests during high traffic?
Netflix prioritizes user-initiated requests over pre-fetch requests by implementing a concurrency limiter that creates separate partitions for each request type. This ensures that critical playback requests maintain high availability even during traffic spikes, while non-critical requests are shed as needed.
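The partitioning idea can be illustrated with a minimal, hand-rolled sketch. This is not the Netflix/concurrency-limits API: the class `PrioritizedLimiter`, the enum `RequestType`, and the fixed `limit` are hypothetical names chosen for illustration, and the admission rule (user-initiated requests may use the full limit, pre-fetch requests only spare capacity) is an assumption about how the two partitions behave.

```java
import java.util.EnumMap;
import java.util.Map;

/** The two request classes discussed in the text. */
enum RequestType { USER_INITIATED, PRE_FETCH }

/**
 * Hypothetical two-partition concurrency limiter (illustration only,
 * not the Netflix/concurrency-limits API).
 */
final class PrioritizedLimiter {
    private final int limit; // assumed fixed; a real limiter may adapt it at runtime
    private final Map<RequestType, Integer> inFlight = new EnumMap<>(RequestType.class);

    PrioritizedLimiter(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        for (RequestType type : RequestType.values()) {
            inFlight.put(type, 0);
        }
    }

    /** Returns true if the request is admitted; the caller must release() it later. */
    synchronized boolean tryAcquire(RequestType type) {
        int user = inFlight.get(RequestType.USER_INITIATED);
        int preFetch = inFlight.get(RequestType.PRE_FETCH);
        boolean admit = (type == RequestType.USER_INITIATED)
                ? user < limit             // user-initiated may use the whole limit
                : user + preFetch < limit; // pre-fetch only gets spare capacity
        if (admit) {
            inFlight.merge(type, 1, Integer::sum);
        }
        return admit;
    }

    /** Marks one in-flight request of the given type as finished. */
    synchronized void release(RequestType type) {
        int current = inFlight.get(type);
        if (current == 0) {
            throw new IllegalStateException("release without acquire: " + type);
        }
        inFlight.put(type, current - 1);
    }
}
```

With a limit of 2, two pre-fetch requests are admitted and a third is shed, yet user-initiated requests are still admitted until they alone fill the limit. In this sketch pre-fetch work that is already in flight is not cancelled, so total concurrency can briefly exceed the limit while user-initiated traffic ramps up.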
What are the benefits of service-level prioritized load shedding?
Service-level prioritized load shedding allows service teams to manage their own prioritization logic, leading to finer control over request handling. It improves resource utilization and ensures that critical requests are served promptly, enhancing overall system resilience and user experience.
What issues arise from traditional load shedding methods?
Traditional load shedding methods often treat all requests equally, which can lead to reduced availability for critical user-initiated requests during traffic spikes. This approach can result in poor user experience and inefficient resource usage, as both critical and non-critical requests are throttled equally.
How did Netflix test the effectiveness of prioritized load shedding?
Netflix conducted Failure Injection Testing by injecting 2 seconds of latency into pre-fetch calls and comparing the performance of a baseline instance with regular load shedding against a canary instance using prioritized load shedding. This helped validate the effectiveness of their new approach.
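A latency-injection experiment of this kind can be approximated in a test harness with a small wrapper around the request handler. The sketch below is an assumption-laden illustration, not Netflix's Failure Injection Testing tooling: `LatencyInjector`, its predicate, and the way request kinds are represented are all hypothetical.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical latency injector for experiments (illustration only,
 * not Netflix's FIT tooling). K is whatever identifies a request kind.
 */
final class LatencyInjector<K> {
    private final Predicate<K> shouldDelay; // selects which request kinds are slowed down
    private final Duration delay;           // artificial latency added before the handler runs

    LatencyInjector(Predicate<K> shouldDelay, Duration delay) {
        this.shouldDelay = Objects.requireNonNull(shouldDelay);
        this.delay = Objects.requireNonNull(delay);
    }

    /** Runs the handler, first sleeping for the configured delay if the kind matches. */
    <T> T call(K requestKind, Supplier<T> handler) {
        if (shouldDelay.test(requestKind)) {
            try {
                Thread.sleep(delay.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("interrupted while injecting latency", e);
            }
        }
        return handler.get();
    }
}
```

An experiment along the lines described above would construct the injector with a predicate matching only pre-fetch calls and a delay of `Duration.ofSeconds(2)`, apply it to both the baseline and the canary instance, and then compare per-request-type availability between the two.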
Key Statistics & Figures
User-initiated request availability during traffic spikes
> 99.4%
This availability was maintained even when pre-fetch request availability dropped significantly due to prioritized load shedding.
Drop in pre-fetch request availability during a spike
as low as 20%
This occurred during a significant increase in pre-fetch requests after an infrastructure outage.
Latency for pre-fetch calls
< 200 ms
This is the typical p99 latency of pre-fetch calls under normal conditions, the baseline against which the 2 seconds of injected latency was applied during testing.
Technologies & Tools
Backend
Netflix/concurrency-limits
Used to implement the concurrency limiter that prioritizes user-initiated requests over pre-fetch requests.
Programming Language
Java
Both the concurrency-limits library and the load shedding mechanism built on it are implemented in Java.
Key Actionable Insights
1. Implement service-level prioritized load shedding to enhance user experience during peak traffic. By allowing critical requests to maintain priority, services can ensure that user-initiated actions are not adversely affected, leading to a smoother experience even under stress.
2. Utilize the Netflix/concurrency-limits library for managing request prioritization. This library provides a robust framework for implementing concurrency limits and can help in efficiently managing resources across microservices.
3. Conduct regular testing of load-shedding mechanisms to ensure they perform as expected under various conditions. Testing helps identify potential bottlenecks and ensures that the system can handle unexpected traffic spikes without degrading user experience.
Common Pitfalls
1. Failing to implement request prioritization can lead to degraded user experience. When all requests are treated equally, critical requests may be delayed or dropped during high traffic, leading to frustration for users and potential loss of engagement.
2. Overly aggressive load shedding can result in congestive failure. If too many requests are shed, the system may not be able to sustain a reasonable throughput, leading to increased latency and further failures.
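One way to watch for the second pitfall is to compare successful throughput before and during shedding. The helper below is a rough sketch under stated assumptions: `GoodputCheck`, `looksCongestive`, and the `tolerance` parameter are hypothetical, and the rule it encodes (successful requests per second should not fall noticeably once shedding starts) is a simplification rather than a definition taken from the article.

```java
/**
 * Hypothetical check for congestive failure (illustration only).
 * Compares successful requests per second before and during load shedding.
 */
final class GoodputCheck {
    private GoodputCheck() {
        // static helper, not meant to be instantiated
    }

    /**
     * Returns true if successful throughput during shedding fell more than
     * the given fraction (tolerance, between 0 and 1) below the earlier level.
     */
    static boolean looksCongestive(double successRpsBefore,
                                   double successRpsDuring,
                                   double tolerance) {
        if (successRpsBefore < 0 || successRpsDuring < 0) {
            throw new IllegalArgumentException("RPS values must be non-negative");
        }
        if (tolerance < 0 || tolerance > 1) {
            throw new IllegalArgumentException("tolerance must be between 0 and 1");
        }
        return successRpsDuring < successRpsBefore * (1.0 - tolerance);
    }
}
```

For example, with a tolerance of 0.1, a drop from 240 to 120 successful requests per second is flagged, while a dip from 240 to 235 is not. The tolerance value is arbitrary here and would need tuning against a service's normal variance.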
Related Concepts
Microservices Architecture
Concurrency Management
Load Balancing Techniques