How Facebook is bringing QUIC to billions

We are replacing the de facto protocol the internet has used for decades with QUIC, the latest and most radical step we’ve taken to optimize our network protocols to create a better experience for …

Matt Joras
10 min read, intermediate

Overview

The article discusses Facebook's implementation of QUIC, a modern transport protocol designed to enhance network performance and user experience. With over 75% of Facebook's internet traffic utilizing QUIC and HTTP/3, the article highlights significant improvements in metrics such as request errors and latency, as well as the challenges faced during deployment.

What You'll Learn

1
How to implement QUIC in a large-scale application
2
Why QUIC improves performance over TCP and HTTP/2
3
When to utilize QUIC for dynamic versus static content
4
How to optimize network load balancers for QUIC traffic

Prerequisites & Requirements

  • Understanding of network protocols like TCP and HTTP
  • Experience with application performance optimization (optional)

Key Questions Answered

What improvements does QUIC offer over TCP and HTTP/2?
Compared with TCP and HTTP/2, QUIC reduced request errors by 6%, tail latency by 20%, and response header size by 5%. These gains improve the user experience, especially in poor network conditions.
How did Facebook deploy QUIC in their applications?
Facebook first tested QUIC on internal network traffic before rolling it out to the Facebook app. They enabled QUIC for dynamic GraphQL requests, which led to measurable performance improvements and informed further deployment strategies.
What challenges did Facebook face when transitioning to QUIC?
Facebook encountered issues with existing app heuristics that were tuned for TCP, leading to increased error rates for static content. Adjustments were needed to optimize request strategies and flow control parameters for QUIC's unique characteristics.
What performance metrics improved with QUIC in the Facebook app?
With QUIC, Facebook experienced a 6% reduction in request errors, a 20% decrease in tail latency, and a 5% reduction in response header size. These metrics contributed to a better overall user experience.
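To make the phrase "a 20% decrease in tail latency" concrete, here is a minimal sketch of how such a relative reduction could be computed from two sets of latency samples. The helper names, the choice of the 90th percentile as the "tail", and the sample values are all assumptions made for illustration; the numbers are toy data picked so the arithmetic is easy to follow, not measurements from the article.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def relative_reduction(baseline, treatment):
    """Fractional reduction of treatment relative to baseline (0.2 means 20% lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treatment) / baseline


# Toy request latencies in milliseconds (hypothetical, not from the article).
tcp_latencies_ms = [100, 110, 120, 130, 150, 200, 300, 400, 800, 1000]
quic_latencies_ms = [95, 105, 115, 125, 140, 180, 260, 340, 640, 800]

tcp_tail = percentile(tcp_latencies_ms, 90)
quic_tail = percentile(quic_latencies_ms, 90)
tail_reduction = relative_reduction(tcp_tail, quic_tail)

print(f"p90 TCP: {tcp_tail} ms, p90 QUIC: {quic_tail} ms, reduction: {tail_reduction:.0%}")
```

The same `relative_reduction` helper applies to error counts or header sizes; only the inputs change.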

Key Statistics & Figures

Reduction in request errors
6%
Observed in the Facebook app after enabling QUIC for dynamic requests.
Reduction in tail latency
20%
Measured in the Facebook app after enabling QUIC for dynamic requests.
Reduction in response header size
5%
Noted after QUIC deployment for dynamic content.
Mean time between rebuffering (MTBR) improvement
up to 22%
Observed in video playback metrics after QUIC was implemented.
Reduction in video request error count
8%
Improvement seen in the Facebook app with QUIC for video content.
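The MTBR figure is easier to interpret with a worked example. The sketch below assumes MTBR is computed as total playback time divided by the total number of rebuffering events, which is a common reading of the metric but is not spelled out in this summary. The session tuples and function name are hypothetical, and the values are toy data rather than results from the article.

```python
def mean_time_between_rebuffering(sessions):
    """Total playback seconds divided by total rebuffer events.

    `sessions` is an iterable of (playback_seconds, rebuffer_count) pairs.
    Returns infinity when no rebuffering occurred at all.
    """
    sessions = list(sessions)
    total_play = sum(play for play, _ in sessions)
    total_rebuffers = sum(count for _, count in sessions)
    if total_rebuffers == 0:
        return float("inf")
    return total_play / total_rebuffers


# Toy sessions (hypothetical): same playback time, one fewer stall with QUIC.
control_sessions = [(600.0, 3), (300.0, 1), (900.0, 6)]
quic_sessions = [(600.0, 2), (300.0, 1), (900.0, 6)]

control_mtbr = mean_time_between_rebuffering(control_sessions)
quic_mtbr = mean_time_between_rebuffering(quic_sessions)
mtbr_improvement = (quic_mtbr - control_mtbr) / control_mtbr

print(f"control: {control_mtbr:.0f} s, QUIC: {quic_mtbr:.0f} s, improvement: {mtbr_improvement:.1%}")
```

Note that a higher MTBR is better, so an "improvement" here is an increase, unlike the error and latency metrics where the improvement is a reduction.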

Technologies & Tools

Network Protocol
QUIC
Used to replace TCP for improved performance in Facebook's applications.
Network Protocol
HTTP/3
The next iteration of HTTP that works in conjunction with QUIC.
Backend
mvfst
Facebook's own implementation of QUIC for testing and deployment.
Backend
Proxygen
HTTP client/server library used for communication with Facebook's servers.
Congestion Control
BBR
Primary congestion control implementation used in conjunction with QUIC.

Key Actionable Insights

1
Integrate QUIC into your application to lower latency and reduce request errors.
QUIC can deliver lower latency and fewer request errors than TCP, which is particularly beneficial for dynamic content delivery.
2
Conduct thorough testing on internal traffic before wide-scale QUIC deployment.
Testing QUIC on internal networks allows for identifying potential issues early, ensuring a smoother transition when rolling out to external users.
3
Adjust application heuristics to align with QUIC's behavior to avoid performance regressions.
Existing heuristics tuned for TCP may not perform well with QUIC, necessitating iterative adjustments to optimize request handling and flow control.
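The "internal traffic first, then wider rollout" advice above can be sketched as a deterministic gate. This is a hypothetical illustration, not Facebook's rollout mechanism: the function names, the salt string, and the idea of bucketing users by a hash are assumptions chosen to show one common way a staged rollout can be implemented.

```python
import hashlib


def rollout_bucket(user_id: str, salt: str = "quic-rollout") -> int:
    """Map a user ID to a stable bucket in [0, 10000) using SHA-256."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def quic_enabled(user_id: str, is_internal: bool, external_percent: float) -> bool:
    """Enable QUIC for all internal users and for a percentage of external users.

    `external_percent` is the share of external users (0 to 100) in the rollout.
    """
    if not 0.0 <= external_percent <= 100.0:
        raise ValueError("external_percent must be between 0 and 100")
    if is_internal:
        return True
    return rollout_bucket(user_id) < external_percent * 100


# Stage 1: internal only. Stage 2: widen to 10% of external users.
print(quic_enabled("employee-1", is_internal=True, external_percent=0.0))
print(quic_enabled("user-42", is_internal=False, external_percent=0.0))
print(quic_enabled("user-42", is_internal=False, external_percent=10.0))
```

Because the bucket depends only on the user ID and salt, a user who is enabled at 10% stays enabled as the percentage grows, which keeps before/after comparisons of error rates and latency stable while heuristics are being retuned.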

Common Pitfalls

1
Failing to adjust application heuristics when transitioning to QUIC can lead to increased error rates.
This occurs because heuristics optimized for TCP may not be suitable for QUIC, resulting in suboptimal request handling and performance issues.
2
Overestimating available bandwidth can cause video playback issues when using QUIC.
If bandwidth estimators are not tuned for QUIC, they may lead to requests for higher-quality content than the network can support, resulting in stalls.
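The second pitfall can be illustrated with a small bitrate-selection sketch. It assumes a player that picks the highest rung of a bitrate ladder that fits under the estimated bandwidth, and shows how a safety factor guards against an optimistic estimate. The ladder values, the `safety_factor` parameter, and the function name are hypothetical and do not describe the Facebook app's actual player logic.

```python
def pick_bitrate(ladder_kbps, estimated_kbps, safety_factor=0.8):
    """Pick the highest bitrate that fits within a discounted bandwidth estimate.

    Falls back to the lowest rung when nothing fits, so playback can still start.
    """
    if not ladder_kbps:
        raise ValueError("ladder_kbps must not be empty")
    if not 0.0 < safety_factor <= 1.0:
        raise ValueError("safety_factor must be in (0, 1]")
    budget = estimated_kbps * safety_factor
    affordable = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    return affordable[-1] if affordable else min(ladder_kbps)


# Hypothetical bitrate ladder in kbit/s.
LADDER_KBPS = [400, 800, 1500, 3000, 6000]

# Trusting a 3200 kbit/s estimate fully selects the 3000 rung; if the true
# bandwidth is lower than estimated, that choice risks a stall.
trusting_choice = pick_bitrate(LADDER_KBPS, 3200, safety_factor=1.0)

# Discounting the estimate by 20% selects the more conservative 1500 rung.
cautious_choice = pick_bitrate(LADDER_KBPS, 3200, safety_factor=0.8)

print(trusting_choice, cautious_choice)
```

A fixed safety factor is the simplest possible guard. If an estimator tuned for TCP behaves differently under QUIC, the discount (or the estimator itself) likely needs to be re-derived from measurements rather than carried over unchanged.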

Related Concepts

Network Protocols
Performance Optimization Techniques
Congestion Control Mechanisms
Transport Layer Security