HTTP/2 in infrastructure: Ambry network stack refactoring

LinkedIn Engineering Team
14 min read · intermediate

Overview

The article discusses the refactoring of the Ambry network stack at LinkedIn to adopt HTTP/2, addressing network bottlenecks between frontend and storage nodes. It highlights the challenges faced with the previous TCP-based protocol and the benefits gained from implementing a Netty-based HTTP/2 architecture, including improved performance and scalability.

What You'll Learn

1. How to implement HTTP/2 in a network stack for improved performance
2. Why connection multiplexing is essential for scalability in distributed systems
3. How to optimize SSL performance in Java applications

Prerequisites & Requirements

  • Understanding of network protocols and distributed systems
  • Familiarity with the Netty framework (optional)
  • Experience with Java programming and asynchronous programming concepts

Key Questions Answered

What were the main challenges faced with the previous Ambry network stack?
The previous Ambry network stack faced issues such as running out of file descriptors and connections, poor SSL performance, and high latency due to the lack of connection multiplexing. These bottlenecks hindered the system's scalability and performance, prompting the need for a refactor.
How did the implementation of HTTP/2 improve Ambry's performance?
The implementation of HTTP/2 allowed for connection multiplexing, which significantly reduced the number of connections needed between frontends and storage nodes. This change led to lower latency, reduced CPU and memory usage, and improved overall throughput in the Ambry system.
What specific improvements were observed after switching to HTTP/2?
After switching to HTTP/2, Ambry's memory usage fell from 6.6 GB to 4.9 GB, CPU utilization fell from 7-10% to 4%, and 95th-percentile router-to-server latency dropped from 100 ms to 38 ms.
What design goals were set for the new Ambry network stack?
The primary design goals for the new Ambry network stack included improved SSL performance, connection multiplexing for scalability, a high-performance single-client-multiple-servers implementation, and a robust network framework to reduce tuning time.
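
To see why connection multiplexing matters for the scalability goal above, a rough count helps. The sketch below is only a back-of-the-envelope model: the class name, the method names, and every number in `main` are hypothetical and do not come from the article. It assumes that without multiplexing each in-flight request occupies its own connection, while with multiplexing in-flight requests share a connection as concurrent streams.

```java
/** Back-of-the-envelope connection counts. All inputs are hypothetical. */
final class ConnectionCountSketch {

    // Without multiplexing, every in-flight request occupies its own connection.
    static long withoutMultiplexing(int frontends, int storageNodes, int inFlightPerPair) {
        return (long) frontends * storageNodes * inFlightPerPair;
    }

    // With multiplexing, in-flight requests share connections as concurrent streams.
    static long withMultiplexing(int frontends, int storageNodes,
                                 int inFlightPerPair, int streamsPerConnection) {
        if (streamsPerConnection <= 0) {
            throw new IllegalArgumentException("streamsPerConnection must be positive");
        }
        // Ceiling division: connections needed per frontend/storage-node pair.
        long connectionsPerPair =
                (inFlightPerPair + (long) streamsPerConnection - 1) / streamsPerConnection;
        return (long) frontends * storageNodes * connectionsPerPair;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 50 frontends, 200 storage nodes,
        // 40 requests in flight per pair, 100 streams allowed per connection.
        System.out.println("one request per connection: "
                + withoutMultiplexing(50, 200, 40));
        System.out.println("multiplexed: "
                + withMultiplexing(50, 200, 40, 100));
    }
}
```

With these made-up inputs the model gives 400,000 connections without multiplexing and 10,000 with it. File descriptor pressure on each node shrinks by the same factor.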

Key Statistics & Figures

Memory usage before and after HTTP/2 implementation
6.6 GB (Socket SSL), 4.9 GB (HTTP/2)
CPU utilization before and after HTTP/2 implementation
7-10% (Socket SSL), 4% (HTTP/2)
Router to server latency P95
100 ms (Socket SSL), 38 ms (HTTP/2)
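
The reported figures (memory 6.6 GB to 4.9 GB, CPU 7-10% to 4%, P95 latency 100 ms to 38 ms) can be turned into relative reductions with simple arithmetic. The sketch below does only that. The class and method names are illustrative, and the percentages are derived here rather than quoted from the article.

```java
import java.util.Locale;

/** Derives relative reductions from the before/after figures quoted in this summary. */
final class ImprovementSketch {

    // Percentage by which `after` is lower than `before`.
    static double percentReduction(double before, double after) {
        if (before <= 0) {
            throw new IllegalArgumentException("before must be positive");
        }
        return (before - after) / before * 100.0;
    }

    static String line(String metric, double before, double after) {
        return String.format(Locale.ROOT, "%s: %.1f%% lower",
                metric, percentReduction(before, after));
    }

    public static void main(String[] args) {
        System.out.println(line("memory (6.6 GB -> 4.9 GB)", 6.6, 4.9));
        System.out.println(line("P95 latency (100 ms -> 38 ms)", 100, 38));
        // CPU was reported as a range before the change, so show both ends.
        System.out.println(line("CPU, from 7%", 7, 4));
        System.out.println(line("CPU, from 10%", 10, 4));
    }
}
```

This works out to roughly a 26% drop in memory, a 62% drop in P95 latency, and a CPU reduction of between about 43% and 60%, depending on which end of the 7-10% range is taken as the baseline.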

Technologies & Tools

Backend
Netty
Used for implementing the HTTP/2 network stack in Ambry.
Protocol
HTTP/2
Adopted to enable connection multiplexing and improve performance.
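
For readers who want to try HTTP/2 from Java without setting up Netty, here is a minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11+). This is a substitute technique, not Ambry's Netty-based implementation. The host name, port, and blob path are placeholders, and the class and method names are made up for illustration. The code only builds the client and a request; it sends nothing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** JDK-only sketch of an HTTP/2-preferring client. Not Ambry's Netty stack. */
final class Http2ClientSketch {

    static HttpClient newHttp2Client() {
        return HttpClient.newBuilder()
                // Prefer HTTP/2; the JDK client falls back to HTTP/1.1
                // when the server does not support it.
                .version(HttpClient.Version.HTTP_2)
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    // baseUrl and blobId are hypothetical; no real Ambry endpoint is implied.
    static HttpRequest getBlob(String baseUrl, String blobId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + blobId))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = newHttp2Client();
        HttpRequest request = getBlob("https://storage-node.example:8443", "blob-123");
        System.out.println("preferred version: " + client.version());
        System.out.println("request URI: " + request.uri());
    }
}
```

Against a server that speaks HTTP/2, requests issued concurrently through one such client with `sendAsync` can share a connection as separate streams. That is the multiplexing behavior described above.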

Key Actionable Insights

1. Implementing connection multiplexing can significantly enhance the scalability of your network architecture.
By adopting HTTP/2, Ambry reduced the number of connections required between frontends and storage nodes, which relieved bottlenecks and improved performance. The same approach applies to other distributed systems facing similar limits.
2. Optimizing SSL performance is crucial for high-throughput applications.
The article highlights that a high-performance SSL implementation, such as Netty's OpenSSL-backed SSLEngine, can reduce latency and resource usage, which matters for any application that requires secure connections.
3. Regularly review and adjust your TCP buffer sizes to match your application's data transfer needs.
Because Ambry handles large blobs, its TCP buffer sizes were raised to 4 MB to avoid data bottlenecks. The same practice can help any application that transfers large payloads.
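
The buffer-size advice can be tried with plain JDK sockets. The sketch below requests 4 MB send and receive buffers on an unconnected `java.net.Socket`. It is an illustration with a hypothetical class name, not Ambry's Netty configuration. The operating system treats the requested size as a hint and may grant a different value, so the code reads the granted sizes back.

```java
import java.io.IOException;
import java.net.Socket;

/** Requests 4 MB TCP buffers on a plain socket and reports what the OS granted. */
final class TcpBufferSketch {

    static final int FOUR_MB = 4 * 1024 * 1024;

    static Socket newTunedSocket() throws IOException {
        Socket socket = new Socket(); // unconnected
        // Large receive buffers should be requested before connecting,
        // so the TCP window can be negotiated accordingly.
        socket.setReceiveBufferSize(FOUR_MB);
        socket.setSendBufferSize(FOUR_MB);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newTunedSocket()) {
            System.out.println("requested bytes:       " + FOUR_MB);
            // The OS may cap or adjust these values (for example via kernel limits).
            System.out.println("granted receive bytes: " + socket.getReceiveBufferSize());
            System.out.println("granted send bytes:    " + socket.getSendBufferSize());
        }
    }
}
```

If the granted sizes come back much smaller than requested, the limit is usually set at the operating-system level and must be raised there.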

Common Pitfalls

1. Failing to optimize SSL performance can lead to significant latency issues.
Many applications overlook the efficiency of their SSL libraries, which can result in increased latency and resource consumption. It's important to evaluate and choose high-performance SSL implementations.
2. Neglecting to monitor connection usage can result in connection exhaustion.
Ambry faced connection exhaustion due to the limitations of its previous stack. Regular monitoring and adjustment of connection pools can prevent similar issues in other systems.
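
One simple way to watch for this kind of exhaustion from inside a JVM is to compare open file descriptors against the process limit. The sketch below uses the JDK's `com.sun.management.UnixOperatingSystemMXBean`, which is available on Unix-like platforms. The class name and the 0.8 alert threshold are arbitrary, and the article does not say how Ambry itself monitors connections.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Reports how close this JVM is to its file descriptor limit (Unix-like systems only). */
final class FdUsageSketch {

    // Returns open/max as a fraction, or -1 when the counters are unavailable.
    static double fdUsageRatio() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            long max = unix.getMaxFileDescriptorCount();
            return max > 0 ? (double) unix.getOpenFileDescriptorCount() / max : -1.0;
        }
        return -1.0; // e.g. on Windows
    }

    static boolean nearExhaustion(double ratio, double threshold) {
        return ratio >= threshold;
    }

    public static void main(String[] args) {
        double ratio = fdUsageRatio();
        if (ratio < 0) {
            System.out.println("file descriptor counters not available on this platform");
        } else {
            System.out.println("file descriptor usage ratio: " + ratio);
            // 0.8 is an arbitrary alert threshold chosen for illustration.
            System.out.println("near exhaustion: " + nearExhaustion(ratio, 0.8));
        }
    }
}
```

Sockets count against the same limit as files, so a ratio that climbs steadily under load is an early sign that connections are not being reused or released.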

Related Concepts

Network Protocols
Connection Multiplexing
SSL Optimization
Distributed Systems Architecture