Serving 100 Gbps from an Open Connect Appliance

By Drew Gallatin

Netflix Technology Blog
18 min read · intermediate

Overview

The article describes Netflix's project to serve 100 Gbps from a single FreeBSD-based Open Connect Appliance (OCA) using NVM Express (NVMe) storage. It covers the obstacles encountered, such as CPU bottlenecks and lock contention, and the changes made to remove them, including Fake NUMA, proactive VM page scanning, RSS Assisted LRO, and tuning of the ISA-L encryption library.

What You'll Learn

1. How to identify and resolve CPU bottlenecks in high-throughput systems
2. Why proactive VM page scanning can improve server performance under load
3. How to optimize memory bandwidth usage in data-intensive applications
4. When to implement RSS Assisted LRO for better packet aggregation

Prerequisites & Requirements

  • Understanding of FreeBSD and network stack concepts
  • Familiarity with performance profiling tools like VTune and DTrace (optional)

Key Questions Answered

What were the main bottlenecks in serving 100 Gbps from an OCA?
The main bottlenecks were CPU limitations, lock contention on FreeBSD's inactive page queue, and contention on the pbuf mutex. They were addressed with techniques such as Fake NUMA and optimizations to the vnode pager.
How did Netflix achieve over 90 Gbps performance for TLS traffic?
Netflix achieved over 90 Gbps for TLS traffic by optimizing memory bandwidth usage, reducing lock contention, and implementing a new mbuf structure that allowed multiple pages to be carried in a single mbuf, significantly reducing overhead.
What is the significance of RSS Assisted LRO in network performance?
RSS Assisted LRO improves packet aggregation by sorting packets based on their RSS hash results, allowing packets from the same TCP connection to be aggregated more effectively, which reduces system load and increases throughput.
What optimizations were made to the ISA-L encryption library?
The ISA-L encryption library was optimized to use non-temporal instructions for storing encryption results, which increased memory bandwidth efficiency and improved throughput from 58 Gbps to 65 Gbps.
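
The RSS Assisted LRO answer above can be made concrete with a toy model. The sketch below is not the FreeBSD implementation: `Packet`, `aggregate_adjacent`, and `rss_assisted_aggregate` are hypothetical names, and the "LRO engine" is simplified to one that merges only runs of adjacent packets sharing an RSS hash. It shows why sorting a batch by hash before aggregation helps when packets from many connections arrive interleaved.

```python
from itertools import groupby
from typing import NamedTuple


class Packet(NamedTuple):
    rss_hash: int  # hash the NIC computed for the packet's connection
    seq: int       # arrival position within the batch
    length: int    # payload bytes


def aggregate_adjacent(packets):
    """Toy LRO: merge each run of adjacent packets that share an RSS hash.

    Assumption: a real engine also compares the full connection tuple,
    because two connections can collide on the same hash.
    """
    return [
        (rss_hash, sum(p.length for p in run))
        for rss_hash, run in groupby(packets, key=lambda p: p.rss_hash)
    ]


def rss_assisted_aggregate(packets):
    """Sort the batch by RSS hash, then aggregate.

    sorted() is stable, so packets with the same hash keep their
    arrival order relative to each other.
    """
    return aggregate_adjacent(sorted(packets, key=lambda p: p.rss_hash))


# Three connections whose packets arrive interleaved: 7, 3, 9, 7, 3, 9, ...
batch = [Packet(rss_hash=h, seq=i, length=1448)
         for i, h in enumerate([7, 3, 9] * 4)]

unsorted_segments = aggregate_adjacent(batch)    # nothing is adjacent: 12 segments
sorted_segments = rss_assisted_aggregate(batch)  # one segment per connection: 3
```

In this model the unsorted batch yields twelve single-packet segments, while the sorted batch yields three segments of four packets each, so the stack above the driver processes a quarter as many segments.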

Key Statistics & Figures

Initial serving capacity
40 Gbps
This was the performance limit before optimizations were implemented.
Performance after Fake NUMA implementation
52 Gbps
This was achieved with reduced lock contention.
Final performance for TLS traffic
90 Gbps
This was the throughput achieved after all optimizations were applied.

Technologies & Tools

Operating System
FreeBSD
Used as the base for the Open Connect Appliance.
Storage Technology
NVM Express (NVMe)
Utilized for high-speed storage in the Open Connect Appliance.
Web Server
NGINX
Serves large media files using the sendfile() system call.
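
To illustrate the sendfile() pattern mentioned for NGINX, here is a minimal user-space sketch in Python. It is not how NGINX or the OCA is implemented: `send_file_zero_copy` and `demo` are hypothetical helpers, and a local socket pair stands in for a client connection. The point is only that the file's bytes go from the file to the socket without the application reading them into its own buffer first.

```python
import os
import socket
import tempfile
import threading


def send_file_zero_copy(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform supports
        # it and falls back to ordinary send() calls otherwise.
        return conn.sendfile(f)


def demo(payload: bytes) -> bytes:
    """Round-trip a payload through a local socket pair (test harness only)."""
    left, right = socket.socketpair()
    received = bytearray()

    def drain() -> None:
        # Read until the sender closes its end of the pair.
        while chunk := right.recv(65536):
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        sent = send_file_zero_copy(tmp.name, left)
    finally:
        left.close()          # signals end-of-stream to the reader
        os.unlink(tmp.name)
    reader.join()
    right.close()
    assert sent == len(payload)
    return bytes(received)
```

Calling `demo(b"x" * 100_000)` returns the same 100,000 bytes that were written to the temporary file.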

Key Actionable Insights

1. Implement proactive VM page scanning to enhance server responsiveness under load.
   This technique allows NGINX processes to continue serving traffic while the pageout daemon scans memory, preventing significant latency spikes during high memory usage.
2. Utilize RSS Assisted LRO to improve packet processing efficiency on high-traffic servers.
   By sorting packets before passing them to the LRO engine, you can achieve better aggregation rates, which is crucial for maintaining high throughput in environments with many active TCP connections.
3. Optimize memory bandwidth by reducing read-modify-write operations in encryption processes.
   Switching to non-temporal instructions can significantly improve performance, especially in data-intensive applications where memory bandwidth is a limiting factor.
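
The memory bandwidth point in the last insight can be put into rough numbers. The function below is a back-of-the-envelope model, not a measurement from the article: `encryption_memory_traffic_gbytes` is a hypothetical name, and the model assumes that a regular store to a cache line not already in cache first reads that line from memory, while a non-temporal store skips that read.

```python
def encryption_memory_traffic_gbytes(payload_gbps: float, non_temporal: bool) -> float:
    """Estimate memory traffic in GB/s for the encryption step (toy model)."""
    payload_gbytes = payload_gbps / 8  # Gbit/s -> GByte/s
    reads_of_plaintext = 1.0           # every payload byte is read once
    # Assumption: regular stores read the destination line before writing
    # it; non-temporal stores write it without reading it first.
    reads_of_destination = 0.0 if non_temporal else 1.0
    writes_of_ciphertext = 1.0         # every encrypted byte is written once
    return payload_gbytes * (
        reads_of_plaintext + reads_of_destination + writes_of_ciphertext
    )


regular = encryption_memory_traffic_gbytes(100, non_temporal=False)   # 37.5
streaming = encryption_memory_traffic_gbytes(100, non_temporal=True)  # 25.0
```

Under these assumptions, encrypting 100 Gbps of payload costs about 37.5 GB/s of memory traffic with regular stores and about 25 GB/s with non-temporal stores, a one-third reduction for that step alone.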

Common Pitfalls

1. Failing to account for lock contention when optimizing server performance.
   Lock contention can severely limit throughput, especially in high-performance environments. Analyze and mitigate it before concluding that the CPU itself is the limit.
2. Overlooking memory bandwidth limitations in data-intensive applications.
   Many optimizations increase memory traffic, which can make memory bandwidth the new limit on performance. Use profiling tools to identify and address memory bandwidth issues.

Related Concepts

Performance Optimization Techniques
Memory Management in FreeBSD
Networking Concepts Related to TCP and UDP