Overview
This article discusses the optimization of Linux memory management specifically for low-latency and high-throughput databases, focusing on LinkedIn's GraphDB. It highlights the impact of Linux's NUMA optimizations and zone reclaim on database performance, detailing experiments and findings that led to significant latency improvements.
What You'll Learn
1. How to disable zone reclaim mode in Linux for better database performance
2. Why managing garbage in the page cache is crucial for maintaining low latency
3. How to implement NUMA interleaving for applications to optimize memory usage
Prerequisites & Requirements
- Understanding of Linux memory management and NUMA architecture
- Experience with database performance tuning (optional)
Key Questions Answered
How does zone reclaim affect database performance on Linux?
Zone reclaim can significantly degrade database performance by triggering direct page scans that remove active pages from memory, leading to increased latency and higher rates of major faults. Disabling zone reclaim improved response times and reduced error rates in LinkedIn's GraphDB.
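As a rough illustration of checking and turning off zone reclaim, the sketch below reads the kernel tunable `/proc/sys/vm/zone_reclaim_mode`, where `0` means off. The helper names are made up for this example, and writing to the file assumes root privileges.

```python
from pathlib import Path

# Kernel tunable for zone reclaim; 0 means zone reclaim is off.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def zone_reclaim_enabled(raw: str) -> bool:
    """Interpret the file contents: any non-zero bitmask means zone reclaim is on."""
    return int(raw.strip()) != 0


def read_zone_reclaim(path: Path = ZONE_RECLAIM_PATH) -> bool:
    """Read the tunable from the given path and report whether it is enabled."""
    return zone_reclaim_enabled(path.read_text())


def disable_zone_reclaim(path: Path = ZONE_RECLAIM_PATH) -> None:
    """Write 0 to the tunable. Needs root and does not survive a reboot."""
    path.write_text("0\n")


if __name__ == "__main__":
    if ZONE_RECLAIM_PATH.exists():
        print("zone reclaim enabled:", read_zone_reclaim())
    else:
        print("zone_reclaim_mode not available on this system")
```

To keep the setting across reboots, the usual approach is a `vm.zone_reclaim_mode = 0` entry in the sysctl configuration rather than a write at runtime.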
What are the symptoms of performance issues in GraphDB?
Performance issues in GraphDB manifest as spikes in response latency, high numbers of direct page scans, and low memory efficiency, even when there is no apparent memory pressure. These symptoms prompted an investigation into Linux's memory management.
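One way to watch for the direct page scan symptom is to sample the `pgscan_direct*` counters in `/proc/vmstat` and turn them into a rate. This is a minimal sketch, not the tooling used in the article; the exact counter names differ between kernel versions, so the code sums every counter with that prefix and skips `pgscan_direct_throttle`, which counts throttling events rather than scanned pages.

```python
import time
from pathlib import Path

VMSTAT_PATH = Path("/proc/vmstat")


def direct_scan_total(vmstat_text: str) -> int:
    """Sum the pgscan_direct* page counters found in /proc/vmstat text."""
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        # pgscan_direct_throttle counts throttle events, not pages scanned.
        if name.startswith("pgscan_direct") and name != "pgscan_direct_throttle":
            total += int(value)
    return total


def scans_per_second(before: int, after: int, elapsed_s: float) -> float:
    """Convert two cumulative counter samples into a per-second rate."""
    return (after - before) / elapsed_s


if __name__ == "__main__" and VMSTAT_PATH.exists():
    interval = 1.0  # seconds between the two samples
    first = direct_scan_total(VMSTAT_PATH.read_text())
    time.sleep(interval)
    second = direct_scan_total(VMSTAT_PATH.read_text())
    print(f"direct page scans/s: {scans_per_second(first, second, interval):.0f}")
```

A sustained non-zero rate on a host with plenty of free memory would match the pattern described above.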
What optimizations can be applied to Linux for better database performance?
To optimize Linux for database performance, disabling zone reclaim mode and enabling NUMA interleaving are recommended. These changes can lead to significant improvements in response latency and overall system efficiency.
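NUMA interleaving is commonly applied by launching the process under `numactl --interleave=all`. The sketch below only builds that command line; the `java -jar graphdb.jar` command is a hypothetical placeholder, not the actual GraphDB start command.

```python
import shlex
from typing import List, Sequence


def interleaved_command(command: Sequence[str]) -> List[str]:
    """Prefix a command so its memory is interleaved across all NUMA nodes."""
    if not command:
        raise ValueError("command must not be empty")
    return ["numactl", "--interleave=all", *command]


if __name__ == "__main__":
    # Hypothetical service command, used here only for illustration.
    argv = interleaved_command(["java", "-jar", "graphdb.jar"])
    print(shlex.join(argv))
```

Building the argument list separately from running it keeps the launch policy easy to inspect and test before it goes into a start script.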
What impact does Transparent HugePages have on NUMA systems?
Transparent HugePages can trigger direct page scans in NUMA systems, leading to performance degradation. Disabling this feature on Red Hat systems helped mitigate the issue and improve database performance.
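To see whether Transparent HugePages is active, the sysfs file `enabled` lists the available modes with the current one in square brackets. The sketch below parses that format. The second path is an assumption for older Red Hat kernels, which exposed the setting under a `redhat_`-prefixed directory; verify it on the target system.

```python
from pathlib import Path
from typing import Optional

THP_PATHS = (
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    # Assumed location on older Red Hat kernels; check before relying on it.
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),
)


def active_thp_mode(raw: str) -> str:
    """Return the bracketed token, e.g. 'never' from 'always madvise [never]'."""
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {raw!r}")


def read_thp_mode() -> Optional[str]:
    """Return the active THP mode, or None if no known sysfs file exists."""
    for path in THP_PATHS:
        if path.exists():
            return active_thp_mode(path.read_text())
    return None


if __name__ == "__main__":
    print("transparent hugepage mode:", read_thp_mode())
```

Disabling the feature amounts to writing `never` to the same file as root.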
Key Statistics & Figures
Error rate reduction
Dropped to one quarter of the original rate
After implementing optimizations, LinkedIn's GraphDB saw a significant decrease in error rates, indicating improved performance.
Page scans per second
1 to 5 million
During performance degradation, the system recorded extremely high page scans, correlating with latency spikes.
Memory usage
48GB physical memory with 20GB consumed
A typical GraphDB host has 48GB of physical memory, of which 20GB is consumed, so the latency spikes appeared without any apparent memory pressure.
Technologies & Tools
Operating System
Linux
The article discusses optimizations and configurations for Linux to improve database performance.
Database
GraphDB
GraphDB is the focus of performance optimizations discussed in the article.
Key Actionable Insights
1. Disabling zone reclaim mode can lead to dramatic improvements in database response times. LinkedIn observed a significant drop in error rates and latency after turning off zone reclaim mode, demonstrating the importance of optimizing Linux settings for database workloads.
2. Managing the garbage in your page cache proactively can enhance performance. By implementing a segment pool to reuse data segments, GraphDB reduced the pressure on Linux's page cache, where accumulated garbage had previously caused performance issues.
3. NUMA interleaving can optimize memory usage for applications running on multi-socket systems. Enabling NUMA interleaving spreads an application's memory evenly across NUMA nodes, which is particularly beneficial for high-throughput databases.
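The segment pool idea from the second insight can be sketched as follows. This is a hypothetical illustration, not GraphDB's implementation: it recycles fixed-size anonymous memory mappings, whereas a real store would more likely reuse file-backed segments. The class name and the `max_idle` parameter are invented for the example.

```python
import mmap
from collections import deque


class SegmentPool:
    """Hypothetical pool that recycles fixed-size anonymous mmap segments."""

    def __init__(self, segment_size: int, max_idle: int) -> None:
        self.segment_size = segment_size
        self.max_idle = max_idle      # cap on segments kept for reuse
        self._idle = deque()          # released segments awaiting reuse

    def acquire(self) -> mmap.mmap:
        """Hand out a recycled segment if one is idle, else map a new one."""
        if self._idle:
            return self._idle.popleft()
        return mmap.mmap(-1, self.segment_size)

    def release(self, segment: mmap.mmap) -> None:
        """Keep the segment for reuse, or unmap it once the pool is full."""
        if len(self._idle) < self.max_idle:
            segment.seek(0)           # rewind so the next user starts at offset 0
            self._idle.append(segment)
        else:
            segment.close()


if __name__ == "__main__":
    pool = SegmentPool(segment_size=4096, max_idle=2)
    seg = pool.acquire()
    seg.write(b"hello")
    pool.release(seg)
    print("reused:", pool.acquire() is seg)
```

Reusing segments instead of repeatedly creating and discarding them means fewer stale pages are left behind for the kernel to reclaim.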
Common Pitfalls
1. Assuming Linux's NUMA optimizations will always benefit database workloads can lead to performance issues.
Many database systems benefit more from caching data in memory than from keeping memory local to a specific NUMA node. This misconception can result in unnecessary complexity and degraded performance.
Related Concepts
NUMA Architecture
Linux Memory Management
Database Performance Tuning
Garbage Collection in Databases