Optimizing Linux Memory Management for Low-latency / High-throughput Databases

LinkedIn Engineering Team

15 min read • Intermediate


Java, MySQL, PostgreSQL

Overview

This article covers tuning Linux memory management for low-latency, high-throughput databases, using LinkedIn's GraphDB as the case study. It explains how Linux's NUMA optimizations and zone reclaim affect database performance, and describes the experiments and findings that led to significant latency improvements.

What You'll Learn

1. How to disable zone reclaim mode in Linux for better database performance
2. Why managing garbage in the page cache is crucial for maintaining low latency
3. How to implement NUMA interleaving for applications to optimize memory usage

Prerequisites & Requirements

Understanding of Linux memory management and NUMA architecture
Experience with database performance tuning (optional)

Key Questions Answered

How does zone reclaim affect database performance on Linux?

Zone reclaim can significantly degrade database performance by triggering direct page scans that remove active pages from memory, leading to increased latency and higher rates of major faults. Disabling zone reclaim improved response times and reduced error rates in LinkedIn's GraphDB.
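
As a minimal sketch of how a host could be checked for this setting, the snippet below reads the `vm.zone_reclaim_mode` sysctl through procfs. The function names are illustrative and not taken from the original article; a value of 0 means zone reclaim is off.

```python
from pathlib import Path

# Standard procfs location of the vm.zone_reclaim_mode sysctl.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def parse_zone_reclaim_mode(text: str) -> int:
    """Parse the integer bitmask stored in zone_reclaim_mode."""
    return int(text.strip())


def zone_reclaim_enabled(path: Path = ZONE_RECLAIM_PATH) -> bool:
    """Return True if any zone reclaim bit is set (0 means disabled)."""
    return parse_zone_reclaim_mode(path.read_text()) != 0


if __name__ == "__main__":
    if ZONE_RECLAIM_PATH.exists():
        state = "enabled" if zone_reclaim_enabled() else "disabled"
        print(f"zone reclaim is {state}")
        # To disable at runtime (as root): sysctl -w vm.zone_reclaim_mode=0
    else:
        print("zone_reclaim_mode is not exposed on this kernel")
```

A runtime change made with `sysctl -w` does not survive a reboot; persisting it is usually done through the host's sysctl configuration.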

What are the symptoms of performance issues in GraphDB?

Performance issues in GraphDB manifest as spikes in response latency, high numbers of direct page scans, and low memory efficiency, even when there is no apparent memory pressure. These symptoms prompted an investigation into Linux's memory management.

What optimizations can be applied to Linux for better database performance?

To optimize Linux for database performance, disabling zone reclaim mode and enabling NUMA interleaving are recommended. These changes can lead to significant improvements in response latency and overall system efficiency.
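
One common way to enable interleaving for a single process is to launch it under `numactl --interleave=all`. The sketch below only builds and runs that command line; the helper names and the example program are hypothetical, and the article itself may have applied interleaving differently.

```python
import shutil
import subprocess
from typing import List


def interleaved_command(argv: List[str]) -> List[str]:
    """Prefix a command so numactl interleaves its memory across all nodes."""
    if not argv:
        raise ValueError("argv must contain the program to launch")
    return ["numactl", "--interleave=all", *argv]


def run_interleaved(argv: List[str]) -> int:
    """Launch argv under numactl and return the child's exit code."""
    if shutil.which("numactl") is None:
        raise RuntimeError("numactl is not installed on this host")
    return subprocess.call(interleaved_command(argv))
```

For example, `interleaved_command(["java", "-jar", "server.jar"])` (a hypothetical jar name) returns `["numactl", "--interleave=all", "java", "-jar", "server.jar"]`.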

What impact does Transparent HugePages have on NUMA systems?

Transparent HugePages can trigger direct page scans on NUMA systems, leading to performance degradation. Disabling this feature on Red Hat systems helped mitigate the issue and improve database performance.
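
To see whether Transparent HugePages is active on a host, the current mode can be read from sysfs, where the active value is shown in square brackets. This is a sketch under assumptions: the helper names are illustrative, and the second path is the location believed to be used by Red Hat Enterprise Linux 6 kernels, which may not apply to other releases.

```python
import re
from pathlib import Path
from typing import Optional

# Mainline sysfs path first; the second entry is an assumed RHEL 6 location.
THP_PATHS = [
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),
]


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode, e.g. 'never' from 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_thp_mode() -> Optional[str]:
    """Return the active THP mode, or None if no known sysfs file exists."""
    for path in THP_PATHS:
        if path.exists():
            return parse_thp_mode(path.read_text())
    return None
```

Writing `never` to the same file as root switches the feature off until the next reboot.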

Key Statistics & Figures

Error rate reduction

Dropped to one quarter of the original rate

After implementing optimizations, LinkedIn's GraphDB saw a significant decrease in error rates, indicating improved performance.

Page scans per second

1 to 5 million

During performance degradation, the system recorded extremely high page scans, correlating with latency spikes.
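
A rough way to observe this figure on a live host is to sample the direct-scan counters in `/proc/vmstat` twice and divide the difference by the interval. The sketch below assumes the counters are named `pgscan_direct` (newer kernels) or `pgscan_direct_<zone>` (older kernels); the function names are illustrative.

```python
import time
from typing import Dict


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse 'name value' lines from /proc/vmstat into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def direct_scan_total(counters: Dict[str, int]) -> int:
    """Sum pages scanned by direct reclaim across all zones."""
    return sum(
        value
        for name, value in counters.items()
        # pgscan_direct_throttle counts throttle events, not pages scanned.
        if name.startswith("pgscan_direct") and name != "pgscan_direct_throttle"
    )


def scans_per_second(before: Dict[str, int], after: Dict[str, int], interval_s: float) -> float:
    """Direct page scans per second between two vmstat samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (direct_scan_total(after) - direct_scan_total(before)) / interval_s


def sample_rate(interval_s: float = 1.0, path: str = "/proc/vmstat") -> float:
    """Take two samples interval_s apart and return the direct scan rate."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return scans_per_second(before, after, interval_s)
```

A sustained rate in the millions, as reported here, on a host that is not short of memory would be the signal worth investigating.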

Memory usage

48 GB physical memory, 20 GB consumed

On a typical GraphDB host, less than half of physical memory is consumed, so the performance problems appeared without any apparent memory pressure.

Technologies & Tools

Operating System

Linux

The article discusses optimizations and configurations for Linux to improve database performance.

Database

GraphDB

GraphDB is the focus of performance optimizations discussed in the article.

Key Actionable Insights

1. Disabling zone reclaim mode can lead to dramatic improvements in database response times.
LinkedIn observed a significant drop in error rates and latency after turning off zone reclaim mode, demonstrating the importance of optimizing Linux settings for database workloads.

2. Managing the garbage in your page cache proactively can enhance performance.
By implementing a segment pool to reuse data segments, GraphDB reduced the pressure on Linux's page cache, where accumulated garbage had previously led to performance issues.

3. NUMA interleaving can optimize memory usage for applications running on multi-socket systems.
Enabling NUMA interleaving allows applications to access memory more efficiently across multiple NUMA nodes, which is particularly beneficial for high-throughput databases.
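
The segment pool mentioned in the second insight can be illustrated with a small sketch. This is not GraphDB's implementation: the class, its parameters, and the use of in-memory `bytearray` buffers in place of real data segments are all assumptions made for illustration. The idea shown is only that released segments are recycled instead of being discarded and reallocated.

```python
from collections import deque
from typing import Deque


class SegmentPool:
    """Hypothetical pool that recycles fixed-size segments instead of discarding them."""

    def __init__(self, segment_size: int, max_idle: int = 16) -> None:
        self.segment_size = segment_size
        self.max_idle = max_idle
        self._idle: Deque[bytearray] = deque()
        self.allocated = 0  # number of segments created from scratch

    def acquire(self) -> bytearray:
        """Return a recycled segment if one is idle, otherwise allocate a new one."""
        if self._idle:
            return self._idle.pop()
        self.allocated += 1
        return bytearray(self.segment_size)

    def release(self, segment: bytearray) -> None:
        """Hand a segment back to the pool so a later acquire() can reuse it."""
        if len(segment) != self.segment_size:
            raise ValueError("segment does not belong to this pool")
        if len(self._idle) < self.max_idle:
            # Zero the buffer in place so stale data never leaks to the next user.
            segment[:] = bytes(self.segment_size)
            self._idle.append(segment)
```

Capping the idle list with `max_idle` is a design choice in this sketch: it bounds how much memory the pool holds onto when demand drops.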

Common Pitfalls

1. Assuming Linux's NUMA optimizations will always benefit database workloads can lead to performance issues.

Many database systems benefit more from caching data in memory than from keeping memory local to a specific NUMA node. This misconception can result in unnecessary complexity and degraded performance.

Related Concepts

NUMA Architecture

Linux Memory Management

Database Performance Tuning

Garbage Collection in Databases