How Optimizing Memory Management with LMDB Boosted Performance on Our API Service

Pinterest Engineering
6 min readintermediate
--
View Original

Overview

This article discusses how implementing the Lightning Memory-Mapped Database (LMDB) improved memory management and performance for Pinterest's API service. By optimizing memory usage, the team was able to increase the number of processes per host, leading to better CPU utilization and a reduction in overall fleet size.

What You'll Learn

1

How to implement LMDB for efficient memory management in API services

2

Why reducing memory pressure is crucial for multi-process architectures

3

How to maintain low latency while managing configuration data

Prerequisites & Requirements

  • Understanding of multi-process architecture and memory management
  • Familiarity with LMDB and Python(optional)

Key Questions Answered

How did implementing LMDB affect memory usage in the API service?
Implementing LMDB reduced memory usage by 4.5%, which translated to an increase of 4.5 GB per host. This optimization allowed the team to increase the number of processes running on each host from 64 to 66, thereby enhancing the API service's capacity to handle requests.
What were the main challenges in transitioning to LMDB?
The main challenges included ensuring minimal impact on read latency and managing data updates effectively. The team needed to switch from per-process configuration data to a single copy per host while maintaining the existing interface for reading configuration data.
What alternatives to LMDB were considered for memory management?
The team evaluated three mmap-based solutions: Marisa Trie, Keyvi, and LMDB. They found that LMDB was the most suitable due to its ability to update files in a transaction without creating new versions, which was critical for maintaining asynchronous read connections.
How did the changes impact the overall performance of the API service?
The changes led to a significant improvement in performance, allowing the API service to handle a higher volume of requests per host without increasing system latency. The time required for LMDB read operations closely matched that of native Python lookups.

Key Statistics & Figures

Memory usage reduction
4.5%
This reduction allowed an increase of 4.5 GB per host.
Processes per host
Increased from 64 to 66
This increase enabled better CPU utilization and a higher number of requests handled.

Technologies & Tools

Database
Lightning Memory-mapped Database (lmdb)
Used for efficient memory management and configuration data storage.
Tools
Zookeeper
Used for distributing configuration updates to NGAPI hosts.

Key Actionable Insights

1
Consider using LMDB for applications that require efficient memory management and low latency.
LMDB's ability to reduce memory pressure while maintaining fast read operations makes it an ideal choice for high-performance API services.
2
Implement a sidecar pattern for managing configuration data updates in real-time.
Using a lightweight Python sidecar to monitor and update configuration data can help maintain performance and reduce memory usage across processes.
3
Prioritize the migration of the most memory-intensive configuration data first.
By focusing on the top 50 configuration-managed data structures, the team was able to achieve significant memory savings and improve overall system efficiency.

Common Pitfalls

1
Failing to account for memory usage when scaling processes can lead to inefficiencies.
Without proper memory management strategies, increasing the number of processes can cause memory pressure, leading to performance bottlenecks.

Related Concepts

Memory Management In Multi-process Architectures
Configuration Data Handling
Performance Optimization Techniques