Overview
This article discusses Uber's experience with garbage collection (GC) tuning to enhance the reliability of Presto, an open-source distributed SQL query engine. It details the challenges faced with memory fragmentation and out-of-memory errors, and how tuning the G1GC (Garbage First Garbage Collector) settings led to improved performance and reduced errors.
What You'll Learn
1. How to optimize garbage collection settings for Presto clusters
2. Why tuning G1GC parameters can reduce out-of-memory errors
3. When to apply the dynamic Initiating Heap Occupancy Percent (IHOP) in JDK 11
Prerequisites & Requirements
- Understanding of garbage collection mechanisms in Java
- Familiarity with GC logging tools (optional)
Key Questions Answered
How does Uber tune garbage collection for Presto?
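Uber tunes garbage collection for Presto by adjusting G1GC parameters. Two of them are the Initiating Heap Occupancy Percent (IHOP), which controls when concurrent marking starts, and the heap waste percent (G1HeapWastePercent), which sets how much reclaimable space G1 tolerates before it stops initiating mixed collections. This tuning reduced full garbage collections and out-of-memory errors, which improved the reliability of Uber's Presto clusters.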
What are the benefits of tuning G1GC settings?
Tuning G1GC settings improved garbage collection pause behavior and reduced out-of-memory errors. With the tuned settings, Uber saw a significant drop in full GC occurrences, which improved the overall performance and reliability of its Presto clusters.
What specific G1GC tuning flags did Uber implement?
Uber set the following flags:
- -XX:G1MaxNewSizePercent=20
- -XX:G1ReservePercent=40
- -XX:G1HeapWastePercent=2

It also set -XX:+UnlockExperimentalVMOptions, which HotSpot requires before it accepts the experimental G1MaxNewSizePercent flag. Uber chose these values specifically for its Presto clusters.
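To check that a running JVM actually picked up these values, a small Java program can query HotSpot's flag table through the `com.sun.management.HotSpotDiagnosticMXBean` interface. This is a minimal sketch. The class name `G1FlagCheck` and its output format are hypothetical, and the expected values are the ones listed above. HotSpot typically refuses to report an experimental flag such as G1MaxNewSizePercent unless -XX:+UnlockExperimentalVMOptions is set, so the sketch treats that case as "unavailable" rather than failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/**
 * Hypothetical helper that prints the live values of the G1 flags from the article.
 * Example launch (flag values as reported in the article):
 *   java -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=20 \
 *        -XX:G1ReservePercent=40 -XX:G1HeapWastePercent=2 G1FlagCheck
 */
final class G1FlagCheck {
    // {flag name, value the article reports Uber used}
    static final String[][] ARTICLE_FLAGS = {
        {"G1MaxNewSizePercent", "20"},
        {"G1ReservePercent", "40"},
        {"G1HeapWastePercent", "2"},
    };

    /** Live value of a HotSpot flag, or empty if the VM does not expose it. */
    static Optional<String> readFlag(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(name);
            return Optional.of(option.getValue());
        } catch (IllegalArgumentException e) {
            // Unknown flag, or (typically) an experimental flag that was not unlocked.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String[] flag : ARTICLE_FLAGS) {
            String live = readFlag(flag[0]).orElse("<unavailable>");
            System.out.printf("%-20s expected=%-3s live=%s%n", flag[0], flag[1], live);
        }
    }
}
```

If an expected value and a live value differ, the flag may not have reached the JVM. For example, a launcher script or config file might have dropped it.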
When should the dynamic Initiating Heap Occupancy Percent be used?
The dynamic Initiating Heap Occupancy Percent applies when running JDK 11, where G1 enables it by default (-XX:+G1UseAdaptiveIHOP). It adjusts the marking threshold based on the current size of the young generation and a free-space threshold. Concurrent marking can therefore start earlier or later as the application's memory needs change.
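The idea behind the dynamic threshold can be illustrated with a simplified model that loosely follows the adaptive IHOP logic in HotSpot's G1. The goal is to start concurrent marking early enough that the space young collections are expected to need while marking runs still fits under a target occupancy. That space covers the young generation itself plus what gets allocated into the old generation during marking. The target excludes the G1ReservePercent and G1HeapWastePercent shares of the heap.

The sketch rests on several assumptions:
- The class and method names are hypothetical.
- The inputs are made-up numbers.
- The heap is fully expanded (for example, -Xms equal to -Xmx).

The real implementation also predicts marking time and allocation rate from recent history.

```java
/**
 * Hypothetical, simplified model of G1's adaptive IHOP (on by default since JDK 9).
 * Sizes are in MB; assumes the heap is fully expanded (e.g. -Xms equal to -Xmx).
 */
final class AdaptiveIhopSketch {

    /** Occupancy G1 aims to stay under: the heap minus the reserve and waste shares. */
    static long targetOccupancyMb(long heapMb, int reservePercent, int wastePercent) {
        double excludedPercent = Math.min(reservePercent + wastePercent, 100);
        return (long) (heapMb * (100.0 - excludedPercent) / 100.0);
    }

    /**
     * Old-generation occupancy at which concurrent marking should start: the target
     * minus what young GCs are expected to need while marking runs (the young
     * generation plus old-gen allocation during marking). Clamped at zero.
     */
    static long markStartThresholdMb(long heapMb, int reservePercent, int wastePercent,
                                     long youngGenMb, double oldGenAllocMbPerSec,
                                     double markingSeconds) {
        long target = targetOccupancyMb(heapMb, reservePercent, wastePercent);
        long neededDuringMarking = youngGenMb + (long) (oldGenAllocMbPerSec * markingSeconds);
        return neededDuringMarking < target ? target - neededDuringMarking : 0;
    }

    public static void main(String[] args) {
        long heapMb = 1000; // made-up heap size
        // HotSpot defaults: G1ReservePercent=10, G1HeapWastePercent=5
        System.out.println(markStartThresholdMb(heapMb, 10, 5, 200, 10.0, 5.0)); // 600
        // Article's values: G1ReservePercent=40, G1HeapWastePercent=2
        System.out.println(markStartThresholdMb(heapMb, 40, 2, 200, 10.0, 5.0)); // 330
    }
}
```

With the same hypothetical inputs, the article's values move the marking start point in this model from 60% to 33% of the heap, so marking begins earlier. This comes from raising G1ReservePercent from its default of 10 to 40 and lowering G1HeapWastePercent from 5 to 2.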
Key Statistics & Figures
- Weekly active users running queries: 12,000. This indicates the scale at which Presto is used at Uber.
- Daily queries executed: 500,000. This highlights the demand and load on Presto clusters.
- Data read from HDFS: 100 PB. This shows the volume of data processed by Presto.
- Reduction in full GC occurrences: 80%. This improvement was observed after tuning the GC settings.
Technologies & Tools
- Presto (Backend): used for querying various data sources at Uber.
- G1GC (Backend): the garbage collector that manages memory in Java applications.
- OpenJDK 8 (Backend): the Java version Uber used initially, before moving to JDK 11.
- JDK 11 (Backend): the Java version currently in use, which supports dynamic IHOP tuning.
Key Actionable Insights
1. Regularly monitor GC logs to identify memory usage patterns and optimize settings accordingly. By analyzing GC logs, teams can determine peak old-generation utilization and adjust parameters to prevent performance degradation.
2. Consider reducing the maximum young generation size to improve concurrent marking performance. This adjustment can help avoid long GC pauses and let concurrent marking run more efficiently, particularly in memory-intensive applications.
3. Use the dynamic Initiating Heap Occupancy Percent in JDK 11 for better GC performance. Dynamic tuning adapts memory management to changing application loads and reduces the likelihood of out-of-memory errors.
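As a starting point for the log analysis in the first insight, the sketch below does three things:
- scans GC log lines for pause events,
- counts full GCs,
- records the highest heap occupancy left after any pause.

That last figure is a rough proxy for peak old-generation utilization; after a young pause it also includes survivor space. The sketch assumes the single-line format that JDK 11 prints with -Xlog:gc, such as the sample in the code comment. JDK 8 -XX:+PrintGCDetails output looks different and is not handled. The class name `GcLogSummary` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical summary of pause events in a JDK 11 -Xlog:gc log. */
final class GcLogSummary {
    // Assumed line shape, e.g.:
    // [12.345s][info][gc] GC(42) Pause Young (Normal) (G1 Evacuation Pause) 40960M->12288M(65536M) 85.123ms
    private static final Pattern PAUSE = Pattern.compile(
        "GC\\((\\d+)\\) Pause (\\w+).*? (\\d+)([KMG])->(\\d+)([KMG])\\((\\d+)([KMG])\\) ([\\d.]+)ms");

    final int pauses;            // all matched pause lines
    final int fullPauses;        // "Pause Full" lines
    final long peakAfterMb;      // highest heap occupancy right after a pause
    final long capacityAtPeakMb; // heap capacity reported on that same line
    final double totalPauseMs;   // sum of pause durations

    private GcLogSummary(int pauses, int fullPauses, long peakAfterMb,
                         long capacityAtPeakMb, double totalPauseMs) {
        this.pauses = pauses;
        this.fullPauses = fullPauses;
        this.peakAfterMb = peakAfterMb;
        this.capacityAtPeakMb = capacityAtPeakMb;
        this.totalPauseMs = totalPauseMs;
    }

    /** Converts a size printed with a K, M, or G suffix to megabytes (truncating). */
    static long toMb(String value, String unit) {
        long v = Long.parseLong(value);
        switch (unit) {
            case "K": return v / 1024;
            case "G": return v * 1024;
            default:  return v; // "M"
        }
    }

    static GcLogSummary parse(List<String> lines) {
        int pauses = 0, full = 0;
        long peakAfter = 0, capAtPeak = 0;
        double totalMs = 0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (!m.find()) continue; // skip non-pause lines such as concurrent phases
            pauses++;
            if (m.group(2).equals("Full")) full++;
            long after = toMb(m.group(5), m.group(6));
            if (after > peakAfter) {
                peakAfter = after;
                capAtPeak = toMb(m.group(7), m.group(8));
            }
            totalMs += Double.parseDouble(m.group(9));
        }
        return new GcLogSummary(pauses, full, peakAfter, capAtPeak, totalMs);
    }

    public static void main(String[] args) throws IOException {
        GcLogSummary s = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.printf("pauses=%d full=%d peakAfter=%dM of %dM (%.1f%%) totalPause=%.1fms%n",
            s.pauses, s.fullPauses, s.peakAfterMb, s.capacityAtPeakMb,
            s.capacityAtPeakMb == 0 ? 0.0 : 100.0 * s.peakAfterMb / s.capacityAtPeakMb,
            s.totalPauseMs);
    }
}
```

Comparing these summaries before and after a flag change gives one concrete way to check whether full GCs and peak occupancy actually moved.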
Common Pitfalls
1. Failing to monitor GC logs can lead to unoptimized memory settings. Without regular analysis, teams may miss critical insights into memory usage patterns, leading to performance issues.
2. Over-tuning GC parameters can increase CPU usage. Overly aggressive settings can trigger excessive garbage collection cycles, which hurts application performance.
Related Concepts
Garbage Collection Mechanisms In Java
Performance Tuning For Distributed Systems
Memory Management Best Practices