How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning @Uber)

Cristian Velazquez

Overview

This article describes how Uber's engineering team tuned garbage collection (GC) in its Go services, saving 70,000 cores of compute across 30 mission-critical services. It details the semi-automated GOGC tuning mechanism behind those savings and its effect on CPU efficiency and performance.

What You'll Learn

1. How to optimize garbage collection in Go services
2. Why dynamic GOGC tuning is essential for diverse microservices
3. When to implement automated GC tuning mechanisms

Prerequisites & Requirements

  • Understanding of Go programming and garbage collection concepts
  • Familiarity with cloud-native infrastructure and microservices architecture (optional)

Key Questions Answered

How did Uber save 70,000 cores across its services?
Uber saved 70,000 cores by optimizing garbage collection in their Go services through a semi-automated GOGC tuning mechanism. This approach allowed for significant reductions in CPU utilization, leading to cost savings across multiple mission-critical services.
What is the role of GOGC in Go's garbage collection?
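GOGC is the Go runtime setting that controls how far the heap may grow, as a percentage of the live dataset left after the previous collection, before the next collection is triggered. With the default value of 100, a collection starts once the heap has roughly doubled. Raising GOGC makes collections less frequent at the cost of more memory, and lowering it does the opposite, so the value directly affects CPU overhead and overall application performance.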
What challenges does fixed GOGC tuning present?
Fixed GOGC tuning can lead to out-of-memory issues as it does not account for the maximum memory assigned to containers. Additionally, diverse memory utilization across microservices can result in inefficient GC performance, necessitating a more dynamic tuning approach.
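
The answers above can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the classic approximation that the next collection starts when the heap reaches live × (1 + GOGC/100), and the helper name `nextGCTarget` and the 10 GiB container limit are assumptions, not taken from the article. Since Go 1.18 the runtime also counts GC roots such as stacks and globals, so real targets differ slightly.

```go
package main

import "fmt"

// nextGCTarget approximates the heap size at which the next collection
// starts, given the live heap left after the previous collection and a
// GOGC percentage. Hypothetical helper; it ignores GC roots.
func nextGCTarget(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

func main() {
	const gib = 1 << 30
	limit := uint64(10 * gib) // assumed container memory limit

	// The same fixed GOGC=100 is safe for a small live dataset but
	// overshoots the container limit once the live dataset grows.
	for _, live := range []uint64{1 * gib, 4 * gib, 6 * gib} {
		target := nextGCTarget(live, 100)
		fmt.Printf("live=%dGiB target=%dGiB exceedsLimit=%t\n",
			live/gib, target/gib, target > limit)
	}
}
```

With these assumed numbers, only the 6 GiB live dataset produces a target (12 GiB) above the limit, which is the out-of-memory risk that a fixed GOGC carries.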

Key Statistics & Figures

Cores saved
70,000
This figure represents the total compute capacity saved across 30 mission-critical services after implementing GOGC tuning.
Reduction in p99 CPU utilization
65%
Observed in the observability service that operates on thousands of compute cores.
Reduction in p99 CPU utilization for Uber Eats service
30%
This reduction was achieved after deploying GOGCTuner across the service.

Technologies & Tools


Programming Language
Go
Used for developing Uber's microservices and implementing garbage collection optimizations.
Library
GOGCTuner
A library developed to automate the tuning of garbage collection parameters in Go services.

Key Actionable Insights

1. Implement dynamic GOGC tuning to adapt to varying memory demands across services.
   Dynamic tuning allows services with diverse memory utilization to optimize their garbage collection, reducing CPU overhead and improving performance.
2. Utilize GOGCTuner to automate GC tuning processes.
   By automating the tuning of GOGC, service owners can minimize manual intervention and reduce the risk of misconfiguration, leading to more stable performance.
3. Monitor GC intervals to assess the effectiveness of tuning.
   Understanding the intervals between garbage collections can help identify when further optimizations are necessary, ensuring that services remain efficient under varying loads.
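
As a rough illustration of the first two insights, the sketch below derives a GOGC value from a memory budget and the current live heap, then applies it with the standard `debug.SetGCPercent` call. It is not GOGCTuner's actual code or API: `computeGOGC`, the 4 GiB container limit, the 70% heap budget, and the 25 to 500 clamp are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// computeGOGC returns the GOGC percentage that lets the heap grow from
// liveBytes to targetBytes before the next collection. The clamp bounds
// are hypothetical safety limits.
func computeGOGC(liveBytes, targetBytes uint64) int {
	const minGOGC, maxGOGC = 25, 500
	if liveBytes == 0 {
		return maxGOGC // nothing live yet, so allow maximum headroom
	}
	if targetBytes <= liveBytes {
		return minGOGC // already at or over budget, collect aggressively
	}
	gogc := int((targetBytes - liveBytes) * 100 / liveBytes)
	if gogc < minGOGC {
		return minGOGC
	}
	if gogc > maxGOGC {
		return maxGOGC
	}
	return gogc
}

func main() {
	const containerLimit uint64 = 4 << 30 // assumed container memory limit
	const heapPercent = 70                // assumed share of the limit given to the heap

	// Right after a forced collection, HeapAlloc approximates the live dataset.
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	target := containerLimit * heapPercent / 100
	gogc := computeGOGC(ms.HeapAlloc, target)
	prev := debug.SetGCPercent(gogc) // returns the previous setting
	fmt.Printf("live=%d bytes, target=%d bytes, GOGC %d -> %d\n",
		ms.HeapAlloc, target, prev, gogc)
}
```

A real tuner would repeat this calculation as the live dataset changes rather than running it once at startup. The clamp reflects a design choice worth keeping in any variant: it prevents a tiny live heap from disabling collection almost entirely and a near-full heap from triggering back-to-back collections.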

Common Pitfalls

1. Relying on fixed GOGC values can lead to inefficient memory usage and out-of-memory errors.
   Fixed values do not account for the dynamic nature of microservices' memory requirements, which can vary significantly based on workload.
2. Ignoring the need for observability metrics can hinder effective GC tuning.
   Without proper metrics, service owners may misinterpret memory utilization and fail to optimize GC settings effectively.

Related Concepts

Garbage Collection In Go
Microservices Architecture
Cloud-native Infrastructure
Performance Optimization Techniques