Cinnamon Auto-Tuner: Adaptive Concurrency in the Wild

Vladimir Gavrilenko, Jakob Holdgaard Thomsen, Jesper Lindstrom Nielsen, Timothy Smyth

Uber

19 min read, advanced

Overview

This article describes the Cinnamon Auto-Tuner, a system that adaptively limits request concurrency in production services. It explains why service capacity is hard to estimate and how an implementation of the TCP-Vegas algorithm adjusts request handling without manual tuning.

What You'll Learn

1. How to implement adaptive concurrency limiting using TCP-Vegas
2. Why accurate capacity estimation is crucial for service performance
3. When to apply the Auto-Tuner for optimal request handling

Prerequisites & Requirements

Understanding of concurrency control algorithms
Experience with microservices architecture (optional)

Key Questions Answered

How does the Auto-Tuner estimate the optimal inflight limit?

The Auto-Tuner continuously estimates the maximum number of concurrent requests for each endpoint based on observed latencies. It adjusts the inflight limit dynamically to optimize throughput without requiring manual tuning by service owners.
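
To make the term concrete, an inflight limit can be pictured as a counter that gates admission. The following is a minimal sketch, not Cinnamon's implementation: the class `InflightLimiter` and its method names are hypothetical, and what happens to a request that is not admitted (queueing, rejection, retry) is deliberately left out.

```python
import threading


class InflightLimiter:
    """Hypothetical gate: admit a request only while fewer than
    `limit` requests are currently being processed."""

    def __init__(self, limit: int) -> None:
        self._lock = threading.Lock()
        self.limit = max(1, limit)  # current inflight limit, set by a tuner
        self.inflight = 0           # requests currently in flight

    def try_acquire(self) -> bool:
        """Return True and count the request if there is room, else False."""
        with self._lock:
            if self.inflight >= self.limit:
                return False
            self.inflight += 1
            return True

    def release(self) -> None:
        """Call once when an admitted request finishes."""
        with self._lock:
            self.inflight = max(0, self.inflight - 1)

    def set_limit(self, new_limit: int) -> None:
        """Called by the tuner; the floor of 1 is an assumption of this sketch."""
        with self._lock:
            self.limit = max(1, new_limit)
```

A tuner would call `set_limit` periodically with its latest estimate, while request handlers wrap their work in `try_acquire` and `release`.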

What challenges does adaptive concurrency limiting address?

Adaptive concurrency limiting addresses issues such as varying service capacities, fluctuating workloads, and the need for automatic adjustments to maintain optimal performance. It helps prevent services from becoming overloaded while maximizing resource utilization.

What is the role of TCP-Vegas in the Auto-Tuner?

TCP-Vegas is used to track request processing latencies and adjust the inflight limit based on the difference between observed and reference latencies. This helps maintain service performance under varying load conditions.
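
The comparison of observed and reference latency can be sketched with the classic TCP-Vegas rule, which estimates how many requests are waiting rather than being processed. This is an illustration under stated assumptions, not the Auto-Tuner's actual code: the function name `vegas_update`, the thresholds `alpha` and `beta`, their default values, and the step size of 1 are all chosen here for the example.

```python
def vegas_update(
    limit: int,
    reference_latency: float,
    sample_latency: float,
    alpha: float = 3.0,  # assumed threshold: below this, grow the limit
    beta: float = 6.0,   # assumed threshold: above this, shrink the limit
) -> int:
    """Return a new inflight limit from one aggregated latency sample."""
    if reference_latency <= 0 or sample_latency <= 0:
        raise ValueError("latencies must be positive")

    # Vegas-style queue estimate: if latency equals the reference, nothing
    # is queued; as latency rises above it, the estimate approaches `limit`.
    queue_estimate = limit * (1.0 - reference_latency / sample_latency)

    if queue_estimate < alpha:
        return limit + 1          # little queueing: probe for more capacity
    if queue_estimate > beta:
        return max(1, limit - 1)  # noticeable queueing: back off
    return limit                  # within the band: hold steady
```

With a limit of 100 and a reference latency of 10 ms, a 10 ms sample raises the limit to 101, while a 20 ms sample (an estimated 50 queued requests) lowers it to 99.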

How does the Auto-Tuner handle overload situations?

In overload situations, the Auto-Tuner reduces the inflight limit to prevent overwhelming downstream services. This allows the system to manage increased latencies without causing service failures.

Key Statistics & Figures

Maximum concurrent requests handled by services

100s

Some services can handle hundreds of requests concurrently, while others may only manage one.

Inflight limit adjustment factor

10

The inflight limit is capped at 10 times the number of concurrently processed requests.
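
The cap above amounts to a clamp applied to whatever limit the tuner proposes. A minimal sketch, assuming a hypothetical helper `cap_limit`; the floor of 1 for an idle service is an assumption of this example, not something stated in the article.

```python
def cap_limit(proposed_limit: int, concurrently_processed: int, factor: int = 10) -> int:
    """Clamp the proposed inflight limit to `factor` times the number of
    requests currently being processed (never below 1, by assumption)."""
    ceiling = max(1, factor * concurrently_processed)
    return max(1, min(proposed_limit, ceiling))
```

For example, with 20 requests being processed, a proposed limit of 500 is cut to 200, while a proposed limit of 50 passes through unchanged.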

Technologies & Tools

Algorithm

TCP-Vegas

Used for adjusting inflight limits based on latency observations.

Key Actionable Insights

1. Implement the Auto-Tuner in your microservices to automate concurrency management. This reduces the need for manual tuning and lets your services adapt to changing loads.

2. Use the TCP-Vegas algorithm for better latency management in your applications. By tracking latencies and adjusting inflight limits, you can keep services responsive under varying traffic conditions.

3. Regularly monitor the performance metrics of your services to identify potential overload situations. This proactive approach lets you adjust configurations before issues escalate, maintaining service reliability.

Common Pitfalls

1. Relying on a single latency sample can lead to skewed results due to transient spikes. To avoid this, aggregate multiple latency samples over time to obtain a more accurate representation of service performance.
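
One way to aggregate samples is a sliding window that reports nothing until it has seen enough data. This is a sketch only: the class `LatencyWindow`, the window sizes, and the choice of the median as the aggregate are assumptions made for illustration, and the article's system may aggregate differently.

```python
from collections import deque
from statistics import median
from typing import Optional


class LatencyWindow:
    """Hypothetical sliding window over recent latency samples."""

    def __init__(self, size: int = 50, min_samples: int = 10) -> None:
        self._samples = deque(maxlen=size)  # oldest samples drop off automatically
        self._min_samples = min_samples

    def add(self, latency_ms: float) -> None:
        self._samples.append(latency_ms)

    def aggregate(self) -> Optional[float]:
        """Median of the window, or None while there are too few samples.
        The median is used here because a single spike barely moves it."""
        if len(self._samples) < self._min_samples:
            return None
        return median(self._samples)
```

In this sketch a lone 500 ms outlier among samples near 12 ms leaves the aggregate near 12 ms, so one transient spike would not by itself change the inflight limit.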

Related Concepts

Concurrency Control Algorithms

Microservices Architecture

Performance Optimization Techniques