Dynamic Kubernetes Cluster Scaling at Airbnb

Authors: Evan Sheng, David Morrison

11 min read · Intermediate

Overview

The article discusses how Airbnb has implemented dynamic Kubernetes cluster scaling to optimize cloud spending in response to fluctuating traffic demands. It outlines the evolution of their Kubernetes setup through three stages and highlights improvements made to the Cluster Autoscaler, including a custom gRPC expander to enhance flexibility and efficiency.

What You'll Learn

1. How to implement dynamic scaling for Kubernetes clusters using the Cluster Autoscaler
2. Why heterogeneous clusters can improve resource utilization and reduce costs
3. How to create a custom gRPC expander for the Kubernetes Cluster Autoscaler

Prerequisites & Requirements

  • Understanding of Kubernetes and cloud infrastructure concepts
  • Familiarity with the Kubernetes Cluster Autoscaler (optional)

Key Questions Answered

How does Airbnb dynamically scale its Kubernetes clusters?
Airbnb uses the Kubernetes Cluster Autoscaler to adjust cluster size automatically based on pod resource requests. When pods cannot be scheduled because the cluster's capacity is exhausted, the Autoscaler adds nodes; when nodes are underutilized and their pods can be moved elsewhere, it removes them. This keeps capacity matched to demand and reduces costs.
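The scale-up/scale-down behavior described above can be sketched as a single decision pass. This is a deliberately simplified model, not the Cluster Autoscaler's actual algorithm: it assumes only CPU requests matter, that every new node has one hypothetical size (`new_node_cpu`), and it borrows 0.5 as the utilization threshold only because that is the autoscaler's default scale-down threshold. The real autoscaler simulates the scheduler and also weighs memory, affinities, disruption budgets, and more. The names `Node` and `plan_scaling` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    cpu: float                                        # allocatable CPU cores
    pods: List[float] = field(default_factory=list)   # CPU requests of pods on the node

    def free(self) -> float:
        return self.cpu - sum(self.pods)

    def utilization(self) -> float:
        return sum(self.pods) / self.cpu


def plan_scaling(nodes: List[Node], pending: List[float], new_node_cpu: float,
                 threshold: float = 0.5) -> Tuple[int, Optional[str]]:
    """One simplified pass: returns (number of nodes to add, node to remove or None)."""
    # Scale up: pending pods did not fit on any existing node, so pack them
    # first-fit onto hypothetical new nodes of size new_node_cpu.
    new_free: List[float] = []
    for req in pending:
        if req > new_node_cpu:
            continue  # no node of this shape could ever host this pod
        for i, free in enumerate(new_free):
            if free >= req:
                new_free[i] -= req
                break
        else:
            new_free.append(new_node_cpu - req)
    if new_free:
        return len(new_free), None  # do not scale down in the same pass as a scale-up

    # Scale down: try the least-utilized nodes first; a node is removable only
    # if it is below the threshold and all of its pods fit on the other nodes.
    for node in sorted(nodes, key=Node.utilization):
        if node.utilization() >= threshold:
            break  # nodes are sorted ascending, so none of the rest qualify
        spare = [n.free() for n in nodes if n is not node]
        movable = True
        for req in sorted(node.pods, reverse=True):
            slot = next((i for i, f in enumerate(spare) if f >= req), None)
            if slot is None:
                movable = False
                break
            spare[slot] -= req
        if movable:
            return 0, node.name
    return 0, None


nodes = [Node("a", 4, [3.0]), Node("b", 4, [1.0]), Node("c", 4, [0.5])]
print(plan_scaling(nodes, pending=[3.0, 3.0, 1.0], new_node_cpu=4))  # (2, None)
print(plan_scaling(nodes, pending=[], new_node_cpu=4))               # (0, 'c')
```

For simplicity the sketch removes at most one node per pass and never scales down while pods are pending; both are design choices for readability, not claims about the real autoscaler.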
What improvements did Airbnb make to the Cluster Autoscaler?
Airbnb developed a custom gRPC expander for the Cluster Autoscaler to enhance its flexibility in scaling decisions. This allows for more sophisticated scaling strategies tailored to Airbnb's business needs, such as weighted priority expansions.
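The article does not spell out Airbnb's expander logic, so the sketch below shows one hypothetical reading of "weighted priority": keep only the candidate expansions from the highest-priority node groups, then pick among them at random in proportion to a per-group weight. It is loosely modeled on the shape of the upstream gRPC expander exchange (a list of candidate options in, the preferred subset out) but is written as plain Python rather than a real gRPC service with generated protobuf types. `Option`, `best_options`, the priority and weight maps, and the node group names are all assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Option:
    node_group_id: str
    node_count: int
    pods: List[str] = field(default_factory=list)  # pods this expansion would schedule


def best_options(options: List[Option], priorities: Dict[str, int],
                 weights: Dict[str, float],
                 rng: Optional[random.Random] = None) -> List[Option]:
    """Hypothetical weighted-priority filter over candidate expansion options."""
    if not options:
        return []
    rng = rng if rng is not None else random.Random()
    # Keep only the options whose node groups share the highest priority
    # (node groups missing from the config default to priority 0).
    top = max(priorities.get(o.node_group_id, 0) for o in options)
    tier = [o for o in options if priorities.get(o.node_group_id, 0) == top]
    # Within that tier, pick one option at random in proportion to its weight
    # (groups missing from the config default to weight 1.0; weights must not all be 0).
    w = [weights.get(o.node_group_id, 1.0) for o in tier]
    return [rng.choices(tier, weights=w, k=1)[0]]


opts = [Option("general-m5", 2, ["web-1", "web-2"]),
        Option("general-c5", 1, ["web-1", "web-2"]),
        Option("gpu-p3", 1, ["web-1", "web-2"])]
prio = {"general-m5": 10, "general-c5": 10, "gpu-p3": 1}
print(best_options(opts, prio, {"general-m5": 3.0, "general-c5": 1.0})[0].node_group_id)
```

With these inputs the GPU group is never chosen, and `general-m5` is picked roughly three times as often as `general-c5`. Keeping the policy in a separate service means the priority and weight maps can change without rebuilding the autoscaler, which is the kind of flexibility the custom expander was built for.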
What are the stages of Kubernetes cluster evolution at Airbnb?
Airbnb's Kubernetes cluster evolution has three stages: Stage 1, homogeneous clusters with manual scaling; Stage 2, multiple cluster types, each autoscaled independently; and Stage 3, heterogeneous clusters that are also autoscaled, which improved resource efficiency and reduced management overhead.
What operational concerns did Airbnb face with the Cluster Autoscaler?
Airbnb encountered challenges with the Cluster Autoscaler's default expanders, which were not sophisticated enough for their specific business logic. This led to the need for a more flexible custom expander that could adapt to changing requirements.
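For contrast, the sketch below approximates the idea behind one of the built-in expanders, `least-waste`, which prefers the node group that would leave the least requested capacity idle after scale-up. This simplified version sums the idle CPU and idle memory fractions; the upstream scoring may differ in detail, and `Candidate`, `waste_score`, `least_waste`, and the node group names are hypothetical. The point is that a built-in expander such as this one encodes a single fixed rule, so business-specific inputs require a different expander.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    node_group_id: str
    node_count: int       # nodes this expansion would add
    node_cpu: float       # CPU cores per node in the group
    node_mem_gib: float   # memory (GiB) per node in the group
    pods_cpu: float       # total CPU requested by the pods it would schedule
    pods_mem_gib: float   # total memory (GiB) requested by those pods


def waste_score(c: Candidate) -> float:
    # Fraction of the new CPU plus fraction of the new memory left unrequested.
    cpu_total = c.node_count * c.node_cpu
    mem_total = c.node_count * c.node_mem_gib
    return (cpu_total - c.pods_cpu) / cpu_total + (mem_total - c.pods_mem_gib) / mem_total


def least_waste(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate tied for the lowest waste score."""
    if not cands:
        return []
    best = min(waste_score(c) for c in cands)
    return [c for c in cands if math.isclose(waste_score(c), best)]


small = Candidate("m5-xlarge", 1, 4, 16, pods_cpu=3, pods_mem_gib=8)
large = Candidate("m5-2xlarge", 1, 8, 32, pods_cpu=3, pods_mem_gib=8)
print([c.node_group_id for c in least_waste([small, large])])  # ['m5-xlarge']
```

Here the smaller node group wins (score 0.75 versus 1.375) regardless of any organizational preference for one group over another.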

Key Statistics & Figures

Cost savings from Cluster Autoscaler: approximately 5% of total cloud spend
These savings came from automating cluster scaling, which also reduced the need for manual intervention.


Key Actionable Insights

1. Implementing a custom gRPC expander can significantly enhance the flexibility of your Kubernetes scaling strategy.
   By developing a custom expander, you can tailor scaling decisions to meet specific business needs, allowing for more efficient resource utilization and cost management.
2. Transitioning to heterogeneous clusters can reduce management overhead and improve resource efficiency.
   Consolidating multiple cluster types into fewer heterogeneous clusters simplifies upgrades and testing, leading to better overall performance and lower operational costs.
3. Using the Cluster Autoscaler effectively can lead to substantial cost savings.
   Airbnb reported saving approximately 5% of its total cloud spend by automating cluster scaling, demonstrating the financial benefits of efficient resource management.

Common Pitfalls

1. Relying solely on the default expanders in the Cluster Autoscaler can limit scaling flexibility.
   Default expanders may not meet specific business needs, leading to inefficiencies. Consider a custom expander to address unique scaling requirements.

Related Concepts

Kubernetes Cluster Management
Container Orchestration
Cloud Cost Optimization
Autoscaling Strategies