Banty Kumar, Debadarsini Nayak, Raja Sriram Ganesan, Amit Jain • 15 min read • advanced
Overview
The article discusses the MySQL fleet at Uber, which consists of over 2,300 independent clusters that support critical operations for the platform. It highlights the architecture, control plane operations, and improvements made to enhance MySQL availability from 99.9% to 99.99%.
What You'll Learn
1. How to manage MySQL clusters effectively at scale
2. Why MySQL control plane architecture is crucial for high availability
3. How to implement primary failover processes in MySQL
4. When to use automated schema changes in MySQL
Prerequisites & Requirements
- Understanding of MySQL architecture and operations
- Familiarity with Kubernetes and Docker (optional)
Key Questions Answered
How does Uber ensure high availability of its MySQL fleet?
Uber has improved MySQL fleet availability from 99.9% to 99.99% through various optimizations and a re-architecture of the control plane. This includes implementing automated workflows for primary failover and node management, ensuring minimal downtime and data loss.
What are the main components of Uber's MySQL platform?
Uber's MySQL platform consists of a control plane, a data plane, a discovery plane, observability tools, and backup/restore mechanisms. Each component plays a critical role in managing the lifecycle and health of MySQL clusters.
What is the primary failover process in Uber's MySQL architecture?
The primary failover process automatically moves the primary role of a cluster from one host to another to maintain write availability. It is critical for high availability, and the primary node's health is monitored continuously for any degradation.
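As a rough illustration of the failover decision described above, the sketch below promotes a replica when the primary is unhealthy. It is a minimal sketch under stated assumptions, not Uber's implementation: the `Node` type, the "lowest replication lag wins" rule, and the function names are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node record; field names are assumptions for illustration.
    host: str
    role: str                       # "primary" or "replica"
    healthy: bool
    replication_lag_s: float = 0.0  # seconds behind the primary (replicas only)


def choose_new_primary(nodes: list[Node]) -> Optional[Node]:
    """Return the healthy replica with the lowest lag, or None if there is none."""
    candidates = [n for n in nodes if n.role == "replica" and n.healthy]
    return min(candidates, key=lambda n: n.replication_lag_s, default=None)


def failover_if_needed(nodes: list[Node]) -> Optional[str]:
    """Promote a replica when the current primary is missing or unhealthy.

    Returns the new primary's host, or None when no failover happened.
    """
    primary = next((n for n in nodes if n.role == "primary"), None)
    if primary is not None and primary.healthy:
        return None  # primary is fine, nothing to do
    target = choose_new_primary(nodes)
    if target is None:
        return None  # no safe candidate; a real system would alert here
    if primary is not None:
        primary.role = "replica"  # bookkeeping only; no MySQL commands are run
    target.role = "primary"
    return target.host
```

Choosing the least-lagged healthy replica is one common way to limit data loss during promotion; a production workflow would also need to fence the old primary and repoint replication, which this sketch leaves out.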
How does Uber handle schema changes in its MySQL databases?
Uber automates schema changes through a self-serve workflow that utilizes MySQL's instant alter or Percona's pt-online-schema-change. This ensures safe, non-blocking updates while allowing for dry-run capabilities to verify compatibility before applying changes.
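The choice between the two schema-change paths can be sketched as a small planning function. This is an illustrative sketch only: the function name, its parameters, and the `rides`/`trips` names in the usage note are hypothetical, and whether a given change qualifies for an instant alter is assumed to be decided by the caller. The `ALGORITHM=INSTANT` clause and the `--alter`, `--dry-run`, and `--execute` options of pt-online-schema-change are real; everything else is an assumption.

```python
def plan_schema_change(database: str, table: str, alter_clause: str,
                       instant_supported: bool, dry_run: bool = True) -> dict:
    """Return a plan describing how a schema change would be applied.

    Nothing is executed here; the function only builds the SQL statement
    or the command-line arguments.
    """
    if instant_supported:
        # Metadata-only change: ask MySQL to fail rather than fall back
        # to a slower algorithm.
        statement = (
            f"ALTER TABLE `{database}`.`{table}` "
            f"{alter_clause}, ALGORITHM=INSTANT"
        )
        return {"tool": "instant", "statement": statement}

    # Otherwise use the online tool; it requires either --dry-run or --execute.
    mode = "--dry-run" if dry_run else "--execute"
    argv = [
        "pt-online-schema-change",
        "--alter", alter_clause,
        mode,
        f"D={database},t={table}",
    ]
    return {"tool": "pt-online-schema-change", "argv": argv}
```

For example, `plan_schema_change("rides", "trips", "ADD COLUMN note VARCHAR(64)", instant_supported=False)` yields a dry-run command, which matches the idea of verifying compatibility before applying the change.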
Key Statistics & Figures
Number of MySQL clusters at Uber
over 2,300
This extensive fleet supports a vast array of operations critical to Uber's platform.
Improvement in MySQL availability
from 99.9% to 99.99%
This enhancement was achieved through various optimizations and a re-architecture of the control plane.
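To make the availability figures concrete, the short calculation below converts an availability percentage into the downtime it permits per year. It assumes a 365-day year and treats availability as a simple fraction of wall-clock time, which may differ from how Uber measures it.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


for target in (99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```

Under these assumptions, 99.9% allows about 525.6 minutes (roughly 8.8 hours) of downtime per year, while 99.99% allows about 52.6 minutes, a tenfold reduction.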
Technologies & Tools
Database
MySQL
Serves as the backbone of Uber's data infrastructure.
Orchestration
Kubernetes
Hosts stateless services that connect to MySQL clusters.
Containerization
Docker
Isolates components within MySQL nodes.
Streaming
Apache Kafka
Used for change data capture by streaming changes to a data store.
Backup
Percona XtraBackup
Facilitates automated backup and restore processes.
Key Actionable Insights
1. Implementing a robust control plane for MySQL can significantly enhance operational efficiency and reliability. By automating workflows for cluster management and failover processes, teams can reduce manual intervention and improve system resilience, which is crucial for high-availability applications.
2. Utilizing a discovery plane simplifies client interactions with MySQL clusters. By abstracting the underlying hardware changes, the discovery plane allows services to connect seamlessly to their MySQL clusters, enhancing system flexibility and reducing downtime during maintenance.
3. Regularly review and optimize your MySQL backup and restore processes. Ensuring that backup processes are automated and maintain a low RPO and RTO can protect against data loss and improve recovery times in case of failures.
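The discovery-plane idea in the second insight can be sketched as a lookup from a logical cluster name to whichever host currently holds the primary role. This is a toy in-memory sketch under assumed names (`DiscoveryRegistry`, `publish_primary`, `resolve_primary`); the article does not describe the actual interface, and a real discovery plane would be a networked, replicated service.

```python
class DiscoveryRegistry:
    """Hypothetical in-memory map from cluster name to current primary host."""

    def __init__(self) -> None:
        self._primaries: dict[str, str] = {}

    def publish_primary(self, cluster: str, host: str) -> None:
        # Called by the control plane after provisioning or a failover.
        self._primaries[cluster] = host

    def resolve_primary(self, cluster: str) -> str:
        # Called by clients; they never hard-code a host name.
        try:
            return self._primaries[cluster]
        except KeyError:
            raise KeyError(f"unknown cluster: {cluster}") from None


registry = DiscoveryRegistry()
registry.publish_primary("orders", "host-1")   # initial placement
registry.publish_primary("orders", "host-2")   # primary moved to another host
current = registry.resolve_primary("orders")   # clients now see host-2
```

Because clients resolve the cluster by name on each connection, a host replacement or failover only requires the control plane to publish the new primary; client configuration stays unchanged.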
Common Pitfalls
1. Tightly coupling the MySQL control plane with underlying infrastructure processes can lead to operational reliability issues.
As the MySQL fleet grows, this coupling can block infrastructure placement operations, making it difficult to manage workflows effectively.
Related Concepts
MySQL Architecture And Operations
High Availability Strategies
Automated Workflows In Database Management