MySQL to MyRocks Migration in Uber’s Distributed Datastores

Shriniket Kale, Hao Xu, Shenglin Du
9 min read · intermediate

Overview

This article discusses Uber's migration from MySQL (InnoDB) to MyRocks, a MySQL storage engine built on RocksDB, to address disk space bottlenecks and improve operational efficiency. It covers the motivations behind the migration, the challenges faced, and the strategies used to keep the transition smooth while maintaining performance and data integrity.

What You'll Learn

1. How to effectively migrate a large-scale database from MySQL to MyRocks
2. Why MyRocks is beneficial for reducing disk space in distributed databases
3. When to consider upgrading to MySQL 8.0 before migrating to MyRocks

Prerequisites & Requirements

  • Understanding of MySQL and distributed database concepts
  • Familiarity with XtraBackup for data resync and backup (optional)

Key Questions Answered

What were the main motivations for Uber's migration to MyRocks?
Uber migrated to MyRocks to address disk space bottlenecks and improve operational efficiency as their Schemaless and Docstore databases were growing rapidly, serving tens of millions of requests per second. The migration aimed to reduce the total cost of ownership by optimizing storage and resource utilization.
What challenges did Uber face during the MyRocks migration?
Uber encountered challenges such as ensuring no impact on downstream services during migration, maintaining acceptable read performance, and addressing MyRocks limitations. They also had to manage a large-scale upgrade to MySQL 8.0 before migrating, which involved significant planning and testing.
How did Uber ensure data integrity during the migration process?
Uber implemented a detailed automatic migration strategy that included validating data integrity and monitoring latency and error rates after each node migration. This approach ensured that the MyRocks nodes provided the same data consistency and query performance as the previous InnoDB nodes.
What performance improvements were observed after migrating to MyRocks?
After migrating to MyRocks, Uber observed over 30% disk space savings across their databases. However, they noted that some use cases resulted in higher CPU usage and increased disk I/O utilization, indicating that not all workloads were ideal candidates for MyRocks.
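
The node-by-node approach described in the answers above (migrate one node, validate data integrity, check latency and error rates, then continue) can be sketched as a gated loop. This is a minimal illustration and not Uber's actual tooling: `NodeHealth`, `migrate_incrementally`, the `migrate_node` and `check_node` callbacks, and the threshold parameters are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class NodeHealth:
    """Post-migration checks for one node (hypothetical structure)."""
    data_matches: bool      # result of a data integrity comparison
    p99_latency_ms: float   # observed read latency after migration
    error_rate: float       # fraction of failed requests, 0.0 to 1.0


def migrate_incrementally(
    nodes: Iterable[str],
    migrate_node: Callable[[str], None],
    check_node: Callable[[str], NodeHealth],
    max_p99_ms: float,
    max_error_rate: float,
) -> Tuple[List[str], Optional[str]]:
    """Migrate nodes one at a time and stop at the first failed check.

    Returns (nodes migrated successfully, node that failed or None).
    """
    migrated: List[str] = []
    for node in nodes:
        migrate_node(node)
        health = check_node(node)
        healthy = (
            health.data_matches
            and health.p99_latency_ms <= max_p99_ms
            and health.error_rate <= max_error_rate
        )
        if not healthy:
            # Halt so the problem can be investigated before touching more nodes.
            return migrated, node
        migrated.append(node)
    return migrated, None
```

Stopping at the first unhealthy node keeps the blast radius to a single replica, which is the point of migrating incrementally rather than in bulk.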

Key Statistics & Figures

Disk space savings
over 30%
Observed after migrating to MyRocks across Uber's databases.
Data requests per second
tens of millions
Indicates the scale at which Uber's Schemaless and Docstore databases operate.

Technologies & Tools


Database
MySQL
Used as the underlying database engine for Uber's Schemaless and Docstore.
Database
MyRocks
MySQL storage engine built on RocksDB, chosen for its efficiency in handling large datasets.
Tools
XtraBackup
Used for data resync and backup during the migration process.

Key Actionable Insights

1. Implement an incremental migration strategy to minimize service disruption during database upgrades.
   This approach allows for careful monitoring of performance and data integrity at each stage, ensuring that any issues can be addressed before proceeding further.
2. Consider the specific workload characteristics when choosing a database engine like MyRocks.
   While MyRocks offers significant disk space savings, it may not be suitable for all workloads, particularly those that are sensitive to CPU and I/O performance.
3. Utilize tools like XtraBackup for efficient data resync and backup during the migration process.
   XtraBackup has proven to be faster and more reliable than other methods, making it a preferred choice for large-scale migrations.

Common Pitfalls

1. Underestimating the complexity of migrating large-scale databases.
   The migration process can be fraught with challenges, including ensuring data consistency and performance, which requires thorough planning and testing.
2. Assuming all workloads will benefit equally from MyRocks.
   Some workloads may experience increased CPU usage and disk I/O, which can negate the benefits of disk space savings. It's crucial to evaluate each use case individually.
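
One way to evaluate each use case individually is to compare measurements from an InnoDB node with those from a trial MyRocks node and weigh the disk savings against any rise in CPU and disk I/O. The sketch below is illustrative only: `WorkloadMetrics`, `evaluate_candidate`, and the default thresholds are assumptions made for this example, not values or tooling taken from Uber's migration.

```python
from dataclasses import dataclass


@dataclass
class WorkloadMetrics:
    """Measurements for one workload on one engine (hypothetical structure)."""
    disk_gb: float
    cpu_pct: float
    io_util_pct: float


def evaluate_candidate(
    innodb: WorkloadMetrics,
    myrocks: WorkloadMetrics,
    min_disk_savings: float = 0.30,      # assumed target, as a fraction
    max_cpu_increase_pct: float = 10.0,  # assumed tolerance, percentage points
    max_io_increase_pct: float = 10.0,   # assumed tolerance, percentage points
) -> dict:
    """Weigh disk savings against CPU and I/O regressions for one workload."""
    disk_savings = 1.0 - myrocks.disk_gb / innodb.disk_gb
    cpu_delta = myrocks.cpu_pct - innodb.cpu_pct
    io_delta = myrocks.io_util_pct - innodb.io_util_pct
    good = (
        disk_savings >= min_disk_savings
        and cpu_delta <= max_cpu_increase_pct
        and io_delta <= max_io_increase_pct
    )
    return {
        "disk_savings": disk_savings,
        "cpu_delta_pct": cpu_delta,
        "io_delta_pct": io_delta,
        "good_candidate": good,
    }
```

The thresholds are deliberately parameters: what counts as an acceptable CPU or I/O increase depends on the headroom each cluster has.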

Related Concepts

Database Migration Strategies
MySQL Performance Optimization
Distributed Database Management