Differential Backups in MyRocks Based Distributed Databases at Uber

Adithya Reddy, Shriniket Kale
15 min read, advanced

Overview

The article describes how Uber implemented differential backups for its MyRocks-based distributed databases to address the cost and inefficiency of its existing backups. It covers the transition from traditional full backups to a differential backup strategy that significantly reduced storage costs and improved backup speeds.

What You'll Learn

  1. How to implement differential backups in MyRocks databases
  2. Why differential backups are more efficient than full backups
  3. When to use full backups versus differential backups

Prerequisites & Requirements

  • Understanding of MySQL and backup strategies
  • Familiarity with Percona XtraBackup

Key Questions Answered

What are the benefits of using differential backups over full backups?
Differential backups significantly reduce storage costs by only saving newly created or altered SSTable files, which minimizes data redundancy. This approach has led to a 45% reduction in data storage across most instances, with some larger instances achieving reductions of 70% or more.
How does Uber handle backup efficiency for MyRocks databases?
Uber's backup strategy leverages the immutability of SSTable files, enabling the reuse of unchanged files across backups. This method reduces unnecessary data duplication, speeds up backup and restore processes, and optimizes storage usage.
What challenges did Uber face with MyRocks backups?
The main challenges included the lack of support for incremental backups in MyRocks, leading to increased costs and complexity in maintaining full backups. This resulted in storing hundreds of petabytes of full backups, incurring millions in blob store expenses.
What is the role of the backup manifest file in differential backups?
The backup manifest file tracks the specific SSTable files included in each backup, guiding the restoration process. It records essential details like backup size, success status, and paths for the files, ensuring accurate data reconstruction.
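
The reuse of immutable SSTable files and the role of the manifest can be illustrated with a minimal Python sketch. It is not Uber's implementation: the `Manifest` fields, the `plan_differential` helper, the backup IDs, and the blob path layout are all hypothetical, and the sketch assumes that an SSTable file name identifies unchanged content on the same node.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Manifest:
    """Hypothetical manifest layout; field names are illustrative only."""
    backup_id: str
    backup_type: str              # "full" or "differential"
    success: bool                 # flipped to True once uploads finish
    size_bytes: int               # bytes this backup actually uploads
    files: dict = field(default_factory=dict)  # SSTable name -> blob path


def plan_differential(backup_id, current_sst, base):
    """Plan a differential backup against the last full backup's manifest.

    current_sst maps SSTable file name -> size in bytes on the node.
    Assumes an SSTable name identifies immutable content on this node.
    """
    files, to_upload = {}, []
    for name in sorted(current_sst):
        if name in base.files:
            files[name] = base.files[name]        # unchanged: reuse stored copy
        else:
            files[name] = f"{backup_id}/{name}"   # new: upload under this backup
            to_upload.append(name)
    size = sum(current_sst[name] for name in to_upload)
    return Manifest(backup_id, "differential", False, size, files), to_upload


base = Manifest("full-001", "full", True, 300,
                {"000010.sst": "full-001/000010.sst",
                 "000011.sst": "full-001/000011.sst"})
current = {"000011.sst": 100, "000012.sst": 50}   # 000010 no longer exists
manifest, to_upload = plan_differential("diff-002", current, base)
print(to_upload)
print(json.dumps(asdict(manifest), indent=2))
```

In this sketch only `000012.sst` is uploaded; the manifest still lists every file needed for a restore, pointing the unchanged file at the copy stored by the full backup.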

Key Statistics & Figures

Reduction in data storage
45%
Achieved across most instances through the implementation of differential backups.
Backup speed improvement
2X faster for full backups and 5X faster for differential backups
Compared to previous backup completion times.

Technologies & Tools

Database
MyRocks
Used as the storage engine for Uber's distributed databases.
Backup Tool
Percona XtraBackup
Utilized for taking backups of MyRocks databases.

Key Actionable Insights

  1. Implementing differential backups can drastically reduce storage costs and improve backup speeds.
     A differential backup saves only what changed since the last full backup. Uber's implementation achieved a 45% reduction in blob store usage.
  2. Using a backup manifest file makes the backup and restoration process more efficient.
     The manifest serves as the reference for which files to retrieve during restoration, streamlining recovery and ensuring data integrity.
  3. Regularly evaluate the performance of your backup strategy to adapt to changes in data volume and access patterns.
     As data grows, the backup infrastructure must scale accordingly. Uber's experience shows that monitoring and adjusting backup strategies can prevent inefficiencies.
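
The manifest-driven restoration in insight 2 might look roughly like the following Python sketch. The JSON keys (`backup_id`, `success`, `files`), the `restore_plan` helper, and the `/restore/rocksdb` target directory are assumptions made for illustration, not the format described in the article.

```python
import json
from pathlib import PurePosixPath


def restore_plan(manifest_json, data_dir):
    """Turn a manifest into (blob_path, local_path) download pairs.

    Refuses to plan a restore from a backup that is not marked successful.
    """
    manifest = json.loads(manifest_json)
    if not manifest.get("success", False):
        raise ValueError(f"backup {manifest.get('backup_id')} did not succeed")
    target = PurePosixPath(data_dir)
    return [(blob_path, str(target / name))
            for name, blob_path in sorted(manifest["files"].items())]


# Hypothetical differential manifest: one file reused from the full backup,
# one file uploaded by the differential backup itself.
example = json.dumps({
    "backup_id": "diff-002",
    "success": True,
    "files": {
        "000011.sst": "full-001/000011.sst",
        "000012.sst": "diff-002/000012.sst",
    },
})
plan = restore_plan(example, "/restore/rocksdb")
for blob_path, local_path in plan:
    print(blob_path, "->", local_path)
```

Because each manifest lists every file the backup needs, a restore in this sketch reads a single manifest and never has to walk a chain of earlier backups.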

Common Pitfalls

  1. Over-reliance on differential backups without considering partition activity can lead to inefficiencies.
     In highly active partitions, fewer SSTable files stay unchanged between backups, so the expected benefits of differential backups may diminish and the backup strategy may need to be reevaluated.
  2. Assuming the same partition node will always be suitable for backups can lead to unexpected full backup requirements.
     If a node becomes unsuitable, a full backup is needed, which can increase costs and complexity.
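
Both pitfalls come down to deciding, per backup run, whether a differential backup is still worthwhile. The Python sketch below shows one possible decision rule. The `LastFull` record, the `choose_backup_type` helper, and the 0.8 threshold are hypothetical and not taken from the article; the node check assumes that SSTable files can only be reused against a full backup taken from the same node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastFull:
    """Hypothetical record of the most recent successful full backup."""
    node_id: str
    sst_files: frozenset


def choose_backup_type(node_id: str, current_files: set,
                       last_full: Optional[LastFull],
                       max_changed_fraction: float = 0.8) -> str:
    """Return "full" or "differential" for the next backup run.

    max_changed_fraction is an illustrative threshold, not a value
    from the article.
    """
    if last_full is None or not current_files:
        return "full"                      # nothing to diff against
    if last_full.node_id != node_id:
        return "full"                      # backup node changed
    changed = len(current_files - last_full.sst_files) / len(current_files)
    # A very active partition rewrites most files, so little can be reused.
    return "full" if changed > max_changed_fraction else "differential"


last = LastFull("node-a", frozenset({"a.sst", "b.sst", "c.sst", "d.sst"}))
print(choose_backup_type("node-a", {"c.sst", "d.sst", "e.sst", "f.sst"}, last))
print(choose_backup_type("node-b", {"c.sst", "d.sst"}, last))
print(choose_backup_type("node-a", {"x.sst", "y.sst"}, last))
```

The first call keeps half of the files and stays differential; the second falls back to a full backup because the node changed, and the third because every file is new.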