Project Mezzanine: The Great Migration

Rene Schmidt

Uber

JSON, MySQL, PostgreSQL, SQL, SQLAlchemy

Overview

The article discusses Uber's Project Mezzanine, which involved migrating hundreds of millions of rows of trip data and over 100 services to a new data storage solution while maintaining service continuity. It highlights the challenges faced, the architectural changes made, and the lessons learned during this significant engineering effort.

What You'll Learn

1. How to migrate a large-scale data storage system with minimal downtime
2. Why using UUIDs is beneficial for scalability in data systems
3. How to implement a sharded, append-only data model effectively

Prerequisites & Requirements

Understanding of database management and data modeling concepts
Experience with Python and SQL databases (optional)

Key Questions Answered

What were the main challenges faced during the data migration?

The main challenges included migrating hundreds of millions of rows of data while keeping Uber's services operational. The team had to ensure high write availability and minimal downtime, which required careful planning and execution.

How did Uber ensure data consistency during the migration?

Uber mirrored writes to both PostgreSQL and the new Schemaless system, allowing for real-time validation of data consistency. This approach enabled the team to verify that data in Schemaless matched that in PostgreSQL without causing downtime.
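The mirrored-write idea can be illustrated with a small sketch. This is not Uber's implementation: `DictStore` and `MirroredTripWriter` are hypothetical names, and both stores are in-memory stand-ins for PostgreSQL and Schemaless. The assumption shown here is that the legacy store stays the source of truth, while a failure or mismatch on the mirror is recorded rather than raised, so callers see no disruption.

```python
import json


class DictStore:
    """Hypothetical stand-in for a datastore: maps trip id -> trip dict."""

    def __init__(self):
        self.rows = {}

    def write(self, trip_id, trip):
        # Round-trip through JSON so each store holds its own copy.
        self.rows[trip_id] = json.loads(json.dumps(trip))

    def read(self, trip_id):
        return self.rows.get(trip_id)


class MirroredTripWriter:
    """Writes to the primary store, mirrors to the new store, then compares."""

    def __init__(self, primary, mirror):
        self.primary = primary
        self.mirror = mirror
        self.mismatches = []

    def write(self, trip_id, trip):
        # The primary (legacy) store stays the source of truth.
        self.primary.write(trip_id, trip)
        try:
            self.mirror.write(trip_id, trip)
        except Exception as exc:
            # A mirror failure is recorded, not raised, so callers are unaffected.
            self.mismatches.append((trip_id, f"mirror write failed: {exc}"))
            return
        if not self.validate(trip_id):
            self.mismatches.append((trip_id, "contents differ"))

    def validate(self, trip_id):
        """Return True when both stores hold the same data for this trip."""
        return self.primary.read(trip_id) == self.mirror.read(trip_id)
```

In a setup like this, the `mismatches` list would be reviewed until it stays empty, at which point reads could be switched to the new store with some confidence.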

What architectural changes were made in the new trip store?

The new trip store adopted a column-oriented, schemaless approach that organized data in JSON blobs indexed by trip UUIDs. This design facilitated horizontal scaling and allowed for easy addition of new fields without reconfiguration.
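A minimal in-memory sketch of such a layout, assuming cells are addressed by trip UUID and column name and that each write appends a new version instead of overwriting. The class `CellGrid`, its method names, and the column names used in the tests are hypothetical; the real store sits on top of a database, not a Python dictionary.

```python
import json
from collections import defaultdict


class CellGrid:
    """In-memory sketch: JSON blobs addressed by (trip UUID, column name)."""

    def __init__(self):
        # (trip_uuid, column) -> list of JSON strings, oldest first.
        self._cells = defaultdict(list)

    def put(self, trip_uuid, column, body):
        """Append a new version of a cell; earlier versions are never modified."""
        versions = self._cells[(trip_uuid, column)]
        versions.append(json.dumps(body))
        return len(versions)  # 1-based version number of the new cell

    def get_latest(self, trip_uuid, column):
        """Return the newest version of a cell as a dict, or None if absent."""
        versions = self._cells.get((trip_uuid, column))
        return json.loads(versions[-1]) if versions else None
```

Because each cell body is an opaque JSON blob, a writer can start including a new field without any schema change in the store itself; readers that do not know the field simply ignore it.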

What lessons did the team learn from the migration process?

Key lessons included the importance of using UUIDs for scalability, keeping the data layer simple for easier debugging, and the necessity of a positive team attitude during large projects. The team also recognized the value of rapid iteration and validation.

Key Statistics & Figures

Number of services migrated

over 100

This migration was part of the broader Project Mezzanine effort to enhance data handling capabilities.

Growth rate of trips

20% a month

This growth rate necessitated the migration to a more robust data storage solution.

Duration of the final migration phase

6 weeks

The final crunch involved intensive collaboration and effort to complete the migration successfully.

Technologies & Tools

Database

PostgreSQL

Originally used for storing trip data before the migration to Schemaless.

Database

MySQL

Used in the new Schemaless architecture for data storage.

Programming Language

Python

The primary language used for developing the Schemaless system.

Key Actionable Insights

1. Implementing UUIDs for identifiers can greatly enhance scalability and reduce future migration headaches.
   As seen in the project, transitioning from integer IDs to UUIDs can be complex and time-consuming. Starting with UUIDs can prevent similar issues as systems grow.

2. Adopting a sharded, append-only data model can improve data integrity and performance.
   This model allows for safer data handling and easier scaling, as demonstrated by the successful implementation in the Mezzanine project.

3. Continuous validation during data migration is crucial for maintaining consistency.
   By mirroring writes and validating data in real time, Uber ensured that their migration did not disrupt services, which is a best practice for future migrations.
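To make the link between UUIDs and sharding concrete, here is a small sketch using Python's standard `uuid` module. The shard count and the function names `new_trip_id` and `shard_for` are assumptions for illustration, and the modulo routing is one simple choice, not necessarily the scheme Uber used.

```python
import uuid

NUM_SHARDS = 16  # hypothetical shard count


def new_trip_id():
    """Generate a random (version 4) UUID; no central counter is needed."""
    return uuid.uuid4()


def shard_for(trip_id, num_shards=NUM_SHARDS):
    """Map a UUID to a shard index using its 128-bit integer value."""
    return trip_id.int % num_shards
```

Any service can mint an identifier locally and compute the owning shard from the identifier alone, which is hard to do with auto-incrementing integer IDs issued by a single database. One caveat of plain modulo routing is that changing `num_shards` remaps most keys, so a fixed number of logical shards mapped onto physical hosts is a common alternative.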

Common Pitfalls

1. Failing to plan for data model changes can lead to significant migration challenges.

   As seen in the project, the need to change trip IDs to UUIDs required extensive rewrites of SQL queries, highlighting the importance of forward-thinking in data modeling.

Related Concepts

Data Migration Strategies

Database Scalability Techniques

NoSQL vs. SQL Database Architectures