How Uber Migrated Financial Data from DynamoDB to Docstore

Piyush Patel, Jaydeepkumar Chovatia, Kaushik Devarajaiah

Uber


Tags: Apache, Apache Kafka, Apache Spark, AWS, DynamoDB

Overview

This article details Uber's migration of financial data from DynamoDB to Docstore, highlighting the challenges faced and the architectural decisions made to ensure data integrity and operational efficiency. It discusses the re-architecture of LedgerStore, the use of streaming for data processing, and the successful backfill of over 250 billion records.

What You'll Learn

1. How to migrate large-scale financial data from DynamoDB to Docstore
2. Why data sealing is crucial for maintaining data integrity
3. When to implement shadow writes for asynchronous data insertion
4. How to leverage streaming for efficient data processing

Prerequisites & Requirements

Understanding of database migration strategies
Familiarity with Apache Kafka and Apache Spark (optional)

Key Questions Answered

What are the main challenges in migrating data from DynamoDB to Docstore?

The main challenges include maintaining data integrity during migration, ensuring high availability without downtime, and managing the complexity of migrating over 250 billion records while adhering to strict SLAs. Uber addressed these challenges through a phased approach and the use of shadow writes.

How does LedgerStore ensure data integrity during the migration process?

LedgerStore ensures data integrity through mechanisms like sealing, which closes time ranges for changes, and by generating manifests that provide a verifiable record of data. This allows for reproducible queries and detection of unauthorized changes.

What is the significance of the sealing process in LedgerStore?

Sealing guarantees that any query reading from a sealed time range is reproducible. Data in the range is validated through checksums and signatures, which helps maintain data integrity and correctness.
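The sealing idea described above can be illustrated with a minimal sketch. It is not LedgerStore's implementation: the record layout (`id`, `ts`), the `Manifest` fields, the `seal` and `verify` helpers, and the hard-coded `SIGNING_KEY` are all assumptions made for illustration. The sketch closes a time range, computes a checksum over its records, signs the result, and later detects any change to the sealed data.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Hypothetical signing key for the demo; a real system would use managed keys.
SIGNING_KEY = b"demo-key"


@dataclass(frozen=True)
class Manifest:
    start: int         # inclusive start of the sealed time range
    end: int           # exclusive end of the sealed time range
    record_count: int  # number of records inside the range at seal time
    checksum: str      # SHA-256 over the canonicalized records
    signature: str     # HMAC-SHA256 over the manifest fields


def _checksum(records):
    digest = hashlib.sha256()
    # Sort by id so the checksum does not depend on read order.
    for rec in sorted(records, key=lambda r: r["id"]):
        digest.update(json.dumps(rec, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def _sign(start, end, count, checksum):
    message = f"{start}:{end}:{count}:{checksum}".encode("utf-8")
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def _in_range(records, start, end):
    return [r for r in records if start <= r["ts"] < end]


def seal(records, start, end):
    """Close the range [start, end) and return a signed manifest for it."""
    selected = _in_range(records, start, end)
    checksum = _checksum(selected)
    signature = _sign(start, end, len(selected), checksum)
    return Manifest(start, end, len(selected), checksum, signature)


def verify(records, manifest):
    """Recompute the checksum and signature; False means the range changed."""
    selected = _in_range(records, manifest.start, manifest.end)
    expected = _sign(manifest.start, manifest.end,
                     len(selected), _checksum(selected))
    return hmac.compare_digest(expected, manifest.signature)
```

Because the checksum is computed over sorted, canonicalized records, two reads of the same sealed range verify identically regardless of read order, which is the reproducibility property the answer above describes.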

What are shadow writes and how do they work in the migration?

Shadow writes are an asynchronous method of writing data to a secondary database while the primary database is being updated. This approach minimizes latency and ensures that data is consistently written across both databases, with a mechanism in place for resynchronization in case of failures.
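A minimal sketch of the shadow-write pattern, under stated assumptions: the `InMemoryStore` and `ShadowWriter` classes, the single background worker, and the `resync` method are hypothetical stand-ins, not Uber's code. The primary write is synchronous, the secondary write happens in the background, and keys whose secondary write failed are remembered for later resynchronization.

```python
import queue
import threading


class InMemoryStore:
    """Hypothetical stand-in for a database client."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ShadowWriter:
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self._pending = queue.Queue()
        self._failed = []  # keys whose shadow write failed
        self._lock = threading.Lock()
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        # The caller waits only for the primary write.
        self.primary.put(key, value)
        self._pending.put((key, value))

    def _drain(self):
        # Background worker: apply queued writes to the secondary store.
        while True:
            key, value = self._pending.get()
            try:
                self.secondary.put(key, value)
            except Exception:
                with self._lock:
                    self._failed.append(key)
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued shadow write has been attempted."""
        self._pending.join()

    def resync(self):
        """Retry failed keys, copying the current primary value across."""
        with self._lock:
            keys, self._failed = self._failed, []
        for key in keys:
            try:
                self.secondary.put(key, self.primary.get(key))
            except Exception:
                with self._lock:
                    self._failed.append(key)
```

Copying the current primary value during `resync`, rather than replaying the originally queued value, is one possible design choice: it keeps the secondary from being overwritten with stale data if the key changed again in the meantime.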

Key Statistics & Figures

Unique records migrated

Over 250 billion

This figure represents the scale of data that was successfully migrated from DynamoDB to Docstore.

Data volume

~300TB

The total amount of data involved in the migration process.

Estimated yearly savings

$6 million

This is the projected cost reduction achieved by migrating from DynamoDB to Docstore.
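As a rough back-of-the-envelope check on the figures above, dividing the data volume by the record count gives an average stored size per record. This assumes decimal terabytes and that the ~300TB covers the same 250 billion records; the source states neither, so treat the result as an order-of-magnitude estimate only.

```python
# Figures from the summary above, taken as round numbers.
RECORDS = 250 * 10**9        # 250 billion unique records
TOTAL_BYTES = 300 * 10**12   # ~300TB, assuming decimal terabytes

# Average stored size per record, in bytes.
avg_bytes_per_record = TOTAL_BYTES // RECORDS

print(f"~{avg_bytes_per_record} bytes per record")
```

Under these assumptions the average works out to about 1.2 KB per record.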

Technologies & Tools

Database

Docstore

Used as the new backend for LedgerStore to improve efficiency and reduce costs.

Database

DynamoDB

The original backend that was replaced during the migration process.

Stream Processing

Apache Kafka

Utilized for streaming data to ensure efficient processing and data integrity.

Data Processing

Apache Spark

Employed for backfilling historical data efficiently.

Key Actionable Insights

1. Implement a phased migration strategy to minimize risks during database transitions.
By breaking down the migration into manageable phases, Uber was able to address complexities and ensure system stability, which is crucial for maintaining service availability.

2. Utilize data sealing to enhance data integrity in financial systems.
Sealing helps ensure that once data is written, it cannot be altered without detection, which is vital for compliance in financial transactions.

3. Leverage streaming technologies for efficient data processing and backfilling.
Using streaming allows for real-time data processing and reduces the load on primary databases, making it easier to handle large volumes of data.
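The backfill idea in the last insight can be sketched in plain Python. This is not the Spark job the article refers to: the `batches` and `backfill` helpers, the record layout, the dict-based target, and the integer checkpoint are assumptions chosen to show two properties a large backfill typically needs, resumability and idempotence.

```python
from itertools import islice


def batches(iterable, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def backfill(source_records, target, batch_size=1000, checkpoint=0):
    """Copy records into `target`, resuming after `checkpoint` records.

    Returns the new checkpoint (total number of source records processed).
    """
    processed = checkpoint
    remaining = islice(source_records, checkpoint, None)
    for batch in batches(remaining, batch_size):
        for record in batch:
            # Insert only if absent: re-running is harmless, and records
            # already written by live traffic are not overwritten.
            target.setdefault(record["id"], record)
        processed += len(batch)
    return processed
```

Using insert-if-absent rather than overwrite is one possible design choice; it assumes that a record already present in the target is at least as fresh as the historical copy.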

Common Pitfalls

1. Failing to maintain data consistency during migration can lead to significant issues.
This can occur if both databases are not kept in sync, resulting in discrepancies that could affect business operations. Implementing shadow writes and a dual read strategy helps mitigate this risk.
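One way to picture a dual read is shown below. The summary does not describe how Uber's dual reads resolve differences, so this sketch makes an assumption: the primary remains the source of truth, and any disagreement with the secondary is recorded for investigation. The `DualReader` class and its `mismatches` list are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DualReader:
    primary: dict
    secondary: dict
    # (key, primary_value, secondary_value) for every disagreement seen.
    mismatches: list = field(default_factory=list)

    def get(self, key):
        primary_value = self.primary.get(key)
        secondary_value = self.secondary.get(key)
        if primary_value != secondary_value:
            # Record the discrepancy instead of failing the read.
            self.mismatches.append((key, primary_value, secondary_value))
        # Assumption: the primary stays authoritative during migration.
        return primary_value
```

A mismatch log like this gives a measurable signal for when the secondary is trustworthy enough to take over reads.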

Related Concepts

Database Migration Strategies

Data Integrity And Compliance

Streaming Data Processing