Migrating to Espresso

David Max

11 min read, intermediate

Avro, Oracle, SQL

Overview

The article discusses the migration of LinkedIn's internal service, Babylonia, from Oracle to Espresso, a distributed NoSQL database. It outlines the challenges faced during the migration process, the strategies employed to ensure uninterrupted service, and the eventual transition to using Espresso as the new Source of Truth.

What You'll Learn

1. How to migrate a service from Oracle to Espresso without downtime
2. Why maintaining data consistency is crucial during database migration
3. How to implement shadow read validation to ensure data accuracy

Prerequisites & Requirements

Understanding of database migration concepts
Experience with NoSQL databases (optional)

Key Questions Answered

What is Espresso and why is it used at LinkedIn?

Espresso is LinkedIn's strategic distributed, fault-tolerant NoSQL database that supports many services. It has a large footprint with nearly a hundred clusters, storing about 420 terabytes of Source of Truth data and handling over two million queries per second at peak load.

How did LinkedIn ensure uninterrupted service during the migration?

LinkedIn kept the service running throughout the migration by using a Databus listener to mirror changes from Oracle to Espresso in near real time, and by using shadow reads to validate that the two databases returned consistent data.
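The shadow read idea can be illustrated with a minimal Python sketch. It is not Babylonia's actual code: the `ShadowReader` class, the `primary` and `shadow` callables, and the in-memory dictionaries standing in for Oracle and Espresso are all assumptions made for illustration. The caller always receives the result from the current Source of Truth, while the second store is read on the side and any difference is recorded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Reader = Callable[[str], Optional[Record]]


@dataclass
class ShadowReader:
    """Serve reads from the primary store and compare against a shadow store.

    Hypothetical helper: names and structure are illustrative only.
    """

    primary: Reader  # current Source of Truth (the Oracle path in this sketch)
    shadow: Reader   # store being validated (the Espresso path in this sketch)
    mismatches: List[str] = field(default_factory=list)

    def read(self, key: str) -> Optional[Record]:
        result = self.primary(key)
        try:
            if self.shadow(key) != result:
                self.mismatches.append(key)
        except Exception:
            # A shadow failure must never affect the caller; only record it.
            self.mismatches.append(key)
        # The caller always gets the primary result.
        return result


# In-memory stand-ins for the two databases (assumed data).
oracle = {"a": {"name": "x"}, "b": {"name": "y"}}
espresso = {"a": {"name": "x"}, "b": {"name": "stale"}}

reader = ShadowReader(primary=oracle.get, shadow=espresso.get)
reader.read("a")
reader.read("b")
print(reader.mismatches)  # ['b']
```

In a real service the shadow read would more likely run asynchronously and report to a metrics system rather than a list, so that validation adds no latency to the request path.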

What challenges were faced during the migration from Oracle to Espresso?

The main challenges were keeping data consistent, managing writes to both databases at once, and cleaning up legacy code that accessed Oracle directly. Each required careful planning and execution to avoid service interruptions.

What is the role of the MigrationControl field in Espresso?

The MigrationControl field in Espresso helps manage data writes from multiple sources, allowing the system to identify which process (bulk loader, Databus listener, or Babylonia) wrote the data last, thus preventing conflicts and ensuring data integrity.
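One way such a field could be used is sketched below in Python. The summary only says that the field records which process wrote last; the precedence order (Babylonia over the Databus listener over the bulk loader), the `Writer` enum, the `Row` type, and the `write` function are assumptions introduced here, not the article's implementation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class Writer(IntEnum):
    # Assumed precedence: a higher value is taken to hold fresher data.
    BULK_LOADER = 1
    DATABUS_LISTENER = 2
    BABYLONIA = 3


@dataclass
class Row:
    payload: dict
    migration_control: Writer  # which process wrote this row last


def write(store: Dict[str, Row], key: str, payload: dict, writer: Writer) -> bool:
    """Write the row unless a higher-precedence writer already owns it."""
    existing: Optional[Row] = store.get(key)
    if existing is not None and existing.migration_control > writer:
        return False  # skip: the stored row is assumed to be fresher
    store[key] = Row(payload, writer)
    return True


store: Dict[str, Row] = {}
print(write(store, "k", {"v": 1}, Writer.DATABUS_LISTENER))  # True
print(write(store, "k", {"v": 0}, Writer.BULK_LOADER))       # False
print(write(store, "k", {"v": 2}, Writer.BABYLONIA))         # True
```

The read-then-write here is not atomic. A production version would need a conditional write or similar mechanism in the database so that two writers cannot race between the check and the update.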

Key Statistics & Figures

Number of clusters in use: close to a hundred. Espresso has a significant production footprint at LinkedIn.

Data stored: about 420 terabytes. This data represents the Source of Truth for LinkedIn services.

Queries handled per second at peak load: more than two million. This illustrates Espresso's capacity to manage high traffic.

Technologies & Tools

Database: Espresso. Used as the new distributed NoSQL database to replace Oracle.

Database: Oracle. The original database system from which Babylonia was migrated.

Data Replication: Databus. Used to distribute database changes to listeners in near-real time.

Key Actionable Insights

1. Implement shadow read validation to ensure data accuracy during migrations.
   Shadow read validation allows you to compare results from both the old and new databases, helping to identify discrepancies and ensure data integrity before fully transitioning to the new system.

2. Utilize a Databus listener to maintain real-time synchronization between databases during migration.
   This approach minimizes the risk of data loss and ensures that the new database reflects the most current state of the data, which is crucial for maintaining service continuity.

3. Conduct a thorough cleanup of legacy code before migration.
   By eliminating deprecated code and ensuring that all database interactions go through a unified API, you can simplify the migration process and reduce potential issues related to tightly-coupled components.
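The second insight above, mirroring changes through a listener, could be sketched roughly as follows. This does not use the real Databus client API: `ChangeEvent`, its `sequence` field, and `MirrorListener` are hypothetical names, and an in-memory dictionary stands in for the target database. The point is only that applying change events in order, and ignoring ones already applied, keeps the mirror safe against replays.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    sequence: int          # hypothetical, monotonically increasing change number
    key: str
    value: Optional[dict]  # None represents a delete


class MirrorListener:
    """Apply source-database change events to a target store, idempotently."""

    def __init__(self) -> None:
        self.target: Dict[str, dict] = {}  # stand-in for the new database
        self.last_applied = 0

    def on_event(self, event: ChangeEvent) -> bool:
        if event.sequence <= self.last_applied:
            return False  # duplicate or replayed event: ignore it
        if event.value is None:
            self.target.pop(event.key, None)
        else:
            self.target[event.key] = event.value
        self.last_applied = event.sequence
        return True


listener = MirrorListener()
events = [
    ChangeEvent(1, "a", {"v": 1}),
    ChangeEvent(2, "b", {"v": 2}),
    ChangeEvent(2, "b", {"v": 2}),  # replay of an already-applied event
    ChangeEvent(3, "a", None),      # delete
]
applied = [listener.on_event(e) for e in events]
print(applied)          # [True, True, False, True]
print(listener.target)  # {'b': {'v': 2}}
```

Tracking the last applied sequence number lets the listener restart from a checkpoint without applying any change twice, which matters when the stream can redeliver events.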

Common Pitfalls

1. Failing to maintain data consistency during the migration process.
   Inconsistent data can lead to application errors and user dissatisfaction. Implementing shadow reads and a Databus listener can help mitigate this risk.

2. Neglecting to clean up legacy code before migration.
   Legacy code can complicate the migration process and introduce bugs. It's essential to refactor or eliminate outdated code to streamline the transition.

Related Concepts

Database Migration Strategies

NoSQL vs. Relational Databases

Real-time Data Synchronization Techniques