Overview
The article discusses the migration of LinkedIn's internal service, Babylonia, from Oracle to Espresso, a distributed NoSQL database. It outlines the challenges faced during the migration process, the strategies employed to ensure uninterrupted service, and the eventual transition to using Espresso as the new Source of Truth.
What You'll Learn
1. How to migrate a service from Oracle to Espresso without downtime
2. Why maintaining data consistency is crucial during database migration
3. How to implement shadow read validation to ensure data accuracy
Prerequisites & Requirements
- Understanding of database migration concepts
- Experience with NoSQL databases (optional)
Key Questions Answered
What is Espresso and why is it used at LinkedIn?
Espresso is LinkedIn's strategic distributed, fault-tolerant NoSQL database that supports many services. It has a large footprint with nearly a hundred clusters, storing about 420 terabytes of Source of Truth data and handling over two million queries per second at peak load.
How did LinkedIn ensure uninterrupted service during the migration?
LinkedIn kept the service running throughout the migration by using a Databus listener to mirror changes from Oracle to Espresso in near-real time, and by using shadow reads to validate that the two databases returned consistent data.
What challenges were faced during the migration from Oracle to Espresso?
The migration faced challenges such as ensuring data consistency, managing dual writes to both databases, and the need to clean up legacy code that directly accessed Oracle. These issues required careful planning and execution to avoid service interruptions.
What is the role of the MigrationControl field in Espresso?
The MigrationControl field in Espresso helps manage data writes from multiple sources, allowing the system to identify which process (bulk loader, Databus listener, or Babylonia) wrote the data last, thus preventing conflicts and ensuring data integrity.
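The idea of tagging each record with its last writer can be sketched as follows. This is a minimal illustration, not LinkedIn's implementation: the `Writer` enum, the `Record` type, the in-memory `store`, and especially the precedence order (bulk loader lowest, Babylonia highest) are assumptions made for the example. The source only states that the field identifies which process wrote the data last.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Writer(Enum):
    """The three processes named in the article that can write to Espresso."""
    BULK_LOADER = "bulk_loader"
    DATABUS_LISTENER = "databus_listener"
    BABYLONIA = "babylonia"


# Assumed precedence (hypothetical): a higher number means fresher data.
PRECEDENCE = {
    Writer.BULK_LOADER: 0,
    Writer.DATABUS_LISTENER: 1,
    Writer.BABYLONIA: 2,
}


@dataclass
class Record:
    value: dict
    migration_control: Writer  # which process wrote this record last


def write(store: Dict[str, Record], key: str, value: dict, writer: Writer) -> bool:
    """Write the record unless a higher-precedence writer already owns it.

    Returns True if the write was applied, False if it was skipped.
    """
    existing = store.get(key)
    if existing is not None and PRECEDENCE[existing.migration_control] > PRECEDENCE[writer]:
        return False  # do not clobber fresher data with an older snapshot
    store[key] = Record(value=value, migration_control=writer)
    return True
```

With a rule like this, a slow bulk load cannot overwrite a record that the listener or the live service has already updated, because the stored tag tells the loader that newer data is in place.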
Key Statistics & Figures
Number of clusters in use
close to a hundred
Espresso has a significant production footprint at LinkedIn.
Data stored
about 420 terabytes
This data represents the Source of Truth for LinkedIn services.
Queries handled per second at peak load
more than two million
This illustrates Espresso's capacity to manage high traffic.
Technologies & Tools
Database
Espresso
Used as the new distributed NoSQL database to replace Oracle.
Database
Oracle
The original database system from which Babylonia was migrated.
Data Replication
Databus
Used to distribute database changes to listeners in near-real time.
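A change listener of the kind described here can be sketched as below. This does not use the real Databus API: `ChangeEvent`, its `scn` sequence number, and `MirrorListener` are hypothetical names, and the target store is a plain dictionary standing in for the new database. The sketch only shows the general pattern of applying change events in order and ignoring replays.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    scn: int                # assumed monotonically increasing sequence number
    key: str
    value: Optional[dict]   # None represents a delete in this sketch


class MirrorListener:
    """Applies change events from the source database to a target store."""

    def __init__(self, target: Dict[str, dict]) -> None:
        self.target = target
        self.last_applied_scn = -1

    def on_event(self, event: ChangeEvent) -> bool:
        """Apply one event; return False if it was already applied."""
        if event.scn <= self.last_applied_scn:
            return False  # replayed or duplicate event, safe to ignore
        if event.value is None:
            self.target.pop(event.key, None)
        else:
            self.target[event.key] = event.value
        self.last_applied_scn = event.scn
        return True
```

Tracking the last applied sequence number makes the listener idempotent, so redelivered events after a restart do not corrupt the mirrored data.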
Key Actionable Insights
1. Implement shadow read validation to ensure data accuracy during migrations. Shadow read validation allows you to compare results from both the old and new databases, helping to identify discrepancies and ensure data integrity before fully transitioning to the new system.
2. Utilize a Databus listener to maintain real-time synchronization between databases during migration. This approach minimizes the risk of data loss and ensures that the new database reflects the most current state of the data, which is crucial for maintaining service continuity.
3. Conduct a thorough cleanup of legacy code before migration. By eliminating deprecated code and ensuring that all database interactions go through a unified API, you can simplify the migration process and reduce potential issues related to tightly-coupled components.
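Shadow read validation, as described in the first insight, might look like the following minimal sketch. The function name `shadow_read`, the callables `read_primary` and `read_shadow`, and the `mismatches` list are all hypothetical; in a real service the readers would be clients for the old and new databases and mismatches would likely go to a metrics system rather than a list.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("shadow_read")


def shadow_read(
    key: str,
    read_primary: Callable[[str], Any],
    read_shadow: Callable[[str], Any],
    mismatches: List[str],
) -> Any:
    """Serve the read from the primary store and compare against the shadow.

    The caller always receives the primary result. A shadow mismatch or a
    shadow failure is recorded or logged but never affects the response.
    """
    primary = read_primary(key)
    try:
        shadow = read_shadow(key)
        if shadow != primary:
            mismatches.append(key)
            logger.warning("shadow mismatch for key %s", key)
    except Exception:
        # The shadow store must not be able to break production reads.
        logger.exception("shadow read failed for key %s", key)
    return primary
```

Keeping the old database as the source of the response means validation can run on production traffic with no user-visible risk; the switch to the new database happens only after the mismatch rate is acceptably low.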
Common Pitfalls
1. Failing to maintain data consistency during the migration process.
Inconsistent data can lead to application errors and user dissatisfaction. Implementing shadow reads and a Databus listener can help mitigate this risk.
2. Neglecting to clean up legacy code before migration.
Legacy code can complicate the migration process and introduce bugs. It's essential to refactor or eliminate outdated code to streamline the transition.
Related Concepts
Database Migration Strategies
NoSQL vs. Relational Databases
Real-time Data Synchronization Techniques