Moving persistent data out of Redis

Bryana Knight

Historically, we have used Redis in two ways at GitHub: We used it as an LRU cache to conveniently store the results of expensive computations over data originally persisted in…

GitHub · 7 min read · Intermediate

Tags: Git, JSON, MySQL, Redis

Overview

The article discusses GitHub's transition from using Redis as a persistent datastore to relying on MySQL for data persistence. It outlines the motivations behind this decision, the challenges faced during migration, and the performance improvements achieved through this transition.

What You'll Learn

1. How to transition from Redis to MySQL for data persistence
2. Why operational costs matter when choosing a datastore
3. How to optimize write operations in a high-traffic environment

Prerequisites & Requirements

Understanding of Redis and MySQL
Experience with data migration strategies (optional)

Key Questions Answered

What were the main reasons for GitHub to stop using Redis for persistence?

GitHub decided to disable persistence in Redis to reduce the operational cost of running persistence, to take advantage of the company's existing expertise in operating MySQL, and to eliminate the I/O latency incurred when large changes to the server state are written to disk. The shift aimed to simplify the infrastructure while improving performance.

How did GitHub handle the migration of activity feeds from Redis to MySQL?

GitHub approached the migration by first analyzing the volume of writes and reads for different activity feeds. They optimized write operations by batching events and filtering them on read, which significantly reduced the number of writes and allowed for a smoother transition to MySQL.
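The batching half of that approach can be illustrated with a small writer that buffers events and flushes them together, paired with a read path that filters at query time. This is a minimal sketch, not GitHub's code: `sqlite3` stands in for MySQL so the example runs anywhere, and `FeedEvent`, `BatchingFeedWriter`, `read_feed`, the `feed_events` table, and the `hidden_actors` filter are all hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedEvent:
    # Hypothetical event shape: the feed it belongs to, who acted, and what happened.
    feed_id: int
    actor: str
    action: str


class BatchingFeedWriter:
    """Buffers feed events and writes them in batches instead of one at a time.

    sqlite3 stands in for MySQL; the `feed_events` table and batch size are
    illustrative assumptions, not GitHub's schema or settings.
    """

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.buffer: list[FeedEvent] = []
        self.flushes = 0  # how many batched round trips were made

    def add(self, event: FeedEvent) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        with self.conn:  # commits the whole batch as one transaction
            self.conn.executemany(
                "INSERT INTO feed_events (feed_id, actor, action) VALUES (?, ?, ?)",
                [(e.feed_id, e.actor, e.action) for e in self.buffer],
            )
        self.buffer.clear()
        self.flushes += 1


def read_feed(conn: sqlite3.Connection, feed_id: int, hidden_actors: set[str]) -> list[tuple[str, str]]:
    # Filter on read: events are stored once, and per-viewer exclusions
    # (here, a hypothetical set of hidden actors) are applied at query time.
    rows = conn.execute(
        "SELECT actor, action FROM feed_events WHERE feed_id = ? ORDER BY id",
        (feed_id,),
    ).fetchall()
    return [(actor, action) for actor, action in rows if actor not in hidden_actors]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed_events ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, feed_id INTEGER, actor TEXT, action TEXT)"
)
writer = BatchingFeedWriter(conn, batch_size=3)
for i in range(7):
    writer.add(FeedEvent(feed_id=1, actor=f"user{i % 2}", action=f"push {i}"))
writer.flush()  # write whatever is left in the buffer
print(writer.flushes)  # 3 batches for 7 events
print(read_feed(conn, 1, hidden_actors={"user1"}))
```

The batch size trades latency for fewer round trips: events wait in the buffer until it fills or is flushed explicitly, so a production writer would likely also flush on a timer.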

What performance improvements were observed after migrating from Redis to MySQL?

After the migration, GitHub saw a 65% reduction in write operations for the organization timeline, and the overall write rate stayed below 1,500 keys per second. That load fit within the existing MySQL infrastructure without requiring additional servers.

What challenges did GitHub face during the migration process?

One of the biggest challenges was the high volume of writes to user feeds: the team had to reduce the write load on MySQL while preserving data integrity and the user experience during the transition.

Key Statistics & Figures

Percentage reduction in write operations for organization timeline

65%

This reduction was achieved by optimizing how events were dispatched to user feeds.
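Why changing the dispatch strategy cuts writes can be seen by counting them. The sketch below compares copying each event into every eligible member's feed (fan-out on write) with storing it once on a shared timeline and filtering per member on read. It is a generic illustration of that trade-off under assumed data, not the dispatch logic GitHub actually used, and it does not attempt to reproduce the 65% figure; `Org`, `fan_out_on_write`, `write_once`, and `read_org_timeline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Org:
    # Hypothetical model: org members and the repositories each one can see.
    members: list[str]
    visible_repos: dict[str, set[str]]


def fan_out_on_write(org: Org, repo: str, event: str, feeds: dict[str, list]) -> int:
    """Copy the event into the feed of every member who can see the repo."""
    writes = 0
    for member in org.members:
        if repo in org.visible_repos.get(member, set()):
            feeds.setdefault(member, []).append((repo, event))
            writes += 1
    return writes  # one write per eligible member


def write_once(org_timeline: list, repo: str, event: str) -> int:
    """Store the event a single time on a shared org timeline."""
    org_timeline.append((repo, event))
    return 1  # always exactly one write


def read_org_timeline(org: Org, member: str, org_timeline: list) -> list:
    """Filter on read: a member only sees events for repos they can access."""
    allowed = org.visible_repos.get(member, set())
    return [(repo, event) for repo, event in org_timeline if repo in allowed]


org = Org(
    members=["alice", "bob", "carol"],
    visible_repos={"alice": {"api", "web"}, "bob": {"api"}, "carol": {"api", "web"}},
)
events = [("api", "push"), ("web", "issue opened"), ("api", "pr merged")]

feeds: dict[str, list] = {}
fan_out_writes = sum(fan_out_on_write(org, repo, ev, feeds) for repo, ev in events)
timeline: list = []
single_writes = sum(write_once(timeline, repo, ev) for repo, ev in events)
print(fan_out_writes, single_writes)  # 8 3
```

Both strategies give each member the same view; the write-once version moves the per-member work from write time to read time, which pays off when events are written far more often than any single feed is read.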

Total writes per day to Redis for activity feeds

350 million

This volume highlighted the need for a more efficient data handling strategy.
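For scale, the daily total converts to the average rate below. This is plain arithmetic on the figure above; since traffic is rarely uniform over a day, peak rates would be higher than this average.

```latex
\frac{350 \times 10^{6}\ \text{writes/day}}{86\,400\ \text{s/day}} \approx 4\,051\ \text{writes/s}
```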

Peak write operations after migration

below 270 writes per second

This was a significant decrease compared to the previous load on Redis.

Replication delay at peak

below 180 milliseconds

This delay was monitored to keep replicas consistent during periods of heavy writes.
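One common way to keep replication delay bounded is to pause the writer whenever replicas fall behind. The sketch below assumes a hypothetical `replica_lag_ms` hook (in practice such a reading might come from a heartbeat table or a dedicated throttling service) and a hypothetical `write_batch` hook; the 180 ms value in the demo simply mirrors the figure above as an example threshold, whereas the source reports it as an observed peak rather than a configured limit.

```python
import time
from typing import Callable, Iterable, Sequence


def write_batches_with_lag_throttle(
    batches: Iterable[Sequence[str]],
    write_batch: Callable[[Sequence[str]], None],
    replica_lag_ms: Callable[[], float],
    max_lag_ms: float,
    poll_interval_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write each batch only once replica lag is below max_lag_ms.

    Returns how many times the writer paused. write_batch and replica_lag_ms
    are hypothetical hooks supplied by the caller.
    """
    pauses = 0
    for batch in batches:
        # Back off while replicas are behind so reads from them stay fresh.
        while replica_lag_ms() >= max_lag_ms:
            pauses += 1
            sleep(poll_interval_s)
        write_batch(batch)
    return pauses


# Demo with canned lag readings instead of a real replica.
lag_readings = iter([250.0, 120.0, 90.0, 200.0, 190.0, 60.0])
written: list = []
pauses = write_batches_with_lag_throttle(
    batches=[["a", "b"], ["c"], ["d", "e"]],
    write_batch=written.append,
    replica_lag_ms=lambda: next(lag_readings),
    max_lag_ms=180.0,  # illustrative threshold only
    sleep=lambda s: None,  # skip real sleeping in this demo
)
print(pauses, written)
```

Injecting `sleep` and the lag reader keeps the throttle testable; the loop itself is unchanged whether the readings are canned or live.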

Technologies & Tools

Database

Redis

Used as an LRU cache and, before the migration, as a persistent datastore.

Database

MySQL

Adopted as the primary datastore for persistent data after discontinuing Redis persistence.

Key Actionable Insights

1. Evaluate operational costs and performance needs when selecting a datastore.
Understanding the trade-offs between datastores helps you choose one that fits your application's requirements and budget.

2. Use batching and throttling to optimize write operations.
Batching reduces the number of writes and throttling smooths out bursts, which together lower the load on your database and improve overall performance.

3. Leverage your team's existing expertise when choosing technologies.
Building on familiar technologies such as MySQL makes maintenance and performance tuning easier, especially when the team has deep experience operating them.
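Throttling can take several forms. A token bucket is one widely used technique for capping a sustained write rate while still allowing short bursts; the sketch below shows it with an injectable clock. This is a generic illustration, not something the source says GitHub used, and the `TokenBucket` class and its `rate` and `capacity` values are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows about `rate` writes per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller should delay or queue this write


# Demo with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=4.0, capacity=5.0, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(7)]  # 5 allowed, then 2 rejected
fake_now[0] = 0.75  # 0.75 s later, 3 tokens have been refilled
later = [bucket.try_acquire() for _ in range(4)]
print(burst.count(True), later.count(True))  # 5 3
```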

Common Pitfalls

1. Underestimating the complexity of migrating data between storage systems.

Teams often skip the planning and testing a datastore migration requires, which can lead to performance regressions and data-integrity problems.
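One concrete form that testing can take is diffing a snapshot of the old store against the migrated data before cutting reads over. This is a minimal sketch, assuming both sides can be loaded into plain key-to-value mappings; `diff_stores` and the sample keys are hypothetical, and this is not GitHub's verification tooling.

```python
from typing import Mapping


def diff_stores(source: Mapping[str, str], target: Mapping[str, str]) -> dict[str, list[str]]:
    """Compare a source snapshot (e.g. values read from Redis) with the migrated
    copy (e.g. rows read back from MySQL) and group discrepancies by kind."""
    missing = sorted(k for k in source if k not in target)
    mismatched = sorted(k for k in source if k in target and source[k] != target[k])
    unexpected = sorted(k for k in target if k not in source)
    return {"missing": missing, "mismatched": mismatched, "unexpected": unexpected}


redis_snapshot = {"feed:1": "a,b", "feed:2": "c", "counter:stars": "42"}
mysql_rows = {"feed:1": "a,b", "feed:2": "c,d", "feed:3": "e"}
report = diff_stores(redis_snapshot, mysql_rows)
print(report)
```

On large datasets the same comparison would typically run over a random sample of keys rather than a full snapshot, repeated while both stores are being written.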