Moving persistent data out of Redis

Historically, we have used Redis in two ways at GitHub: We used it as an LRU cache to conveniently store the results of expensive computations over data originally persisted in…

Bryana Knight
7 min read · intermediate

Overview

The article discusses GitHub's transition from using Redis as a persistent datastore to relying on MySQL for data persistence. It outlines the motivations behind this decision, the challenges faced during migration, and the performance improvements achieved through this transition.

What You'll Learn

1. How to transition from Redis to MySQL for data persistence
2. Why operational costs matter when choosing a datastore
3. How to optimize write operations in a high-traffic environment

Prerequisites & Requirements

  • Understanding of Redis and MySQL
  • Experience with data migration strategies (optional)

Key Questions Answered

What were the main reasons for GitHub to stop using Redis for persistence?
GitHub decided to disable persistence in Redis to reduce operational costs, leverage their expertise in MySQL, and eliminate I/O latency during data writes. This strategic shift aimed to simplify their infrastructure while improving performance.
How did GitHub handle the migration of activity feeds from Redis to MySQL?
GitHub approached the migration by first analyzing the volume of writes and reads for different activity feeds. They optimized write operations by batching events and filtering them on read, which significantly reduced the number of writes and allowed for a smoother transition to MySQL.
What performance improvements were observed after migrating from Redis to MySQL?
After migrating, GitHub saw a 65% reduction in write operations for the organization timeline, leading to an overall write rate of less than 1500 keys per second. This performance was manageable within their existing MySQL infrastructure without requiring additional servers.
What challenges did GitHub face during the migration process?
One of the biggest challenges was the high volume of write operations to user feeds. GitHub had to reduce that load before MySQL could absorb it, while preserving data integrity and the user experience during the transition.
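
The "filter on read" idea mentioned above can be illustrated with a small sketch. The summary does not describe GitHub's actual data model, so everything here is an assumption made for illustration: the `Event` and `OrgTimeline` names, the `repo_access` mapping, and the in-memory lists standing in for a datastore. The sketch compares writing each event once to a shared timeline with copying it into every reader's feed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    repo: str
    actor: str
    action: str


class OrgTimeline:
    """Hypothetical shared timeline: one write per event, filtered per reader."""

    def __init__(self, repo_access):
        # repo_access maps a user to the set of repos that user may see
        # (an assumed visibility model, not taken from the article).
        self.repo_access = repo_access
        self.events = []
        self.writes = 0

    def publish(self, event):
        # A single write, regardless of how many users can see the event.
        self.events.append(event)
        self.writes += 1

    def read(self, user):
        # Visibility is applied at read time instead of at write time.
        visible = self.repo_access.get(user, set())
        return [e for e in self.events if e.repo in visible]


def fan_out_writes(event, repo_access):
    """Writes needed if the event were copied into every eligible user's feed."""
    return sum(1 for repos in repo_access.values() if event.repo in repos)


repo_access = {
    "alice": {"web", "api"},
    "bob": {"web"},
    "carol": {"api"},
}
events = [
    Event("web", "alice", "push"),
    Event("api", "carol", "open_issue"),
]

fan_out_total = sum(fan_out_writes(e, repo_access) for e in events)

timeline = OrgTimeline(repo_access)
for e in events:
    timeline.publish(e)

bob_feed = timeline.read("bob")
print(fan_out_total, timeline.writes, len(bob_feed))  # prints: 4 2 1
```

The trade-off is that reads do more work, since each read filters the shared list. That tends to pay off when a feed is written far more often than it is read.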

Key Statistics & Figures

Percentage reduction in write operations for organization timeline
65%
This reduction was achieved by optimizing how events were dispatched to user feeds.
Total writes per day to Redis for activity feeds
350 million
This volume highlighted the need for a more efficient data handling strategy.
Peak write operations after migration
below 270 writes per second
This was a significant decrease compared to the previous load on Redis.
Replication delay at peak
below 180 milliseconds
This delay was monitored to ensure data consistency across replicas during high write operations.
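
To put the daily figure on the same scale as the per-second figures, it helps to convert it to an average rate. This is plain arithmetic on the number quoted above; the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(events_per_day):
    """Convert a daily total into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


redis_avg = average_per_second(350_000_000)
print(f"{redis_avg:.0f} writes per second on average")
# prints: 4051 writes per second on average
```

This is an average, not a peak. The summary also does not say whether the Redis and MySQL figures count the same unit (individual keys versus batched rows), so the comparison with the post-migration peak should be read as rough.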

Key Actionable Insights

1. Evaluate the operational costs and performance needs when selecting a datastore.
   Understanding the trade-offs between different datastores can help in making informed decisions that align with your application's requirements and budget.
2. Implement batching and throttling techniques to optimize write operations.
   By reducing the frequency of writes through batching, you can significantly decrease the load on your database, improving overall system performance.
3. Leverage existing expertise in your team when choosing technologies.
   Utilizing familiar technologies like MySQL can lead to better maintenance and performance, especially if your team has significant experience with them.
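
The batching and throttling insight can be sketched as a small buffered writer. This is a minimal illustration under stated assumptions, not GitHub's implementation: Python's built-in `sqlite3` stands in for MySQL so the example runs anywhere, and the `BatchedWriter` class, the `feed_events` table, and the thresholds are all hypothetical.

```python
import sqlite3
import time


class BatchedWriter:
    """Buffer rows in memory and write each batch in a single transaction."""

    def __init__(self, conn, max_batch=100, max_wait_seconds=1.0,
                 clock=time.monotonic):
        self.conn = conn
        self.max_batch = max_batch
        self.max_wait_seconds = max_wait_seconds
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()
        self.flush_count = 0

    def add(self, feed_id, payload):
        self.buffer.append((feed_id, payload))
        too_many = len(self.buffer) >= self.max_batch
        too_old = self.clock() - self.last_flush >= self.max_wait_seconds
        # Flush when the batch is full or has waited long enough.
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            with self.conn:  # commits once for the whole batch
                self.conn.executemany(
                    "INSERT INTO feed_events (feed_id, payload) VALUES (?, ?)",
                    self.buffer,
                )
            self.buffer = []
            self.flush_count += 1
        self.last_flush = self.clock()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feed_events (feed_id TEXT, payload TEXT)")
    # A long wait threshold so only the batch size triggers flushes here.
    writer = BatchedWriter(conn, max_batch=50, max_wait_seconds=3600)
    for i in range(120):
        writer.add("org:1", f"event-{i}")
    writer.flush()  # write whatever is left in the buffer
    rows = conn.execute("SELECT COUNT(*) FROM feed_events").fetchone()[0]
    return rows, writer.flush_count


print(demo())  # prints: (120, 3)
```

Here 120 events reach the database in 3 transactions instead of 120. The cost is that buffered events are lost if the process dies before a flush, so the batch size and wait time should reflect how much delay and loss a feed can tolerate.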

Common Pitfalls

1. Underestimating the complexity of migrating data between different storage systems.
   Many teams may overlook the need for thorough planning and testing when transitioning to a new datastore, which can lead to performance issues and data integrity problems.