MySQL infrastructure testing automation at GitHub

Our MySQL infrastructure is a critical component to GitHub. MySQL serves GitHub.com, GitHub’s API, authentication and more. Every git request touches MySQL in some way. We are tasked with keeping…

GitHub Engineering
15 min read, advanced

Overview

The article discusses the automation of MySQL infrastructure testing at GitHub, highlighting the importance of data integrity and availability. It details the processes for backups, failovers, and schema migrations, emphasizing the need for continuous testing to ensure system reliability.

What You'll Learn

1. How to automate MySQL backups using Percona Xtrabackup
2. Why continuous testing is essential for maintaining MySQL infrastructure integrity
3. How to implement automated failover processes with Orchestrator
4. When to use delayed replicas as a safeguard for data recovery

Prerequisites & Requirements

  • Understanding of MySQL database management and backup strategies
  • Familiarity with Percona Xtrabackup and Orchestrator (optional)

Key Questions Answered

What tools does GitHub use for MySQL backups?
GitHub uses Percona Xtrabackup for full backups and also runs logical backups several times a day, so engineers always have access to a recent copy of the data. These backups let engineers test changes against production-sized tables and restore individual backed-up tables through Hubot.
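A backup only counts once it has been restored. The sketch below is a minimal, hypothetical illustration of automated restore verification, not GitHub's tooling. It assumes backups are `*.bak` files whose names embed a sortable timestamp, and that a SHA-256 checksum was recorded in a manifest when each backup was taken. A plain file copy stands in for the real restore step, which for an Xtrabackup backup would involve preparing the backup and starting MySQL on it before running sanity checks.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large backups are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_latest_backup(backup_dir: Path, manifest: Dict[str, str]) -> bool:
    """Restore the newest backup into a scratch directory and check its checksum.

    Hypothetical layout: backup file names embed a sortable timestamp, and
    `manifest` maps each file name to the SHA-256 recorded at backup time.
    """
    backups = sorted(backup_dir.glob("*.bak"))  # name order == time order (assumed)
    if not backups:
        return False  # having no backup at all is a failure, not a pass
    latest = backups[-1]
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / latest.name
        shutil.copy2(latest, restored)  # stand-in for the real restore step
        return sha256_of(restored) == manifest.get(latest.name)


# Usage with made-up backup names and contents.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    manifest: Dict[str, str] = {}
    for name, data in [("2024-01-01T00.bak", b"old"), ("2024-01-02T00.bak", b"new")]:
        (root / name).write_bytes(data)
        manifest[name] = sha256_of(root / name)  # recorded when the backup is taken
    ok_before = verify_latest_backup(root, manifest)  # True
    (root / "2024-01-02T00.bak").write_bytes(b"corrupted")
    ok_after = verify_latest_backup(root, manifest)  # False
```

Run on a schedule from a dedicated host, a check like this turns "we have backups" into "we have backups that restored successfully".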
How does GitHub automate failovers in MySQL?
GitHub uses Orchestrator to automate failovers for masters and intermediate masters. It detects failures, promotes a replica, and heals the topology, minimizing downtime and keeping the service running.
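To make the detect, promote, heal sequence concrete, here is a toy Python model. It is a simplified sketch, not Orchestrator's algorithm: the `Server` class and its `applied` field (a hypothetical `(binlog file index, position)` pair) are assumptions, and a real tool weighs many more factors before choosing which replica to promote. The same function covers a failed intermediate master, because the promoted replica simply takes over the failed server's place in the tree.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Server:
    name: str
    alive: bool = True
    # Hypothetical replication progress: (binlog file index, position) of the
    # last event applied from this server's source.
    applied: Tuple[int, int] = (0, 0)
    source: Optional[str] = None  # server it replicates from; None = master


def failover(topology: Dict[str, Server], failed: str) -> str:
    """Promote the most up-to-date live replica of `failed` and repoint its siblings."""
    replicas = [s for s in topology.values() if s.source == failed and s.alive]
    if not replicas:
        raise RuntimeError(f"no live replica of {failed} to promote")
    # Promote the replica that applied the most of the failed server's binlog,
    # so the fewest transactions are lost (tuples compare file index first).
    promoted = max(replicas, key=lambda s: s.applied)
    # Take over the failed server's place: None if it was the master,
    # its upstream server if it was an intermediate master.
    promoted.source = topology[failed].source
    # Heal the topology: siblings now replicate from the promoted server.
    # (A real tool must also reconcile their positions; omitted here.)
    for s in replicas:
        if s is not promoted:
            s.source = promoted.name
    return promoted.name


# Usage: db1 (the master) has died; db3 has applied the most of its binlog.
demo = {
    "db1": Server("db1", alive=False),
    "db2": Server("db2", applied=(7, 1200), source="db1"),
    "db3": Server("db3", applied=(7, 1500), source="db1"),
    "db4": Server("db4", applied=(6, 9000), source="db1"),
}
new_master = failover(demo, "db1")  # "db3"; db2 and db4 now replicate from db3
```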
What is the purpose of delayed replicas in GitHub's MySQL infrastructure?
Delayed replicas are a safeguard for data recovery: because they intentionally lag behind the master, GitHub can recover data as it was before an erroneous query was executed. This helps GitHub recover quickly from accidental data changes.
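The recovery window a delayed replica provides is simple arithmetic: an event the master executed at time T is not applied on a replica delayed by D until at least T + D, so acting before T + D (for example by stopping the replica's SQL thread with MySQL's `STOP SLAVE SQL_THREAD`) keeps the bad query from landing. The helpers below are a hypothetical sketch of that check; the four-hour delay in the usage is an assumed example, not GitHub's setting.

```python
from datetime import datetime, timedelta


def still_recoverable(delay: timedelta, bad_query_at: datetime, now: datetime) -> bool:
    """True if a replica delayed by `delay` cannot have applied the bad query yet.

    A delayed replica applies an event no earlier than `delay` after the master
    executed it, so the bad event is still pending before bad_query_at + delay.
    """
    return now < bad_query_at + delay


def time_left(delay: timedelta, bad_query_at: datetime, now: datetime) -> timedelta:
    """How long remains to stop the delayed replica before it applies the bad query."""
    return max(bad_query_at + delay - now, timedelta(0))


# Usage with an assumed 4-hour delay and made-up timestamps.
delay = timedelta(hours=4)
bad = datetime(2024, 1, 1, 10, 0)
now = datetime(2024, 1, 1, 12, 30)
print(still_recoverable(delay, bad, now))  # True
print(time_left(delay, bad, now))          # 1:30:00
```

Once the replica is stopped in time, replication can be resumed up to just before the offending event (MySQL supports `START SLAVE UNTIL` for this) and the affected data read back from that replica.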
How does GitHub perform schema migrations?
GitHub uses gh-ost for live schema migrations. The tool creates a ghost table with the new schema, copies data into it while the original table remains in use, and then swaps the ghost table in place of the original. This minimizes downtime and protects data integrity during migrations.
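The ghost-table idea can be modeled in a few lines of Python. This is a toy, in-memory sketch under stated assumptions, not gh-ost's code: tables are dicts keyed by primary key, `write_batches` is a hypothetical stand-in for the writes gh-ost would read from the binary log, and returning the ghost table stands in for the cut-over. It shows the two rules that keep the copy consistent: rows are copied in primary-key chunks without overwriting newer data, and captured changes always win.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, object]
Table = Dict[int, Row]          # primary key -> row
Event = Tuple[str, int, Row]    # ("upsert" | "delete", primary key, row)


def ghost_migrate(original: Table, transform: Callable[[Row], Row],
                  write_batches: List[List[Event]], chunk_size: int = 2) -> Table:
    """Toy ghost-table migration; returns the table to cut over to."""
    ghost: Table = {}
    batches: Iterator[List[Event]] = iter(write_batches)

    def replay(batch: List[Event]) -> None:
        # The application keeps writing to the original table; each change is
        # captured (gh-ost reads it from the binary log) and applied to the ghost.
        for op, pk, row in batch:
            if op == "delete":
                original.pop(pk, None)
                ghost.pop(pk, None)
            else:
                original[pk] = row
                ghost[pk] = transform(row)  # captured changes always win

    pks = sorted(original)  # rows are copied in primary-key order
    for start in range(0, len(pks), chunk_size):
        for pk in pks[start:start + chunk_size]:
            # Analogous to INSERT IGNORE: skip rows deleted in the meantime and
            # never overwrite a row the replayed changes already wrote.
            if pk in original:
                ghost.setdefault(pk, transform(original[pk]))
        replay(next(batches, []))  # writes keep arriving between chunks
    for batch in batches:  # drain the remaining changes before the cut-over
        replay(batch)
    return ghost


def add_active(row: Row) -> Row:
    """Example schema change: add an `active` column defaulting to True."""
    return {**row, "active": True}


# Usage with made-up rows and writes.
users: Table = {pk: {"id": pk, "name": n} for pk, n in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]}
writes: List[List[Event]] = [
    [("upsert", 4, {"id": 4, "name": "D"})],  # row 4 changes before it is copied
    [("delete", 1, {})],                      # row 1 is deleted after it was copied
    [("upsert", 5, {"id": 5, "name": "e"})],  # a new row arrives after the row copy
]
migrated = ghost_migrate(users, add_active, writes)
# migrated now matches users with the new column applied to every row
```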

Technologies & Tools

  • Backup: Percona Xtrabackup, used to take full backups of MySQL databases.
  • Infrastructure management: Orchestrator, which automates failovers for MySQL masters and intermediate masters.
  • Schema migration: gh-ost, which performs live schema migrations by copying data to a ghost table.

Key Actionable Insights

1. Implementing automated backup verification can significantly improve data reliability. Setting up a dedicated host that restores the latest backups confirms that they are valid and retrievable, which is crucial for disaster recovery.
2. Regularly testing failover processes in a production-like environment builds trust in your infrastructure. Simulating failures and observing the automated responses helps you find weaknesses and improve the overall resilience of your MySQL setup.
3. Using tools like gh-ost for schema migrations lets you apply schema changes with minimal impact on production traffic. Because migrations can first be tested in a safe environment, the risk of data corruption during live updates is reduced.
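One way to rehearse migrations safely, assumed here for illustration, is to run a no-op migration (one that rebuilds the table without changing its data) and then compare checksums of the original and migrated tables. The sketch below models tables as hypothetical dicts keyed by primary key with JSON-serializable rows; at production scale you would likely compute the checksums inside the database rather than in application code.

```python
import hashlib
import json
from typing import Dict


def table_checksum(table: Dict[int, dict]) -> str:
    """SHA-256 over rows in primary-key order, assuming JSON-serializable values."""
    digest = hashlib.sha256()
    for pk in sorted(table):
        # sort_keys makes column order irrelevant; the newline separates rows.
        digest.update(json.dumps([pk, table[pk]], sort_keys=True).encode() + b"\n")
    return digest.hexdigest()


def noop_migration_ok(original: Dict[int, dict], migrated: Dict[int, dict]) -> bool:
    """A no-op migration must leave the table's data unchanged."""
    return table_checksum(original) == table_checksum(migrated)


# Usage with made-up rows: the same data in a different order passes, a lost row fails.
original = {1: {"name": "a"}, 2: {"name": "b"}}
print(noop_migration_ok(original, {2: {"name": "b"}, 1: {"name": "a"}}))  # True
print(noop_migration_ok(original, {1: {"name": "a"}}))                    # False
```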

Common Pitfalls

1. Failing to regularly test backup and restore processes can lead to data loss in emergencies. Without routine verification, corrupted or unusable backups can go unnoticed until a disaster, when they are needed most.
2. Assuming automated failover processes will always work without testing them can lead to unexpected downtime. Failover mechanisms should be tested continuously in a controlled environment to confirm they behave as expected during real failures.
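Continuous failover testing comes down to injecting a failure and checking invariants afterwards. The hypothetical harness below (not Orchestrator's or GitHub's code) models a topology as a map from each server to the server it replicates from, kills one server, hands a copy of the topology to whatever failover routine is under test, and reports problems: there must be exactly one live master, and every live server must reach it without passing through a dead server or a replication loop. `naive_failover` is a made-up routine used only to exercise the harness.

```python
from typing import Callable, Dict, List, Optional, Set

# server -> server it replicates from (None means it is a master)
Topology = Dict[str, Optional[str]]


def check_invariants(topology: Topology, alive: Set[str]) -> List[str]:
    """Return the problems found; an empty list means the topology looks healthy."""
    problems: List[str] = []
    masters = sorted(s for s in alive if topology.get(s) is None)
    if len(masters) != 1:
        problems.append(f"expected one live master, found {masters}")
    for server in sorted(alive):
        # Walk up the replication chain until reaching a server with no source.
        node, seen = server, set()
        while topology.get(node) is not None and node not in seen:
            seen.add(node)
            node = topology[node]
        if node in seen or node not in alive:  # loop, or chain ends at a dead server
            problems.append(f"{server} cannot reach a live master")
    return problems


def run_drill(topology: Topology, victim: str,
              failover: Callable[[Topology, str], Topology]) -> List[str]:
    """Kill `victim`, let `failover` repair a copy of the topology, then check it."""
    alive = set(topology) - {victim}
    healed = failover(dict(topology), victim)
    return check_invariants(healed, alive)


def naive_failover(topo: Topology, dead: str) -> Topology:
    """Made-up failover: promote the first orphaned replica by name."""
    orphans = sorted(s for s, src in topo.items() if src == dead)
    if orphans:
        promoted = orphans[0]
        topo[promoted] = topo[dead]  # take over the dead server's position
        for s in orphans[1:]:
            topo[s] = promoted
    return topo


# Usage: db1 is the master, db3 is an intermediate master for db4.
topology: Topology = {"db1": None, "db2": "db1", "db3": "db1", "db4": "db3"}
print(run_drill(topology, "db1", naive_failover))       # []
print(run_drill(topology, "db3", naive_failover))       # []
print(len(run_drill(topology, "db1", lambda t, v: t)))  # 4: doing nothing fails
```

Because the harness only checks outcomes, the same drill can be pointed at any failover routine and run repeatedly, which is what turns a one-off test into continuous verification.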