Scaling merge-ort across GitHub

GitHub switched to performing merges and rebases using merge-ort. Come behind the scenes to see why and how we made this change.

Matt Cooper
5 min read · beginner

Overview

The article discusses how GitHub has scaled the merge-ort strategy across its platform to enhance merge and rebase performance. By adopting this new strategy, GitHub aims to improve speed and correctness in handling merges, ultimately benefiting users and reducing backend resource consumption.

What You'll Learn

  1. How to implement the merge-ort strategy for merges in GitHub
  2. Why merge-ort is preferred over libgit2 for performance improvements
  3. When to utilize git-replay for rebases in Git

Prerequisites & Requirements

  • Understanding of Git merge strategies
  • Familiarity with Git and its command line interface

Key Questions Answered

What are the main requirements for a merge strategy at GitHub?
GitHub's merge strategy must be fast, correct, and not require checking out the repository. Speed is crucial due to the high volume of merges, correctness aligns with user expectations, and avoiding a working directory enhances scalability and security.
How does merge-ort improve performance compared to previous strategies?
merge-ort is significantly faster than the previous libgit2-based strategy, achieving a 10x speedup in average cases and nearly a 5x boost in P99 cases for large repositories. This improvement allows GitHub to handle merges more efficiently.
What results were observed when implementing merge-ort for rebases?
Using merge-ort for rebases dramatically reduced resource usage: the rebases took under 10 minutes with merge-ort, compared to an estimated 512 hours with libgit2. This shows how efficiently merge-ort handles large volumes of rebases.
What is the process for deploying merge-ort at GitHub?
GitHub deployed merge-ort in two phases: first for merges, then for rebases. During the rollout, the team used the Scientist library to compare the performance and correctness of the old and new implementations.
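
The comparison described in the last answer can be sketched in a few lines. Scientist itself is a Ruby library, and the Python below is not its API: it is a minimal illustration of the same pattern, assuming hypothetical names (Observation, Result, run_experiment). The old path (control) and the new path (candidate) both run, their results and timings are published, and the caller only ever sees the control's result.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Observation:
    """Outcome of one code path: its value or error, plus wall-clock time."""
    value: Any
    error: Optional[Exception]
    seconds: float


@dataclass
class Result:
    control: Observation
    candidate: Observation

    @property
    def matched(self) -> bool:
        # If either path raised, they match only when both raised the same
        # exception type. Otherwise the returned values must be equal.
        c, n = self.control, self.candidate
        if c.error is not None or n.error is not None:
            return type(c.error) is type(n.error)
        return c.value == n.value


def observe(fn: Callable[[], Any]) -> Observation:
    """Run fn once, capturing its value or exception and its duration."""
    start = time.perf_counter()
    try:
        value, error = fn(), None
    except Exception as exc:
        value, error = None, exc
    return Observation(value, error, time.perf_counter() - start)


def run_experiment(
    control: Callable[[], Any],
    candidate: Callable[[], Any],
    publish: Callable[[Result], None],
) -> Any:
    """Run both paths, publish the comparison, return the control's result."""
    control_obs = observe(control)
    candidate_obs = observe(candidate)
    publish(Result(control_obs, candidate_obs))
    if control_obs.error is not None:
        raise control_obs.error  # candidate failures never reach the caller
    return control_obs.value


# Hypothetical usage: both paths produce the same tree ID, so they match.
results = []
tree = run_experiment(lambda: "tree-abc", lambda: "tree-abc", results.append)
```

Because the candidate's errors are swallowed and only recorded, a buggy new implementation cannot affect users while mismatches and timings are being collected.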

Key Statistics & Figures

Speedup in average merge time: 10x
Observed in the github/github monolith after implementing merge-ort for merges.
P99 speedup for merges: nearly 5x
Demonstrated during the performance comparison of merge-ort.
Time taken for rebases with libgit2: 512 hours
Estimated time to perform the rebases with libgit2, compared to under 10 minutes with merge-ort.

Technologies & Tools

Backend
merge-ort
Used as the new merge strategy to improve performance and correctness in GitHub's merging processes.
Backend
git-replay
A new Git subcommand used for performing rebases without needing a worktree.
Backend
libgit2
The library behind GitHub's previous merge and rebase implementation, before the transition to merge-ort.
Testing
Scientist
Used to compare the performance and correctness of the old and new code paths during the rollout.
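
To make the "no checkout" property of merge-ort concrete, here is a minimal sketch of a merge performed without a working directory. It assumes Git 2.38 or later, where `git merge-tree --write-tree` performs a merge with merge-ort and writes the resulting tree to the object store. The helper names and the repository path are hypothetical, and this is not a description of GitHub's actual code.

```python
import subprocess
from typing import List, Optional, Tuple


def merge_tree_command(repo_path: str, ours: str, theirs: str) -> List[str]:
    """Build a `git merge-tree --write-tree` call (assumes Git 2.38+).

    The command merges two commits and writes the merged tree object
    without touching a working directory or index, so it also works in
    a bare repository.
    """
    return ["git", "-C", repo_path, "merge-tree", "--write-tree", ours, theirs]


def merge_without_checkout(
    repo_path: str, ours: str, theirs: str
) -> Tuple[bool, Optional[str]]:
    """Return (is_clean, tree_oid) for merging `theirs` into `ours`."""
    proc = subprocess.run(
        merge_tree_command(repo_path, ours, theirs),
        capture_output=True,
        text=True,
    )
    # Exit status 0 means a clean merge and 1 means conflicts;
    # anything else indicates the merge could not be performed.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    lines = proc.stdout.splitlines()
    tree_oid = lines[0] if lines else None  # first line is the tree ID
    return proc.returncode == 0, tree_oid


# Hypothetical usage against a bare repository:
# clean, tree = merge_without_checkout("/srv/repos/example.git", "main", "topic")
```

The command only produces a tree. Creating the merge commit and updating the branch are separate steps, which is what lets a server decide when and whether to publish the result.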

Key Actionable Insights

  1. Adopt the merge-ort strategy for handling merges in Git to enhance performance.
     This strategy has been shown to provide significant speed improvements, especially for large repositories, making it a valuable upgrade for teams looking to optimize their workflows.
  2. Utilize git-replay for rebasing workflows to avoid the overhead of a working directory.
     This tool allows for efficient rebasing without the need for a separate worktree, streamlining the process and reducing resource consumption.
  3. Implement a phased rollout strategy using tools like Scientist to measure the impact of new features.
     This approach minimizes risk while providing valuable data on performance and correctness, ensuring that any new implementation meets user expectations.
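
As an illustration of the second insight, the sketch below builds a `git replay` invocation that replays the commits of a branch onto a new base without a worktree. It assumes a Git release that ships the command (2.44 or later, where it is marked experimental). The helper names and the example refs are hypothetical, and how the command reports or applies ref updates has varied between Git releases, so check the git-replay manual for your version before acting on its output.

```python
import subprocess
from typing import List


def replay_command(repo_path: str, onto: str, upstream: str, branch: str) -> List[str]:
    """Build a `git replay --onto` call for the range upstream..branch.

    The commits reachable from `branch` but not from `upstream` are
    recreated on top of `onto`, with no working directory involved.
    """
    return [
        "git", "-C", repo_path,
        "replay", "--onto", onto, f"{upstream}..{branch}",
    ]


def replay(repo_path: str, onto: str, upstream: str, branch: str) -> str:
    """Run the replay and return whatever the command printed."""
    proc = subprocess.run(
        replay_command(repo_path, onto, upstream, branch),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the replay fails
    )
    return proc.stdout


# Hypothetical usage against a bare repository:
# output = replay("/srv/repos/example.git", "main", "old-base", "topic")
```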

Common Pitfalls

  1. Failing to meet user expectations in merge correctness can lead to support tickets.
     This issue arose when GitHub's previous implementation could not merge files that local Git could, highlighting the importance of aligning with user expectations.
  2. Not measuring the performance impact of new implementations can result in unforeseen issues.
     Without tools like Scientist, GitHub could not have effectively compared the performance and correctness of merge-ort against previous strategies.

Related Concepts

Git Merge Strategies
Performance Optimization In Software Engineering
Scalability In Backend Systems