How a one line change decreased our clone times by 99%

Pinterest Engineering

Overview

The Engineering Productivity team at Pinterest reduced Git clone times in their continuous integration pipelines by 99% with a one-line change: setting the refspec option on the git fetch command so that only the needed refs are fetched. Faster clones, in turn, drastically improved build times.

What You'll Learn

1. How to reduce Git clone times using the refspec option
2. Why optimizing Git operations can improve CI/CD pipeline efficiency
3. When to implement shallow clones in Git for better performance

Prerequisites & Requirements

  • Basic understanding of Git operations and CI/CD pipelines

Key Questions Answered

How did Pinterest reduce their Git clone times by 99%?
Pinterest cut Git clone times by 99% by setting the refspec option on the git fetch step of its build pipelines. The refspec limits the fetch to only the branches the build needs instead of every ref in the repository, which greatly speeds up cloning.
What is the impact of using the refspec option in Git?
A refspec tells Git which references to fetch, so refs that are not needed are never transferred. The time savings can be substantial: in Pinterest's case, cloning their largest repo went from 40 minutes to just 30 seconds.
What are the typical clone times for Pinterest's largest monorepo?
Cloning Pinboard, Pinterest's largest monorepo, typically took 40 minutes before the change and 30 seconds after the refspec option was set. The entire improvement came from a small configuration change.
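
The refspec idea described in the answers above can be sketched as a small helper that assembles a narrowed `git fetch` command. This is a minimal sketch, not Pinterest's actual pipeline configuration: the function names `build_refspec` and `build_fetch_command`, the remote name `origin`, and the branch name `main` are assumptions chosen for illustration. The helper only builds the argument list, so it can be inspected without touching a repository.

```python
from typing import List, Optional


def build_refspec(branch: str, remote: str = "origin") -> str:
    """Return a refspec mapping one remote branch to its remote-tracking ref.

    The leading '+' tells Git to update the tracking ref even when the
    update is not a fast-forward.
    """
    return f"+refs/heads/{branch}:refs/remotes/{remote}/{branch}"


def build_fetch_command(branch: str, remote: str = "origin",
                        depth: Optional[int] = None) -> List[str]:
    """Assemble a `git fetch` argument list limited to a single branch."""
    cmd = ["git", "fetch"]
    if depth is not None:
        # Optional shallow fetch: keep only the most recent `depth` commits.
        cmd.append(f"--depth={depth}")
    cmd += [remote, build_refspec(branch, remote)]
    return cmd


if __name__ == "__main__":
    # Hypothetical branch name; substitute the branch your build needs.
    print(" ".join(build_fetch_command("main", depth=1)))
```

Inside an existing clone, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Without an explicit refspec, a default clone fetches every branch under `refs/heads/*`, which is the work this narrowing avoids.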

Key Statistics & Figures

Clone time for Pinboard repo
40 minutes to 30 seconds
This statistic reflects the improvement achieved by implementing the refspec option in their Git fetch command.
Reduction in clone times
99%
This percentage is the efficiency gain from the one-line change in the pipeline configuration.
Number of git pulls on business days
60K
This figure illustrates the frequency of Git operations performed by Pinterest, emphasizing the importance of optimizing these processes.
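
As a quick check on the figures above, the reported clone times can be turned into a percentage. The sketch below uses only the two numbers quoted in this summary (40 minutes and 30 seconds); the function name `percent_reduction` is a hypothetical helper, not something from the original article.

```python
def percent_reduction(before_seconds: float, after_seconds: float) -> float:
    """Return the percentage by which a duration shrank."""
    if before_seconds <= 0:
        raise ValueError("before_seconds must be positive")
    return (before_seconds - after_seconds) / before_seconds * 100.0


if __name__ == "__main__":
    before = 40 * 60  # 40 minutes, in seconds
    after = 30        # 30 seconds
    print(f"{percent_reduction(before, after):.2f}% reduction")
    print(f"{before / after:.0f}x faster")
```

The exact value is 98.75%, which rounds to the 99% quoted in the headline, and corresponds to an 80-fold speedup.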

Technologies & Tools

Version Control
Git
Used for managing the source code and optimizing cloning operations.
CI/CD
Jenkins
Utilized for continuous integration and deployment processes at Pinterest.

Key Actionable Insights

1. Set the refspec option in your Git fetch commands to optimize clone times.
This change can lead to significant performance improvements in CI/CD pipelines, especially for large repositories with extensive histories.
2. Regularly review and optimize your CI/CD pipeline configurations.
Even minor adjustments can yield substantial benefits, as demonstrated by Pinterest's experience with Git operations.
3. Educate your team about the impact of Git operations on development speed.
Understanding how Git fetch and clone operations work helps developers make informed decisions that improve productivity.

Common Pitfalls

1. Failing to set the refspec option can lead to unnecessary data fetching.
Without this optimization, developers may experience longer clone times and slower CI/CD pipeline performance, especially with large repositories.
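
One way to spot this pitfall is to look at the fetch refspecs a repository is configured with, for example the lines printed by `git config --get-all remote.origin.fetch`, and flag any that use a wildcard. The sketch below is an illustrative check only: the helper names `is_broad_refspec` and `find_broad_refspecs` are hypothetical, and the sample values are the default refspec a plain clone creates plus a single-branch one with an assumed branch name.

```python
from typing import Iterable, List


def is_broad_refspec(refspec: str) -> bool:
    """Return True if the source side of a refspec contains a wildcard."""
    # Drop the optional leading '+' and keep the part before the first ':'.
    source = refspec.lstrip("+").split(":", 1)[0]
    return "*" in source


def find_broad_refspecs(refspecs: Iterable[str]) -> List[str]:
    """Return the configured refspecs that fetch more than a single ref."""
    return [r.strip() for r in refspecs if r.strip() and is_broad_refspec(r.strip())]


if __name__ == "__main__":
    configured = [
        "+refs/heads/*:refs/remotes/origin/*",        # default after a plain clone
        "+refs/heads/main:refs/remotes/origin/main",  # single branch (assumed name)
    ]
    for refspec in find_broad_refspecs(configured):
        print(f"broad refspec: {refspec}")
```

A wildcard refspec is not wrong in itself; it only matters where, as in a CI job, the build needs a single branch.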

Related Concepts

Git Optimization Techniques
Continuous Integration Best Practices
Performance Improvements In Software Development