Fast builds, secure builds. Choose two

Stripe's continuous integration system is responsible for orchestrating build pipelines and executing tens of thousands of test suites. To meet our exceptionally high security bar, we lean on a combination of open-source tech and novel engineering.

Sushain Cherivirala
11 min read · advanced

Overview

This article describes how Stripe engineers maintain a continuous integration (CI) system that balances speed and security. It covers the use of Bazel to manage build pipelines, and the use of remote caching and remote execution to improve performance while preserving the integrity of builds.

What You'll Learn

1
How to implement Bazel for multi-language build pipelines
2
Why remote caching and execution are crucial for CI performance
3
How to use Firecracker for efficient microVM management

Prerequisites & Requirements

  • Understanding of CI/CD principles and practices
  • Familiarity with Bazel and its build rules (optional)

Key Questions Answered

How does Bazel improve build and test pipelines at Stripe?
Bazel provides a multi-language and multi-platform framework that allows Stripe engineers to define rules for building and testing code. This enables efficient management of complex build pipelines, which is essential for handling the diverse programming languages used at Stripe.
What are the benefits of remote caching and execution in CI?
Remote caching allows Bazel to reuse outputs from previous actions, significantly speeding up the build process. Remote execution distributes actions across multiple machines, which enhances performance and efficiency, enabling builds to complete in under five minutes.
What security measures are taken to protect the CI system?
To prevent action cache poisoning, only trusted sources are allowed to write to the remote cache. This ensures that only verified outputs from trusted executions are stored, safeguarding against malicious code being introduced into the CI system.
Why did Stripe move from gVisor to Firecracker for sandboxing?
Stripe transitioned to Firecracker due to performance issues with gVisor, particularly in workloads that involved numerous filesystem syscalls. Firecracker's microVMs offer faster startup times and reduced I/O overhead, making them more suitable for Stripe's diverse workloads.
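
The remote caching answer above can be illustrated with a minimal sketch. It assumes a cache keyed by a digest of the command and the contents of its inputs, which is the general idea behind an action cache. The names `action_key`, `ActionCache`, and `fake_compile` are hypothetical and exist only for this illustration. This is not Bazel's or Stripe's implementation, and the in-memory dictionary stands in for a remote store.

```python
import hashlib
import json


def action_key(command, input_files):
    """Digest of the command line plus the content of every input file."""
    payload = json.dumps(
        {"command": command, "inputs": sorted(input_files.items())},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ActionCache:
    """In-memory stand-in for a remote action cache (hypothetical)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, command, input_files, execute):
        key = action_key(command, input_files)
        if key in self._store:
            # Same command and same input contents: reuse the earlier output.
            self.hits += 1
            return self._store[key]
        # Otherwise execute the action and remember its output.
        self.misses += 1
        output = execute(command, input_files)
        self._store[key] = output
        return output


def fake_compile(command, input_files):
    """Placeholder for real work such as invoking a compiler."""
    return "compiled:" + ",".join(sorted(input_files))


cache = ActionCache()
sources = {"Main.java": "class Main {}"}

first = cache.run(["javac", "Main.java"], sources, fake_compile)
second = cache.run(["javac", "Main.java"], sources, fake_compile)  # cache hit
changed = cache.run(
    ["javac", "Main.java"],
    {"Main.java": "class Main { int x; }"},  # edited input, so a new key
    fake_compile,
)
```

Because the key covers input contents and not just file names, editing a source file produces a new key and forces re-execution, while an unchanged action is served from the cache.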

Key Statistics & Figures

Actions executed to build the Java codebase
upwards of two hundred thousand actions
This number highlights the scale at which Stripe operates and the complexity involved in their build processes.
Time for builds to complete
sub-5 minutes
Maintaining this performance benchmark is essential for keeping engineers productive.
Filesystem syscalls during Ruby unit tests
over 600,000 syscalls
This statistic illustrates the performance challenges faced when using gVisor for certain workloads.

Technologies & Tools


Build Tool
Bazel
Used for managing build and test pipelines across multiple programming languages.
Virtualization
Firecracker
KVM-based microVM solution used for sandboxing build actions.
Sandboxing
gVisor
Initially used for executing untrusted code in a secure environment.
Infrastructure as Code
Terraform
Used for defining infrastructure at Stripe.
Container Management
containerd
Used to manage container images for executing actions.

Key Actionable Insights

1
Implementing Bazel can streamline your build processes across multiple languages.
By using Bazel, teams can define clear rules for building and testing, which can lead to faster development cycles and improved collaboration among engineers working with different programming languages.
2
Utilizing remote caching can drastically reduce build times.
By caching outputs from previous builds, teams can avoid redundant work, allowing for quicker feedback on changes and enhancing overall productivity.
3
Adopting Firecracker can enhance the performance of CI systems that require sandboxing.
Firecracker's microVMs provide a lightweight and efficient environment for executing builds, which can be particularly beneficial for applications with high I/O demands.

Common Pitfalls

1
Allowing untrusted code to write directly to the action cache can lead to security vulnerabilities.
This can let a malicious actor replace critical binaries with tampered ones. To avoid this, ensure that only trusted sources are allowed to write to the cache.
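
A minimal sketch of the write restriction described in this pitfall follows. It assumes each writer presents an identity and that the cache checks it against an allowlist before storing anything. `GuardedActionCache`, `WriteDenied`, and the writer names are hypothetical. A real system would authenticate writers rather than trust a string, and nothing here is taken from Stripe's implementation.

```python
class WriteDenied(Exception):
    """Raised when an untrusted writer tries to store an action result."""


class GuardedActionCache:
    """Action cache where anyone may read but only trusted writers may write."""

    def __init__(self, trusted_writers):
        self._trusted = frozenset(trusted_writers)
        self._store = {}

    def get(self, key):
        # Reads are open to every client.
        return self._store.get(key)

    def put(self, key, output, writer):
        # Writes are rejected unless the writer is on the allowlist.
        if writer not in self._trusted:
            raise WriteDenied(f"{writer!r} may not write to the action cache")
        self._store[key] = output


cache = GuardedActionCache(trusted_writers={"trusted-executor"})
cache.put(
    "digest-of-compiler-action",
    "genuine compiler binary",
    writer="trusted-executor",
)

try:
    # Untrusted code attempts to overwrite the cached compiler output.
    cache.put("digest-of-compiler-action", "tampered binary", writer="untrusted-test")
    poisoned = True
except WriteDenied:
    poisoned = False
```

In this sketch the untrusted write is rejected, so later readers of the same key still receive the output stored by the trusted executor.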

Related Concepts

Continuous Integration
Remote Caching
MicroVMs
Build Optimization Techniques