We manage the build pipeline that delivers Quip and Slack Canvas’s backend. A year ago, we were chasing exciting ideas to help engineers ship better code, faster. But we had one huge problem: builds took 60 minutes. With a build that slow, the whole pipeline gets less agile, and feedback doesn’t come to engineers until…
Overview
Slack's build pipeline team reduced build times for Quip and Slack Canvas from 60 minutes to as little as 10 minutes by applying classic software engineering principles—separation of concerns, caching, parallelization, and layering—to their Bazel-based build system. The article draws parallels between code performance optimization (caching with functools, threading) and build system optimization, demonstrating how decoupling frontend and backend builds, increasing cache granularity, and delegating parallelization to Bazel dramatically improved developer experience.
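The code-level analogy mentioned above (caching with functools, parallelizing with threads) can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article: `compile_unit`, `build_all`, and the unit names are hypothetical stand-ins for an expensive, deterministic step.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

CALLS = []  # records every real (non-cached) execution, for illustration only


@functools.cache
def compile_unit(name: str) -> str:
    """Hypothetical expensive, deterministic step: same input, same output."""
    CALLS.append(name)
    return f"{name}.out"


def build_all(names: list[str]) -> list[str]:
    """Run independent units concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(compile_unit, names))


first = build_all(["a", "b", "c"])   # does the real work once per unit
second = build_all(["a", "b", "c"])  # served entirely from the cache
```

Both techniques rely on the same property: the step's result depends only on its declared arguments, so it is safe to reuse a previous result or to run units in any order.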
What You'll Learn
- How to apply code-level performance optimization principles (caching, parallelization) to build systems
- Why separation of concerns between frontend and backend builds is critical for cache hit rates in Bazel
- How to identify and fix layering violations where build scripts duplicate orchestration already handled by Bazel
- How to design granular, composable build units that maximize caching and parallelization effectiveness
- Why hermetic and idempotent build steps are prerequisites for effective Bazel caching
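To make the link between hermeticity, granularity, and cache hits concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how Bazel computes its cache keys: `cache_key`, `build`, the file names, and the target names are all hypothetical. Each step's key is derived only from its declared inputs, so a step whose inputs did not change is a cache hit.

```python
import hashlib


def cache_key(step: str, inputs: dict[str, bytes]) -> str:
    """Key depends only on the step name and the declared input contents."""
    digest = hashlib.sha256(step.encode())
    for path in sorted(inputs):  # sorted so input ordering cannot change the key
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


def build(
    targets: dict[str, list[str]],
    sources: dict[str, bytes],
    cache: dict[str, str],
) -> list[str]:
    """Return the names of targets that had to be rebuilt (cache misses)."""
    rebuilt = []
    for name, paths in targets.items():
        key = cache_key(name, {p: sources[p] for p in paths})
        if key not in cache:
            cache[key] = f"artifact for {name}"  # stand-in for the real work
            rebuilt.append(name)
    return rebuilt


sources = {"a.ts": b"A", "b.ts": b"B", "a.css": b"a{}"}
monolithic = {"all_bundles": ["a.ts", "b.ts", "a.css"]}
granular = {
    "bundle_a_js": ["a.ts"],
    "bundle_b_js": ["b.ts"],
    "bundle_a_css": ["a.css"],
}

mono_cache: dict[str, str] = {}
fine_cache: dict[str, str] = {}
build(monolithic, sources, mono_cache)  # warm both caches
build(granular, sources, fine_cache)

edited = {**sources, "b.ts": b"B2"}  # a one-file change
mono_rebuilt = build(monolithic, edited, mono_cache)  # ['all_bundles']
fine_rebuilt = build(granular, edited, fine_cache)    # ['bundle_b_js']
```

With the monolithic target, a one-file edit invalidates everything. With granular targets, only the step that actually consumes the edited file misses the cache. The scheme breaks down if a step reads anything it did not declare, which is why hermeticity is a prerequisite.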
Prerequisites & Requirements
- Understanding of build systems and dependency graphs (directed acyclic graphs)
- Familiarity with caching concepts (cache keys, hit rates, hermeticity, idempotency)
- Basic familiarity with Bazel build system concepts (targets, srcs, outs) (optional)
- Understanding of Python concurrency patterns (functools.cache, ThreadPoolExecutor) (optional)
Key Questions Answered
- How do you reduce build times from 60 minutes to 10 minutes with Bazel?
- Why does coupling frontend and backend builds destroy cache effectiveness?
- What is a layering violation in build systems and how do you fix it?
- What properties must build steps have for Bazel caching to work?
- How does build target granularity affect cache hit rates?
- How did Slack verify correctness when rewriting their build system?
- Why should you avoid custom parallelization inside Bazel build steps?
Key Actionable Insights
1. Model your build as a directed acyclic graph with exhaustively defined inputs and outputs for each step. This enables the build system to automatically determine what needs rebuilding and what can be cached. Think of each build target like a pure function with declared parameters: the more precisely you define dependencies, the better caching and parallelization will work. This is the foundational principle that enables all other build optimizations. Without well-defined dependency edges, neither caching nor parallelization can be applied effectively by tools like Bazel.
2. Increase the granularity of your build targets to improve cache hit rates. Instead of one monolithic target that takes all sources and produces all artifacts, break it into smaller targets that each handle a specific piece. This is directly analogous to caching at the per-item level rather than per-collection in application code. Slack's frontend builder originally took all TypeScript and CSS sources and produced all bundles. By splitting into per-bundle builds with independent TypeScript and CSS steps, they dramatically increased how often cached results could be reused.
3. Audit and sever unnecessary transitive dependencies between major subsystems in your build graph. When your frontend build depends on your entire backend, every backend change invalidates the frontend cache. Map out the actual data flow to identify which dependency edges are truly required versus artifacts of historical coupling. Slack discovered that the dependency edge between their Python backend and TypeScript frontend was costing 35 minutes per build (more than half the total) because it forced full frontend rebuilds on any Python change.
4. Remove custom parallelization and orchestration code from your build scripts when using a build system like Bazel that handles these concerns. Strip your build scripts down to pure business logic that transforms specific inputs into specific outputs, and let the build system handle scheduling, caching, and resource allocation. This avoids layering violations where your code and the build system compete for resources. It also makes build steps more composable and allows the build system to parallelize across machines, not just local cores.
5. Build a comparison tool to validate correctness when migrating build systems, especially when the original build code lacks tests. Diff the artifacts produced by old and new systems to iteratively find and fix discrepancies, building confidence in the migration. Slack built a Rust tool for this purpose because the complexity of their original build code made it impossible to define correct behavior from first principles. The iterative comparison approach served as an effective substitute for unit tests.
6. Rewrite build orchestration in a constrained language like Starlark rather than in your application language. The deliberate limitations of such languages enforce separation between build logic and application code, preventing the re-entanglement of concerns that caused the original problems. Slack's Python build scripts had deep dependencies on backend application code. Rewriting in Starlark and standard-library-only Python scripts enforced a clean boundary between build and application concerns.
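Insights 1 and 3 can be illustrated together with a small Python sketch. It models the build as a dependency graph and computes which targets a change invalidates, before and after severing the backend-to-frontend edge. The target names and the `affected` helper are hypothetical, and the graph is a toy stand-in for a real Bazel dependency graph, not a reproduction of Slack's.

```python
from graphlib import TopologicalSorter


def affected(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Return `changed` plus every target that transitively depends on it."""
    dirty = {changed}
    # static_order() yields each target after all of its dependencies,
    # so one pass is enough to propagate invalidation through the graph.
    for target in TopologicalSorter(deps).static_order():
        if deps.get(target, set()) & dirty:
            dirty.add(target)
    return dirty


# Each target maps to the set of targets it depends on.
coupled = {
    "python_backend": set(),
    "typescript_sources": set(),
    "frontend_bundles": {"typescript_sources", "python_backend"},
    "release": {"python_backend", "frontend_bundles"},
}
# Same graph with the backend -> frontend edge removed.
decoupled = {**coupled, "frontend_bundles": {"typescript_sources"}}

hit_coupled = affected(coupled, "python_backend")
hit_decoupled = affected(decoupled, "python_backend")
```

In the coupled graph a backend change dirties `frontend_bundles` as well. Once the edge is removed, the same change leaves the frontend untouched, so its cached artifacts stay valid.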