Cost Efficient Snowflake CI

How Ramp tackled runaway Snowflake costs.

Kevin Chao
8 min read · advanced

Overview

This article describes how Ramp's data team made Continuous Integration (CI) checks on Snowflake more cost-efficient as cloud spend rose. It argues that financial operations matter in data engineering and details the targeted cloning strategy that limits resource usage during CI checks.

What You'll Learn

  1. How to implement targeted cloning strategies in Snowflake for CI checks
  2. Why financial operations should be a shared responsibility in data teams
  3. How to utilize dbt artifacts to optimize CI processes
  4. When to apply zero-copy clones in Snowflake to reduce costs

Prerequisites & Requirements

  • Basic understanding of Continuous Integration and cloud data warehouses
  • Familiarity with dbt and Snowflake (optional)

Key Questions Answered

How does Ramp handle CI checks in Snowflake?
Ramp's data team creates an isolated database for each pull request, runs dbt to build the models, tests values against expectations, and tears the database down once the pull request is merged or closed. This closely simulates the production environment while keeping costs under control.
What challenges did Ramp face with rising CI costs?
As Ramp's data team grew and the number of pull requests increased, they faced rising costs due to unnecessary model builds and developer idle time. The simplistic approach of duplicating the production environment led to inefficiencies that needed to be addressed.
What solution did Ramp implement to reduce CI costs?
Ramp implemented targeted cloning strategies using dbt artifacts and Snowflake's zero-copy clones to minimize resource usage during CI checks. This allowed them to build only the modified models and their direct dependencies, significantly reducing costs.
How did storing dbt artifacts improve CI processes?
By storing the manifest.json file in an S3 bucket after each production run, Ramp was able to leverage the --state selector in dbt to identify changes against the previous version of their project, drastically reducing the number of models built during CI checks.
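
The CI flow described in the answers above could be sketched roughly as follows. This is a minimal illustration, not Ramp's actual implementation: the `CI_PR_<number>` naming scheme, the `ci` target name, the helper function names, and the choice of the `state:modified+` selector are all assumptions made for the example. It also assumes the production `manifest.json` has already been downloaded from S3 into `state_dir`, and that the dbt profile points the `ci` target at the per-PR database.

```python
from typing import List


def pr_database_name(pr_number: int, prefix: str = "CI_PR") -> str:
    """Hypothetical naming scheme: one isolated database per pull request."""
    return f"{prefix}_{pr_number}"


def setup_sql(pr_number: int) -> str:
    """SQL to create the isolated database when CI starts for a pull request."""
    return f"CREATE DATABASE IF NOT EXISTS {pr_database_name(pr_number)};"


def dbt_build_command(state_dir: str, target: str = "ci") -> List[str]:
    """Build only what changed relative to the production manifest.

    state_dir is assumed to contain the manifest.json saved after the last
    production run. The selector used here is an assumption for this sketch.
    """
    return [
        "dbt", "build",
        "--select", "state:modified+",
        "--state", state_dir,
        "--target", target,
    ]


def teardown_sql(pr_number: int) -> str:
    """SQL to drop the database once the pull request is merged or closed."""
    return f"DROP DATABASE IF EXISTS {pr_database_name(pr_number)};"
```

A CI job would run `setup_sql` when the pull request opens, pass the result of `dbt_build_command` to a process runner on each push, and run `teardown_sql` on merge or close. dbt's graph operators can widen or narrow the selection if `state:modified+` builds more or less than a team wants.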

Key Statistics & Figures

Average cost reduction in CI checks
Dramatic dip in the rolling 7-day average cost of warehouses
This cost reduction was sustained even with an increased velocity of pull requests.

Key Actionable Insights

  1. Implement targeted cloning strategies to optimize CI processes and reduce costs.
     By focusing only on the modified models and their dependencies, teams can significantly decrease cloud service expenses while maintaining the integrity of their testing processes.
  2. Leverage dbt artifacts to establish state and streamline CI checks.
     Storing dbt artifacts allows teams to efficiently track changes and minimize unnecessary builds, which is crucial for managing costs in a cloud environment.
  3. Utilize Snowflake's zero-copy clones to avoid building heavy upstream models.
     This approach enables developers to debug issues in a single namespace, simplifying the process and reducing computational costs associated with CI.
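
To make the zero-copy clone idea concrete, here is a small sketch that generates the Snowflake statements for cloning unmodified upstream tables from production into a per-PR database, so they never have to be rebuilt in CI. The function name, the `SCHEMA.TABLE` input format, and the database names in the usage note are hypothetical; how the list of upstream tables is obtained (for example, from the dbt manifest) is left out and would depend on the project.

```python
from typing import Iterable, List


def clone_statements(
    upstream_tables: Iterable[str],
    modified: Iterable[str],
    prod_db: str,
    ci_db: str,
) -> List[str]:
    """Return SQL that clones unmodified upstream tables into the CI database.

    Table names are given as 'SCHEMA.TABLE'. Tables the pull request modifies
    are skipped, since dbt will build those in the CI database itself.
    """
    skip = {name.upper() for name in modified}
    to_clone = sorted({name.upper() for name in upstream_tables} - skip)

    # The target schemas must exist before tables can be cloned into them.
    schemas = sorted({name.split(".")[0] for name in to_clone})
    statements = [f"CREATE SCHEMA IF NOT EXISTS {ci_db}.{s};" for s in schemas]

    # A clone shares storage with its source, so no data is copied up front.
    statements += [
        f"CREATE OR REPLACE TABLE {ci_db}.{name} CLONE {prod_db}.{name};"
        for name in to_clone
    ]
    return statements
```

For example, `clone_statements(["analytics.orders", "analytics.users"], ["analytics.users"], "PROD", "CI_PR_128")` yields one schema statement and a single clone statement for `ANALYTICS.ORDERS`. Because the clones live in the same database as the rebuilt models, everything a developer needs to inspect sits in one namespace.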

Common Pitfalls

  1. Building unnecessary models during CI checks can lead to increased cloud costs and developer idle time.
     This often occurs when teams do not target only the modified models, resulting in wasted resources and time.

Related Concepts

Continuous Integration
Cloud Data Warehousing
Data Engineering Best Practices