How to Build an Experiment Pipeline from Scratch

A guide to building an email experimentation pipeline from the ground up. You'll learn how to implement a similar pipeline with a relatively simple setup.

Mojan Benham
10 min read · intermediate

Overview

This article outlines the process of building an email experimentation pipeline from scratch, addressing the challenges faced by Shopify's data teams in conducting A/B tests for external channels. It provides a step-by-step guide on understanding the problem, planning the solution, and implementing the pipeline effectively.

What You'll Learn

1. How to build an email experimentation pipeline from scratch
2. Why requirement gathering is crucial before starting a project
3. How to ensure experiment subjects are properly randomized and tracked

Prerequisites & Requirements

  • Understanding of A/B testing and experimentation frameworks
  • Familiarity with SQL and data warehousing concepts (optional)
  • Experience in data science or engineering roles

Key Questions Answered

What challenges did Shopify face with email experimentation?
Shopify faced issues with local storage of experiment data, ad hoc randomization that didn't account for user unsubscriptions, and simultaneous testing by multiple marketers without exclusion criteria. These challenges highlighted the need for a robust email experimentation pipeline.
How can you create a system diagram for an experimentation pipeline?
To create a system diagram, outline how your solution interacts with its environment, focusing on inputs and outputs. For Shopify, data sources included their data warehouse and email platform, which guided the design of the pipeline.
What are the key steps in building an email experimentation pipeline?
The key steps include understanding the problem, drawing a system diagram, planning the ideal output, technical planning, and finally building the pipeline using incremental implementation methods.
Why is it important to share new tools across the organization?
Sharing new tools ensures that all relevant teams are aware of available resources, preventing the use of outdated methods like local files. Effective communication can enhance collaboration and streamline processes across departments.
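
The randomization problems named in the first question (ad hoc assignment, unsubscribed users, and overlapping tests) can be illustrated with a small sketch. The source does not describe how Shopify assigns subjects, so the hash-based approach here is an assumption, and the names `assign_variant`, `build_assignments`, `unsubscribed`, and `in_other_experiments` are hypothetical. It shows one common way to make assignment reproducible while filtering out users who should not be in the experiment.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant.

    Hashing the experiment name together with the user ID gives the same
    answer on every run, so the assignment can be recomputed rather than
    kept in a local file. (Assumed technique, not taken from the source.)
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def build_assignments(user_ids, experiment, unsubscribed, in_other_experiments):
    """Assign eligible users to variants.

    Users who have unsubscribed, or who are already enrolled in another
    experiment, are excluded before assignment.
    """
    excluded = set(unsubscribed) | set(in_other_experiments)
    return {
        uid: assign_variant(uid, experiment)
        for uid in user_ids
        if uid not in excluded
    }


if __name__ == "__main__":
    users = [f"user_{i}" for i in range(10)]
    assignments = build_assignments(
        users,
        experiment="welcome_email_subject",  # hypothetical experiment name
        unsubscribed={"user_3"},
        in_other_experiments={"user_7"},
    )
    for uid, variant in assignments.items():
        print(uid, variant)
```

In a real pipeline the resulting assignments would be written to a shared table in the data warehouse so that every team reads the same record of who was in which group.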

Technologies & Tools

Backend
PySpark
Used to build the jobs for the email experimentation pipeline.

Key Actionable Insights

1. Prioritize requirement gathering before starting any project to avoid pitfalls later.
   Understanding the problem thoroughly helps in designing a solution that meets all stakeholders' needs and reduces the risk of major revisions after implementation.
2. Create a high-level system diagram to visualize interactions and data flow.
   This diagram serves as a roadmap for your project, helping to clarify inputs and outputs while preventing premature solutions.
3. Implement the pipeline in small, manageable increments to facilitate code reviews.
   Smaller pull requests allow for focused feedback and make it easier to identify and fix issues during the development process.

Common Pitfalls

1. Failing to share new tools across the organization can lead to continued use of outdated methods.
   This often happens when communication is lacking, leaving teams unaware of new resources that could improve their workflows.