Scaling Airbnb’s Experimentation Platform

At Airbnb, we are constantly iterating on the user experience and product features. This can include changes to the look and feel of the…

Jonathan Parks
8 min read, intermediate

Overview

This article discusses the scaling of Airbnb's Experimentation Reporting Framework (ERF), detailing its evolution from a simple Ruby script to a robust system utilizing Apache Airflow. It highlights the challenges faced, architectural changes made, and the introduction of features like metric hierarchies and dimensional cuts to enhance experimentation capabilities.

What You'll Learn

1. How to leverage Apache Airflow for orchestrating data pipelines
2. Why dimensional cuts are essential for analyzing metrics effectively
3. How to implement a metric hierarchy to improve UI clarity

Prerequisites & Requirements

  • Understanding of A/B testing and experimentation frameworks
  • Familiarity with Apache Airflow (optional)

Key Questions Answered

What were the main challenges faced by Airbnb's original ERF?
The original ERF faced several challenges, including inefficiencies from scanning source tables multiple times, a monolithic query structure that hindered checkpointing, and dependency checking that required all metric tables to be ready before processing. These issues led to a swamped Hadoop cluster and user dissatisfaction.
How did migrating to Airflow improve ERF's performance?
Migrating to Airflow allowed ERF to break down the processing into smaller, independent tasks, significantly reducing the runtime from over 24 hours to about 45 minutes. This change also resolved issues with dependency checking and improved overall scalability.
What is the significance of dimensional cuts in ERF?
Dimensional cuts enable users to analyze metrics by various attributes, such as geography or device type, allowing for more granular insights into experiment performance. This feature enhances the ability to understand user behavior and optimize experiments accordingly.
What are the core, target, and certified metrics in ERF?
Core metrics are mandatory for all experiments to ensure awareness of their impact, target metrics are prioritized for visibility in the UI, and certified metrics are audited by the Data Engineering team, guaranteeing an SLA. This hierarchy helps manage the complexity of metrics effectively.
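
To make the dimensional cuts described above concrete, here is a minimal Python sketch. It is not ERF's implementation: the row layout, the `bookings` metric, the `device` and `country` dimensions, and the `cut_metric` helper are all hypothetical, chosen only to show how one metric can be sliced by an attribute and compared across treatments.

```python
from collections import defaultdict

# Hypothetical per-user rows: treatment group, dimension values, metric value.
rows = [
    {"treatment": "control",   "device": "mobile",  "country": "US", "bookings": 1},
    {"treatment": "control",   "device": "mobile",  "country": "FR", "bookings": 0},
    {"treatment": "treatment", "device": "mobile",  "country": "US", "bookings": 1},
    {"treatment": "treatment", "device": "mobile",  "country": "FR", "bookings": 1},
    {"treatment": "control",   "device": "desktop", "country": "US", "bookings": 2},
    {"treatment": "treatment", "device": "desktop", "country": "US", "bookings": 2},
]


def cut_metric(rows, metric, dimension):
    """Average `metric` for every (dimension value, treatment) pair."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row[dimension], row["treatment"])].append(row[metric])
    return {key: sum(values) / len(values) for key, values in buckets.items()}


# Slice the same metric by device type.
by_device = cut_metric(rows, "bookings", "device")
for (device, treatment), average in sorted(by_device.items()):
    print(f"{device:8} {treatment:10} {average:.2f}")
```

In this toy data the overall difference between groups comes entirely from mobile users, which is the kind of pattern a cut by device type is meant to expose and an aggregate number would hide.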

Key Statistics & Figures

Concurrent experiments running in ERF
500
This number has grown from a few dozen in 2014, reflecting the platform's scaling success.
Distinct metrics computed per day
~2500
This is a significant increase from just a few dozen metrics initially, showcasing the platform's expanded capabilities.
Distinct experiment/metric combinations
50k
This figure illustrates the complexity and scale of the experimentation efforts at Airbnb.
Reduction in ERF runtime after migration to Airflow
from 24+ hours to about 45 minutes
This dramatic improvement highlights the efficiency gained through the architectural changes.

Technologies & Tools

Orchestration
Apache Airflow
Used for constructing dynamic pipelines to compute ERF assignments and metrics.
Data warehouse
Hive
Initially used for executing queries in the original ERF implementation.

Key Actionable Insights

1. Implement a metric hierarchy in your experimentation framework to enhance clarity and usability.
   As ERF saw an increase in metrics, introducing a hierarchy helped users focus on critical metrics and reduced UI overcrowding. This approach can be beneficial in any data-driven environment where clarity is essential.
2. Utilize dimensional cuts to gain deeper insights into user behavior during experiments.
   By slicing metrics based on user attributes and event characteristics, teams can uncover trends and optimize their strategies. This practice is crucial for tailoring user experiences and improving product offerings.
3. Transition to a modular pipeline architecture using tools like Apache Airflow to improve data processing efficiency.
   The shift from a monolithic to a modular approach allowed Airbnb to significantly reduce processing times and improve scalability. This strategy can be applied to any data-intensive application to enhance performance.
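
The modular idea in the third insight can be sketched without Airflow itself. The example below uses Python's standard-library `graphlib` as a stand-in for an orchestrator; it does not use the Airflow API, and the task names, source tables, and the `build_pipeline` and `runnable_tasks` helpers are hypothetical. It shows the property the summary attributes to the migration: each metric depends only on its own source table, so one late table blocks one metric rather than the whole run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_pipeline(metric_sources):
    """Return {task: set of upstream tasks} with one small task per metric."""
    graph = {"assignments": set()}
    for metric, source in metric_sources.items():
        graph[f"wait_for_{source}"] = set()
        # Each metric needs the assignments and only its own source table.
        graph[f"compute_{metric}"] = {"assignments", f"wait_for_{source}"}
    return graph


def runnable_tasks(graph, failed):
    """Tasks in dependency order, dropping anything downstream of a failure."""
    blocked = set(failed)
    order = []
    for task in TopologicalSorter(graph).static_order():
        if task in blocked or graph[task] & blocked:
            blocked.add(task)
        else:
            order.append(task)
    return order


# Hypothetical metrics and the source tables they read from.
graph = build_pipeline({"bookings": "bookings_table", "searches": "search_events"})

# Suppose the search events table is late: only the searches metric is held back.
done = runnable_tasks(graph, failed={"wait_for_search_events"})
print(done)
```

In an orchestrator such as Airflow, each node would be its own task with its own retries and status, so a failed step can be rerun without recomputing the steps that already succeeded.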

Common Pitfalls

1. Relying on monolithic queries can lead to inefficiencies and increased failure rates.
   The original ERF's single huge queries caused repeated scans of source tables and made it difficult to recover from errors. Transitioning to smaller, modular tasks can mitigate these issues.
2. Overcrowding the UI with too many metrics can hinder usability.
   As more metrics were added, users struggled to identify important metrics. Implementing a metric hierarchy can help manage this complexity and improve user experience.
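
One way to picture the metric hierarchy that addresses the second pitfall is the Python sketch below. It is an illustration under stated assumptions, not ERF's data model: the `Metric` class, the metric names, and the display rule (target metrics first, then core metrics, then everything else) are hypothetical. The only parts taken from the summary are that core metrics apply to every experiment, target metrics get priority in the UI, and certified metrics are audited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    core: bool = False       # mandatory for every experiment
    certified: bool = False  # audited and SLA-backed


def display_order(catalog, selected, targets):
    """Metric names to show for one experiment, most important first."""
    by_name = {metric.name: metric for metric in catalog}
    # Core metrics are always included, even if the experiment did not pick them.
    names = set(selected) | set(targets) | {m.name for m in catalog if m.core}

    def rank(name):
        if name in targets:
            return 0  # assumed rule: target metrics lead the page
        if by_name[name].core:
            return 1
        return 2

    return sorted(names, key=lambda name: (rank(name), name))


# Hypothetical metric catalog.
catalog = [
    Metric("bookings", core=True, certified=True),
    Metric("cancellations", core=True),
    Metric("search_clicks"),
    Metric("wishlist_adds"),
]

ordered = display_order(catalog, selected=["wishlist_adds"], targets=["search_clicks"])
print(ordered)
```

Keeping `certified` as a separate flag rather than a tier reflects an assumption that auditing is independent of how prominently a metric is displayed.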

Related Concepts

A/B Testing Frameworks
Data Pipeline Orchestration
Metric Management Strategies