How Airbnb safeguards changes in production

Part II: Near Real-time Experiments

Zack Loebel-Begelman

Overview

The article discusses Airbnb's Safe Deploy system, focusing on its architecture and engineering choices for implementing near real-time experiments. It highlights the components of the system, including the Ramp Controller, Near Real Time (NRT) pipeline, and the Measured framework, emphasizing the importance of safeguarding changes in production.

What You'll Learn

1. How to design a near real-time experimentation system for production environments
2. Why limiting near real-time results to the first 24 hours is effective for catching major issues
3. How to utilize Apache Flink for processing event streams in real time
4. When to implement automated experiment ramping to minimize negative impacts

Prerequisites & Requirements

  • Understanding of event-driven architectures and data processing pipelines
  • Familiarity with Apache Flink and Kafka (optional)

Key Questions Answered

What are the main components of Airbnb's Safe Deploy system?
The Safe Deploy system consists of three main components: the Ramp Controller, which coordinates experiment configurations; the Near Real Time (NRT) pipeline, which processes and enriches data; and the Measured framework, which computes metrics and statistical significance of changes.
How does the Ramp Controller minimize negative impacts during experiments?
The Ramp Controller automates the ramping of experiments, gradually increasing exposure while monitoring metrics. If any egregiously negative metric is detected, it immediately shuts down the experiment to prevent further negative impacts.
What challenges did Airbnb face when implementing the NRT pipeline?
Airbnb encountered challenges such as handling out-of-order events and managing data aging. They addressed these by implementing a custom join mechanism and buffering strategies to ensure timely and accurate data processing.
Why was the initial focus of Safe Deploys on A/B tests?
The initial focus on A/B tests was to build trust in the system and gain experience with automated anomaly detection and remediation, which would help in safeguarding changes in production more effectively.
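
The ramping behavior described in the answers above can be sketched in a few lines. This is a minimal illustration, not Airbnb's implementation: the stage values in `RAMP_STAGES`, the function `run_ramp`, and the `has_egregious_metric` callback are all hypothetical names chosen for the example.

```python
from typing import Callable, Iterable

# Hypothetical exposure stages (fraction of traffic); not Airbnb's actual values.
RAMP_STAGES = (0.01, 0.05, 0.25, 0.50)


def run_ramp(
    stages: Iterable[float],
    has_egregious_metric: Callable[[float], bool],
) -> float:
    """Step through the exposure stages and return the final exposure.

    Returns 0.0 if the experiment was shut down (or never started),
    otherwise the last stage that was reached.
    """
    exposure = 0.0
    for stage in stages:
        exposure = stage
        # After each increase, check the metrics observed at this exposure.
        if has_egregious_metric(exposure):
            return 0.0  # shut the experiment down immediately
    return exposure
```

In a real system the callback would read near real-time metrics and the loop would wait between stages; both are left out here to keep the control flow visible.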

Key Statistics & Figures

Percentage of experiment starts using Safe Deploys
85%
Since enabling Safe Deploys by default, it has been utilized for over 85% of experiment starts.
Threshold for marking a metric as egregious
-20% change with p-value ≤ 0.01
A metric is considered egregious if its percent change is below -20% (that is, a drop of more than 20%) and its adjusted p-value is less than or equal to 0.01.
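
The threshold above translates directly into a predicate. The sketch below assumes percent change is expressed in percentage points (so -25.0 means a 25% drop) and that the p-value has already been adjusted upstream; the function and parameter names are illustrative.

```python
def is_egregious(
    percent_change: float,
    adjusted_p_value: float,
    change_threshold: float = -20.0,
    p_threshold: float = 0.01,
) -> bool:
    """Return True when a metric crosses the egregious threshold.

    The change must be strictly below the threshold (more negative than
    -20%) and the adjusted p-value must be at most 0.01.
    """
    return percent_change < change_threshold and adjusted_p_value <= p_threshold
```

For example, `is_egregious(-25.0, 0.005)` is true, while a 25% drop with an adjusted p-value of 0.05 is not flagged because the result is not significant enough.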

Key Actionable Insights

1. Implement a Ramp Controller to automate the ramping of experiments and monitor metrics effectively. This approach minimizes human error and allows for quicker responses to negative impacts, enhancing the reliability of experiments.
2. Utilize Apache Flink for real-time data processing to improve the responsiveness of your experimentation system. Flink's capabilities in handling event streams can significantly enhance the performance and scalability of your data processing pipelines.
3. Limit near real-time results to the first 24 hours of an experiment to focus on catching major issues. This strategy allows teams to transition to batch results, which provide comprehensive insights without overwhelming the system with data.
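
The 24-hour cutover in the last insight can be expressed as a simple routing decision. This is a sketch under assumptions: the labels `"nrt"` and `"batch"`, the function name, and the choice to treat exactly 24 hours as already past the window are all illustrative.

```python
from datetime import datetime, timedelta

# Window during which near real-time results are served (from the text: 24 hours).
NRT_WINDOW = timedelta(hours=24)


def result_source(experiment_start: datetime, now: datetime) -> str:
    """Pick which results to consult for an experiment at time `now`."""
    if now - experiment_start < NRT_WINDOW:
        return "nrt"  # early phase: watch for major issues in near real time
    return "batch"  # afterwards: rely on the comprehensive batch results
```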

Common Pitfalls

1. Relying solely on batch results for decision-making can lead to delayed responses to negative impacts. This happens because batch results may not provide timely insights, making it crucial to implement near real-time monitoring for immediate feedback.
2. Underestimating the complexity of managing out-of-order events in streaming data. This can lead to inaccurate data processing and results, so it's important to design robust mechanisms for handling event timing and ordering.
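
One common way to tolerate out-of-order events is to buffer each event until its counterpart arrives and to drop entries that have aged out. The sketch below illustrates that general idea in plain Python; it is not Airbnb's custom join or a Flink API, and the class name, the `"left"`/`"right"` sides, and the payload strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class BufferedJoin:
    """Join two keyed event streams regardless of arrival order."""

    max_age_seconds: float
    _left: Dict[str, Tuple[float, str]] = field(default_factory=dict, init=False)
    _right: Dict[str, Tuple[float, str]] = field(default_factory=dict, init=False)

    def add(
        self, side: str, key: str, event_time: float, payload: str
    ) -> Optional[Tuple[str, str]]:
        """Add an event; return (left_payload, right_payload) once both sides exist."""
        if side not in ("left", "right"):
            raise ValueError("side must be 'left' or 'right'")
        own, other = (
            (self._left, self._right) if side == "left" else (self._right, self._left)
        )
        match = other.pop(key, None)
        if match is None:
            own[key] = (event_time, payload)  # counterpart not seen yet: buffer
            return None
        if side == "left":
            return (payload, match[1])
        return (match[1], payload)

    def expire(self, now: float) -> int:
        """Drop buffered events older than max_age_seconds; return how many."""
        dropped = 0
        for buffer in (self._left, self._right):
            stale = [k for k, (t, _) in buffer.items() if now - t > self.max_age_seconds]
            for k in stale:
                del buffer[k]
                dropped += 1
        return dropped
```

Here a "right" event that arrives before its "left" counterpart is simply held until the match shows up, and `expire` bounds how long unmatched events occupy memory.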

Related Concepts

Event-driven Architectures
Data Processing Pipelines
A/B Testing Methodologies
Anomaly Detection Systems