Supercharging A/B Testing at Uber

Sergey Gitlin, Krishna Puttaswamy, Luke Duncan, Deepak Bobbarjung, Arun Babu A S P
26 min read · intermediate

Overview

The article describes how Uber replaced Morpheus, its original A/B testing platform, with a new experimentation system to address scalability and reliability challenges. It highlights why a robust experimentation system matters for decision-making and for the product development process.

What You'll Learn

1. How to design a robust A/B testing platform that ensures statistical correctness and reliability
2. Why decoupling experimentation from code changes enhances user productivity
3. How to implement flexible experiment designs that accommodate diverse product needs

Prerequisites & Requirements

  • Understanding of A/B testing principles and statistical analysis
  • Experience with software development and experimentation frameworks (optional)

Key Questions Answered

What were the main challenges faced with the original A/B testing platform at Uber?
The original platform, Morpheus, faced challenges such as scalability issues, high rates of experiment reruns due to incorrect results, and a lack of support for diverse experiment designs. These issues hindered decision-making and slowed down the development process.
How does the new experimentation system improve user productivity?
The new system allows users to create, modify, and delete experiments without needing code changes, significantly reducing the time required for experimentation. This decoupling from the codebase enables faster iterations and enhances overall productivity.
What are the key components of an experiment in the new system?
An experiment in the new system consists of three key parts: randomization, treatment plan, and logs. Randomization determines how units are assigned to treatment groups, the treatment plan specifies actions based on context, and logs record additional information about the experiment.
What strategies were employed to ensure the reliability of the new experimentation system?
To enhance reliability, the new system includes safe default values for parameters, SDKs that cache previous payloads, and automated logging of experiment access. These measures help maintain functionality during network failures and improve the overall robustness of the system.
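
The three parts of an experiment described above (randomization, treatment plan, and logs) can be illustrated with a minimal sketch. This is not Uber's implementation: the `Experiment` class, its field names, the SHA-256 hash bucketing, and the in-memory exposure log are all assumptions made for illustration, and the treatment plan is simplified to a fixed mapping from group to parameter values rather than a context-dependent plan.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Experiment:
    """Hypothetical experiment: randomization + treatment plan + logs."""
    key: str
    groups: Dict[str, float]                  # group name -> traffic share (sums to 1.0)
    treatment_plan: Dict[str, Dict[str, Any]]  # group name -> parameter values
    exposure_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def assign(self, unit_id: str) -> str:
        """Randomization: deterministically map a unit to a group.

        Hashing the experiment key together with the unit id is an assumed
        scheme; it gives the same unit the same group on every call.
        """
        digest = hashlib.sha256(f"{self.key}:{unit_id}".encode("utf-8")).hexdigest()
        point = int(digest[:8], 16) / 2**32  # uniform value in [0, 1)
        cumulative = 0.0
        for name, weight in self.groups.items():
            cumulative += weight
            if point < cumulative:
                return name
        return list(self.groups)[-1]  # guard against floating-point rounding

    def get_parameter(self, unit_id: str, parameter: str, default: Any) -> Any:
        """Treatment plan lookup with a safe default, plus automatic logging."""
        group = self.assign(unit_id)
        self.exposure_log.append((self.key, unit_id, group))  # record the access
        return self.treatment_plan.get(group, {}).get(parameter, default)


experiment = Experiment(
    key="new_checkout_flow",
    groups={"control": 0.5, "treatment": 0.5},
    treatment_plan={
        "control": {"show_new_checkout": False},
        "treatment": {"show_new_checkout": True},
    },
)
show_new = experiment.get_parameter("rider-42", "show_new_checkout", default=False)
```

Because the lookup itself writes the log entry, the caller cannot forget to record an exposure, which is the idea behind the automated logging of experiment access mentioned above.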

Key Statistics & Figures

Number of developers transitioned to the new system: 2,000+
This reflects the scale of the transition to the new experimentation platform at Uber.
Number of stale experiments deprecated: 50,000+
This indicates the effort made to clean up the old experimentation system, Morpheus.

Technologies & Tools

Backend Configuration
Flipr
Used as the underlying system for managing parameters and experiments.

Key Actionable Insights

1. Implement a robust logging mechanism to track experiment access and outcomes, ensuring data integrity for analysis.
   Accurate logging is crucial for understanding the impact of experiments and making informed decisions based on reliable data.
2. Decouple experimentation from code changes to streamline the process of running A/B tests.
   This approach allows product teams to iterate quickly and adapt experiments without the lengthy build-release cycles associated with code changes.
3. Utilize a unified configuration system to manage experiments across different platforms seamlessly.
   A centralized configuration reduces complexity and fosters collaboration between mobile and backend teams, enhancing the overall efficiency of the experimentation process.
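
One way to picture decoupling from code, combined with the safe defaults and cached payloads mentioned under reliability, is a client that reads parameter values from a remotely managed payload. The sketch below is only an illustration under stated assumptions: `ParameterClient`, `refresh`, `get`, and the injected `fetch` callable are hypothetical names and do not reflect Flipr's or Uber's actual SDK.

```python
from typing import Any, Callable, Dict, Optional


class ParameterClient:
    """Hypothetical client: code reads parameters, config decides their values."""

    def __init__(self, fetch: Callable[[], Dict[str, Any]]) -> None:
        self._fetch = fetch  # assumed to load the latest payload, may raise
        self._cached: Optional[Dict[str, Any]] = None

    def refresh(self) -> bool:
        """Try to load a new payload; keep the previous one on failure."""
        try:
            self._cached = dict(self._fetch())
            return True
        except Exception:
            return False  # e.g. network failure: the cached payload stays in use

    def get(self, name: str, default: Any) -> Any:
        """Return the configured value, or the safe default baked into the code."""
        if self._cached is None:
            return default
        return self._cached.get(name, default)


def render_button(client: ParameterClient) -> str:
    # The call site never changes when an experiment is created, modified,
    # or deleted; only the remotely managed payload does.
    color = client.get("button_color", default="blue")
    return f"<button style='color:{color}'>Request</button>"
```

With this shape, starting or stopping an experiment on `button_color` is a configuration change rather than a build and release, and the feature still renders with its default value if no payload has ever been fetched.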

Common Pitfalls

1. Failing to involve data science in the engineering process can lead to incorrect statistical outcomes.
   Collaboration between engineering and data science is essential to ensure that system design decisions align with statistical requirements, preventing errors in experiment results.

Related Concepts

A/B Testing Methodologies
Statistical Analysis For Experimentation
Software Development Best Practices