Ocelot: Scaling observational causal inference at LinkedIn

LinkedIn Engineering Team
14 min read, intermediate

Overview

The article discusses LinkedIn's Ocelot platform, which enables scalable observational causal inference to estimate the impact of product changes when A/B testing is not feasible. It highlights the importance of understanding causal relationships to improve user experiences and outlines the platform's features and methodologies.

What You'll Learn

1. How to utilize observational causal inference methods to estimate treatment effects
2. Why A/B testing may not always be feasible and how to apply alternative methods
3. How to set up and execute causal studies using the Ocelot platform

Prerequisites & Requirements

  • Understanding of causal inference concepts
  • Familiarity with data analysis tools and platforms (optional)

Key Questions Answered

What is observational causal inference and when is it used?
Observational causal inference is a method used to estimate treatment effects when randomization is not possible. It is particularly useful in scenarios such as evaluating marketing campaigns, understanding the impact of bugs, or analyzing economic shocks, where traditional A/B testing cannot be applied.
How does the Ocelot platform facilitate causal studies at LinkedIn?
The Ocelot platform provides a user-friendly web application that allows data scientists to run complex causal studies without coding. It includes features for guided study setup, UI validation, and automated robustness checks, significantly reducing the time and expertise required to conduct observational causal inference.
What methods are available on the Ocelot platform for causal inference?
Ocelot offers several methods for causal inference, including Coarsened Exact Matching (CEM), the Doubly Robust (DR) estimator, Instrumental Variables (IV) estimation, Fixed Effects Models (FEM), and Bayesian Structural Time Series (BSTS). These methods cater to different data types and analysis needs.
What measures are taken to ensure the robustness of study results in Ocelot?
To ensure robustness, Ocelot implements automated checks and has a central review committee that vets study designs. The committee ensures that treatment effect estimates are only interpreted as causal if the studies meet rigorous standards, enhancing the reliability of the results.
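
To make one of the methods named above more concrete, here is a minimal Python sketch of Coarsened Exact Matching. It is an illustration only, not Ocelot's implementation (which the article says runs in R). The function names, the single covariate (imagined here as prior weekly sessions), the bin width, and the data are all hypothetical. The idea: coarsen the covariate into bins, keep only bins that contain both treated and control units, and average the within-bin differences weighted by the number of treated units.

```python
from collections import defaultdict


def coarsen(value, width):
    """Map a continuous covariate to the index of its bin."""
    return int(value // width)


def naive_difference(units):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [y for t, _, y in units if t]
    control = [y for t, _, y in units if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def cem_att(units, width):
    """Estimate the effect on the treated via coarsened exact matching.

    units: list of (treated, covariate, outcome) tuples.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for treated, covariate, outcome in units:
        strata[coarsen(covariate, width)][treated].append(outcome)

    weighted_sum, n_treated = 0.0, 0
    for groups in strata.values():
        t, c = groups[True], groups[False]
        if not t or not c:
            continue  # prune bins that lack either group
        diff = sum(t) / len(t) - sum(c) / len(c)
        weighted_sum += diff * len(t)
        n_treated += len(t)
    if n_treated == 0:
        raise ValueError("no bin contains both treated and control units")
    return weighted_sum / n_treated


# Hypothetical data: (treated, prior weekly sessions, outcome)
units = [
    (True, 12.0, 8.0), (True, 14.0, 8.0), (True, 17.0, 8.0), (False, 11.0, 6.0),
    (True, 3.0, 4.0), (False, 2.0, 2.0), (False, 5.0, 2.0), (False, 8.0, 2.0),
    (True, 25.0, 12.0),  # no control unit in this bin, so it is pruned
]

print("naive difference:", naive_difference(units))
print("CEM estimate:", cem_att(units, width=10))
```

In this toy data, treated members are concentrated among the more active users, so the naive difference overstates the effect; matching within bins removes that imbalance. A production tool would also report how many units were pruned, since heavy pruning changes the population the estimate applies to.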

Key Statistics & Figures

Number of observational causal studies produced annually
50+
Since the Ocelot platform launched in 2019, LinkedIn has produced more than 50 observational causal studies per year.
Time reduction for conducting causal studies
from weeks to hours
The Ocelot platform allows domain expert data scientists to run causal studies independently, drastically reducing the time needed for analysis.
Weekly experiments added through the T-REX platform
2,000
LinkedIn's T-REX experimentation platform supports extensive A/B testing, complementing the observational causal inference capabilities of Ocelot.

Technologies & Tools

Workflow Management
Azkaban
Used for managing the data pipelines that prepare modeling data and execute causal modeling code within the Ocelot platform.
Data Processing
Spark
Utilized for processing large datasets efficiently within the Ocelot pipelines.
Programming Language
R
Used for executing causal modeling code in the Ocelot platform.

Key Actionable Insights

1. Leverage the Ocelot platform to streamline your causal analysis process.
   Using Ocelot can significantly reduce the time required to conduct observational causal studies from weeks to just hours, allowing data scientists to focus on analysis rather than setup.
2. Incorporate robustness checks in your causal studies to validate results.
   Robustness checks are essential for ensuring the reliability of causal estimates. Ocelot automates this process, making it easier to maintain high standards in study design.
3. Utilize the pre-defined covariate sets in Ocelot to enhance your studies.
   Ocelot provides a library of over 200 commonly used covariates, simplifying the process of selecting relevant variables and improving the quality of your causal analyses.

Common Pitfalls

1. Failing to account for confounding variables can lead to misleading causal estimates.
   Confounding occurs when the treatment and control groups differ systematically, which can skew results. It's crucial to identify and adjust for these variables to ensure valid conclusions.
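
The pitfall above can be illustrated with a small Python sketch. All counts and segment names are invented for illustration. Here, heavily active members are both more likely to receive the treatment and more likely to convert, so activity level confounds the comparison. The sketch contrasts the naive difference in conversion rates with a stratified (standardized) estimate that compares treated and control members within each segment and then weights by segment size.

```python
# Hypothetical counts: (segment, treated) -> (members, conversions)
data = {
    ("heavy", True): (80, 40),
    ("heavy", False): (20, 8),
    ("light", True): (20, 2),
    ("light", False): (80, 4),
}


def rate(members, conversions):
    """Conversion rate for a group."""
    return conversions / members


def naive_rate_difference(data):
    """Pooled treated rate minus pooled control rate, ignoring segments."""
    totals = {True: [0, 0], False: [0, 0]}
    for (_, treated), (members, conversions) in data.items():
        totals[treated][0] += members
        totals[treated][1] += conversions
    return rate(*totals[True]) - rate(*totals[False])


def adjusted_rate_difference(data):
    """Within-segment differences, weighted by each segment's population share."""
    segments = {segment for segment, _ in data}
    population = sum(members for members, _ in data.values())
    effect = 0.0
    for segment in segments:
        size = data[(segment, True)][0] + data[(segment, False)][0]
        diff = rate(*data[(segment, True)]) - rate(*data[(segment, False)])
        effect += diff * size / population
    return effect


print("naive:", round(naive_rate_difference(data), 3))
print("adjusted:", round(adjusted_rate_difference(data), 3))
```

With these made-up numbers the naive comparison suggests a lift of about 30 percentage points, while the adjusted estimate is about 7.5 points. The adjustment only helps for confounders that are actually measured and included, which is why identifying them up front matters.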

Related Concepts

Causal Inference Methods
A/B Testing
Data Science Best Practices
Observational Studies