How Airbnb Achieved Metric Consistency at Scale

Part-I: Introducing Minerva — Airbnb’s Metric Platform

Robert Chang
12 min read · Intermediate

Overview

This article discusses how Airbnb developed Minerva, a metric platform that ensures metric consistency across the company, enhancing data quality and analytics capabilities. It outlines the challenges faced during Airbnb's analytics evolution and how Minerva addresses these issues by providing a single source of truth for metrics.

What You'll Learn

1. How to implement a centralized metric platform for analytics
2. Why consistent metrics are crucial for data-driven decision making
3. How to leverage data denormalization for better performance in analytics

Prerequisites & Requirements

  • Understanding of data analytics and metrics
  • Familiarity with data warehousing concepts and tools such as Apache Airflow (optional)

Key Questions Answered

How did Minerva improve data quality at Airbnb?
Minerva improved data quality by providing a centralized platform for defining and managing metrics, so every team works from the same definitions. It programmatically joins fact and dimension tables into denormalized datasets, backfills historical data when business logic changes, and presents the same numbers consistently across different tools, which strengthens trust in analytics.
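To make the "define once, backfill when logic changes" idea concrete, here is a minimal in-memory sketch. `MetricRegistry`, `MetricDef`, the version numbers, and the booking fields are hypothetical illustrations, not Minerva's API; the real platform computes on Hive and Spark, which this toy version does not model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]  # hypothetical: one booking record


@dataclass(frozen=True)
class MetricDef:
    """One version of a metric definition (hypothetical structure, not Minerva's)."""
    name: str
    owner: str  # ownership metadata, useful for governance
    version: int
    compute: Callable[[List[Row]], float]


class MetricRegistry:
    """Single place where each metric is defined and its history is materialized."""

    def __init__(self) -> None:
        self._defs: Dict[str, MetricDef] = {}
        # metric name -> partition date -> (definition version, value)
        self._values: Dict[str, Dict[str, Tuple[int, float]]] = {}

    def register(self, metric: MetricDef) -> None:
        # Only one live definition per name; changes must bump the version.
        current = self._defs.get(metric.name)
        if current is not None and metric.version <= current.version:
            raise ValueError(f"{metric.name}: new version must exceed {current.version}")
        self._defs[metric.name] = metric

    def backfill(self, name: str, partitions: Dict[str, List[Row]]) -> List[str]:
        """Recompute every partition not yet computed with the current version."""
        metric = self._defs[name]
        stored = self._values.setdefault(name, {})
        recomputed = []
        for ds in sorted(partitions):
            if stored.get(ds, (None,))[0] != metric.version:
                stored[ds] = (metric.version, metric.compute(partitions[ds]))
                recomputed.append(ds)
        return recomputed

    def value(self, name: str, ds: str) -> float:
        # Every consumer (dashboard, notebook, experiment) reads the same number.
        return self._values[name][ds][1]


partitions = {
    "2020-03-01": [{"nights": 3, "canceled": False}, {"nights": 2, "canceled": True}],
    "2020-03-02": [{"nights": 4, "canceled": False}],
}
registry = MetricRegistry()
registry.register(MetricDef("nights_booked", "core-data", 1,
                            lambda rows: sum(r["nights"] for r in rows)))
registry.backfill("nights_booked", partitions)
print(registry.value("nights_booked", "2020-03-01"))  # 5

# Business logic changes: exclude canceled bookings, then backfill history.
registry.register(MetricDef("nights_booked", "core-data", 2,
                            lambda rows: sum(r["nights"] for r in rows if not r["canceled"])))
print(registry.backfill("nights_booked", partitions))  # ['2020-03-01', '2020-03-02']
print(registry.value("nights_booked", "2020-03-01"))  # 3
```

Because the version is stored next to each value, a changed definition cannot silently coexist with history computed under the old one: the backfill rewrites every stale partition.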
What were the growing pains faced by Airbnb in their analytics journey?
Airbnb faced significant challenges such as data proliferation leading to confusion over metrics, discrepancies in reporting between teams, and a decline in trust in data quality. These issues arose from the rapid growth of data and the lack of a unified system for managing metrics.
What technologies are used in Minerva's infrastructure?
Minerva is built on open-source technologies including Apache Airflow for workflow orchestration, Apache Hive and Apache Spark as compute engines, and Presto and Apache Druid for data consumption. This stack supports the full lifecycle of metrics from creation to deprecation.
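Airflow models pipelines as directed acyclic graphs (DAGs) of tasks and, by default, starts a task only after its upstream tasks succeed. The sketch below illustrates that dependency ordering with Python's standard-library `graphlib` (Python 3.9+) rather than Airflow's own API; the task names are hypothetical stand-ins for a load, denormalize, compute, and serve flow like the stack described above.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each maps to the set of tasks it depends on.
pipeline = {
    "load_fct_bookings": set(),
    "load_dim_listings": set(),
    "denormalize_bookings": {"load_fct_bookings", "load_dim_listings"},
    "compute_nights_booked": {"denormalize_bookings"},
    "publish_to_druid": {"compute_nights_booked"},
}

# static_order() yields every task after all of its dependencies
# and raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

The two load tasks have no dependency on each other, so an orchestrator is free to run them in parallel; only the downstream steps must wait.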
How did Minerva help Airbnb during the COVID-19 crisis?
During the COVID-19 crisis, Minerva enabled Airbnb to quickly analyze the impact on bookings and cancellations. It facilitated the creation of an executive dashboard that became the authoritative source of truth, allowing the company to respond effectively to changing market conditions.

Key Statistics & Figures

  • Metrics in Minerva: over 12,000, reflecting its extensive use across teams at Airbnb.
  • Dimensions in Minerva: over 4,000, indicating the depth of data available for analysis.
  • Data producers: over 200 using Minerva across different functions and teams.
  • COVID-19 dashboard views: over 11,000 for the dashboard built on Minerva metrics, highlighting its importance during the crisis.

Key Actionable Insights

1. Implement a centralized metric platform to unify data sources and improve analytics consistency.
   By centralizing metrics, organizations can reduce discrepancies in reporting and enhance trust in data, which is crucial for informed decision-making.
2. Utilize data denormalization to optimize performance in analytics queries.
   Denormalization joins fact and dimension tables ahead of time, so queries can filter and aggregate a single wide table instead of repeating expensive joins at read time. The cost is extra storage and duplicated attributes that must be rebuilt when upstream data changes, but the faster reads matter for interactive dashboards and reporting in fast-paced environments.
3. Establish a robust data governance framework to manage metrics and ensure quality.
   A strong governance framework helps maintain data integrity and lineage, which is vital for organizations relying on accurate data for strategic decisions.
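A minimal sketch of the denormalization insight above, assuming small in-memory lists and dicts stand in for warehouse tables. `fct_bookings`, `dim_listings`, and their columns are hypothetical. The join happens once, when the wide table is built; every later aggregation reads that table directly.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical normalized inputs: a fact table of bookings and a
# dimension table describing each listing, keyed by listing_id.
fct_bookings: List[Row] = [
    {"booking_id": 1, "listing_id": 10, "nights": 3},
    {"booking_id": 2, "listing_id": 11, "nights": 2},
    {"booking_id": 3, "listing_id": 10, "nights": 4},
]
dim_listings: Dict[int, Row] = {
    10: {"market": "Paris", "room_type": "entire_home"},
    11: {"market": "Tokyo", "room_type": "private_room"},
}


def denormalize(facts: List[Row], dims: Dict[int, Row]) -> List[Row]:
    """Copy each booking's listing attributes onto the booking row.

    Assumes every booking has a matching listing (a KeyError otherwise).
    """
    return [{**row, **dims[row["listing_id"]]} for row in facts]


def sum_by(rows: List[Row], dimension: str, measure: str) -> Dict[object, int]:
    """Aggregate the wide table by any dimension; no further join is needed."""
    totals: Dict[object, int] = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


wide = denormalize(fct_bookings, dim_listings)
print(sum_by(wide, "market", "nights"))     # {'Paris': 7, 'Tokyo': 2}
print(sum_by(wide, "room_type", "nights"))  # {'entire_home': 7, 'private_room': 2}
```

In a warehouse the same idea is a scheduled job that materializes the wide table; the trade-off is that a change to `dim_listings` means rebuilding every row that copied its attributes.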

Common Pitfalls

1. Failing to establish a single source of truth can lead to inconsistent metrics across teams.
   Without a centralized metric platform, different teams may use varying definitions and calculations, resulting in confusion and mistrust in data.
2. Neglecting data governance can result in poor data quality and lineage issues.
   Without adequate governance, data lineage is hard to track, so when a data issue arises it is difficult to tell which downstream tables, metrics, and dashboards are affected.
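Tracked lineage is what makes the second pitfall manageable: when a table breaks, you need every dataset, metric, and dashboard built on it. A minimal sketch, assuming lineage is stored as a hypothetical adjacency map from each asset to the assets derived from it, uses breadth-first search to collect everything downstream. All asset names are illustrative.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: each asset maps to the assets built from it.
lineage: Dict[str, List[str]] = {
    "raw_bookings": ["fct_bookings"],
    "fct_bookings": ["bookings_denorm"],
    "dim_listings": ["bookings_denorm"],
    "bookings_denorm": ["nights_booked", "gross_booking_value"],
    "nights_booked": ["exec_dashboard"],
}


def downstream_of(node: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search for everything that depends on `node`, directly or transitively."""
    seen: Set[str] = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


print(sorted(downstream_of("dim_listings", lineage)))
# ['bookings_denorm', 'exec_dashboard', 'gross_booking_value', 'nights_booked']
```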

Related Concepts

Data Governance
Metric Management
Data Warehousing
Analytics Platforms