Part-I: Introducing Minerva — Airbnb’s Metric Platform
Overview
This article discusses how Airbnb developed Minerva, a metric platform that ensures metric consistency across the company, enhancing data quality and analytics capabilities. It outlines the challenges faced during Airbnb's analytics evolution and how Minerva addresses these issues by providing a single source of truth for metrics.
What You'll Learn
1. How to implement a centralized metric platform for analytics
2. Why consistent metrics are crucial for data-driven decision making
3. How to leverage data denormalization for better performance in analytics
Prerequisites & Requirements
- Understanding of data analytics and metrics
- Familiarity with data warehousing concepts and tools like Apache Airflow (optional)
Key Questions Answered
How did Minerva improve data quality at Airbnb?
Minerva improved data quality by providing a centralized platform for defining and managing metrics, ensuring consistency across various teams. It allows for programmatic joining of data, backfilling when business logic changes, and presenting data uniformly across different tools, which enhances trust in analytics.
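As a rough illustration of centrally defined metrics and of backfilling when business logic changes, the sketch below keeps one definition per metric name and flags a backfill whenever a definition is re-registered with different logic. It is not Minerva's actual API: `MetricDefinition`, `MetricRegistry`, the field names, and the `fct_bookings` table are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical metric definition; field names are illustrative only."""
    name: str
    source_table: str
    expression: str          # aggregation applied to the source table
    dimensions: tuple = ()   # dimensions the metric can be cut by

    def fingerprint(self) -> str:
        # Hash the full definition so any change to business logic is detectable.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MetricRegistry:
    """Single place where every team looks up the same definition."""

    def __init__(self):
        self._definitions = {}

    def register(self, definition: MetricDefinition) -> bool:
        """Store the definition; return True if existing data needs a backfill."""
        previous = self._definitions.get(definition.name)
        self._definitions[definition.name] = definition
        return previous is not None and previous.fingerprint() != definition.fingerprint()

    def get(self, name: str) -> MetricDefinition:
        return self._definitions[name]


registry = MetricRegistry()
v1 = MetricDefinition("bookings", "fct_bookings", "COUNT(*)", ("country",))
v2 = MetricDefinition("bookings", "fct_bookings", "COUNT(DISTINCT booking_id)", ("country",))

needs_backfill_first = registry.register(v1)    # first registration, nothing to backfill
needs_backfill_second = registry.register(v2)   # logic changed, history must be recomputed
```

Fingerprinting the whole definition, rather than comparing only the expression, means that a change to the source table or the dimension list also triggers a backfill in this sketch.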
What were the growing pains faced by Airbnb in their analytics journey?
Airbnb faced significant challenges such as data proliferation leading to confusion over metrics, discrepancies in reporting between teams, and a decline in trust in data quality. These issues arose from the rapid growth of data and the lack of a unified system for managing metrics.
What technologies are used in Minerva's infrastructure?
Minerva is built on open-source technologies including Apache Airflow for workflow orchestration, Apache Hive and Apache Spark as compute engines, and Presto and Apache Druid for data consumption. This stack supports the full lifecycle of metrics from creation to deprecation.
How did Minerva help Airbnb during the COVID-19 crisis?
During the COVID-19 crisis, Minerva enabled Airbnb to quickly analyze the impact on bookings and cancellations. It facilitated the creation of an executive dashboard that became the authoritative source of truth, allowing the company to respond effectively to changing market conditions.
Key Statistics & Figures
Metrics in Minerva
12,000
Minerva currently holds over 12,000 metrics, showcasing its extensive use across various teams at Airbnb.
Dimensions in Minerva
4,000
Minerva includes more than 4,000 dimensions, indicating the depth of data available for analysis.
Data producers
200
There are over 200 data producers utilizing Minerva across different functions and teams.
COVID-19 dashboard views
11,000
The COVID-19 dashboard created using Minerva metrics received over 11,000 views, highlighting its importance during the crisis.
Technologies & Tools
Workflow Orchestration
Apache Airflow
Used for managing workflows in Minerva.
Compute Engine
Apache Hive
Serves as a data warehouse for querying and managing data.
Compute Engine
Apache Spark
Used for large-scale data processing and analytics.
Data Consumption
Presto
Facilitates fast querying of large datasets.
Data Consumption
Apache Druid
Provides real-time analytics and data exploration capabilities.
Key Actionable Insights
1. Implement a centralized metric platform to unify data sources and improve analytics consistency. By centralizing metrics, organizations can reduce discrepancies in reporting and enhance trust in data, which is crucial for informed decision-making.
2. Utilize data denormalization to optimize performance in analytics queries. Denormalization can significantly speed up data retrieval times, which is essential for real-time analytics and reporting in fast-paced environments.
3. Establish a robust data governance framework to manage metrics and ensure quality. A strong governance framework helps maintain data integrity and lineage, which is vital for organizations relying on accurate data for strategic decisions.
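The denormalization insight can be illustrated with a small sketch, assuming a toy fact table and dimension table held in memory. The table names, column names, and helper functions are hypothetical; in a real warehouse the same join would run once in a batch job instead of in Python.

```python
from collections import defaultdict

# Hypothetical normalized inputs: a fact table and a dimension table.
bookings = [
    {"booking_id": 1, "listing_id": 10, "nights": 3},
    {"booking_id": 2, "listing_id": 11, "nights": 2},
    {"booking_id": 3, "listing_id": 10, "nights": 5},
]
listings = {
    10: {"country": "FR"},
    11: {"country": "US"},
}


def denormalize(facts, dimension, key):
    """Join dimension attributes onto each fact row once, ahead of query time."""
    wide_rows = []
    for row in facts:
        attributes = dimension.get(row[key], {})
        wide_rows.append({**row, **attributes})
    return wide_rows


def nights_by(wide_rows, dimension_name):
    """Aggregate directly on the wide table; no join is needed at query time."""
    totals = defaultdict(int)
    for row in wide_rows:
        totals[row.get(dimension_name, "unknown")] += row["nights"]
    return dict(totals)


wide_bookings = denormalize(bookings, listings, "listing_id")
nights_per_country = nights_by(wide_bookings, "country")
```

The join cost is paid once when `wide_bookings` is built, so every later cut by `country` reads a single table. The trade-off is extra storage and the need to rebuild the wide table when a dimension changes.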
Common Pitfalls
1. Failing to establish a single source of truth can lead to inconsistent metrics across teams. Without a centralized metric platform, different teams may use varying definitions and calculations, resulting in confusion and mistrust in data.
2. Neglecting data governance can result in poor data quality and lineage issues. Inadequate governance leads to difficulties in tracking data lineage, which can cause significant problems when data issues arise.
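To make the lineage pitfall concrete, here is a minimal sketch of the kind of lookup that recorded lineage makes possible: given a table with a data issue, list everything built on top of it. The asset names and the `affected_by` helper are hypothetical, and the dependency map is assumed to be maintained by hand for the purpose of the example.

```python
# Hypothetical lineage map: each asset lists the assets it reads from directly.
upstream = {
    "fct_bookings_wide": ["fct_bookings", "dim_listings"],
    "metric_nights_booked": ["fct_bookings_wide"],
    "metric_active_listings": ["dim_listings"],
}


def affected_by(broken, upstream):
    """Return every asset that depends, directly or indirectly, on `broken`."""
    # Invert the map so each source points at the assets that read from it.
    downstream = {}
    for asset, sources in upstream.items():
        for source in sources:
            downstream.setdefault(source, []).append(asset)

    # Walk the inverted graph, collecting each dependent exactly once.
    affected, stack = set(), [broken]
    while stack:
        for child in downstream.get(stack.pop(), []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return affected


impacted = affected_by("fct_bookings", upstream)
```

Without a map like `upstream`, answering "which metrics are wrong right now?" means reading pipeline code by hand, which is the difficulty the pitfall describes.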
Related Concepts
Data Governance
Metric Management
Data Warehousing
Analytics Platforms