How Airbnb Enables Consistent Data Consumption at Scale

Part-III: Building a coherent consumption experience

Shao Xie
15 min read · intermediate

Overview

This article discusses how Airbnb leverages the Minerva API to provide a consistent and simplified data consumption experience at scale. It addresses the challenges of data sourcing and computation while detailing the architecture and integration of various tools to enhance data accessibility for users with varying expertise.

What You'll Learn

1. How to utilize the Minerva API for consistent data consumption
2. Why a metric-centric approach enhances data accessibility
3. How to implement the Split-Apply-Combine paradigm in data analysis

Prerequisites & Requirements

  • Understanding of data metrics and dimensions
  • Familiarity with BI tools like Apache Superset and Tableau (optional)

Key Questions Answered

How does the Minerva API simplify data consumption at Airbnb?
The Minerva API acts as a metric-serving layer that abstracts the complexities of data sourcing and computation. It allows users to request metrics without needing to understand where the data is stored or how it is computed, thus simplifying the data consumption experience.
What challenges does Airbnb face in data integration?
Airbnb faces challenges in determining the correct data sources for metrics, ensuring accurate calculations across various metric types, and integrating data into downstream applications. These challenges necessitate a robust solution like the Minerva API to streamline data access and usage.
What is the role of the Metadata Fetcher in the Minerva API?
The Metadata Fetcher periodically retrieves and caches metadata about data sources to ensure that the Minerva API can serve the best available data for user queries. It checks for data completeness and updates the source of truth in a MySQL database every 15 minutes.
How does Metric Explorer cater to non-technical users?
Metric Explorer is designed for users with varying levels of data expertise, optimizing accessibility and consistency over flexibility. It provides a user-friendly interface that allows users to perform data operations without needing deep technical knowledge.
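
The Metadata Fetcher answer above describes metadata that is cached and refreshed every 15 minutes. The following is a minimal sketch of that idea only, not Airbnb's implementation: it assumes a lazy refresh-on-read in place of a scheduled job and an in-memory dictionary in place of MySQL. `MetadataCache`, `fetch`, and `clock` are hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

REFRESH_SECONDS = 15 * 60  # the 15-minute interval mentioned in the text


@dataclass
class MetadataCache:
    """Hypothetical in-memory stand-in for the MySQL-backed metadata cache."""

    fetch: Callable[[], Dict[str, dict]]         # assumed loader, not a Minerva API
    clock: Callable[[], float] = time.monotonic  # injectable so tests can control time
    _data: Dict[str, dict] = field(default_factory=dict)
    _fetched_at: Optional[float] = None

    def get(self) -> Dict[str, dict]:
        """Return cached metadata, reloading it once the interval has elapsed."""
        now = self.clock()
        stale = self._fetched_at is None or now - self._fetched_at >= REFRESH_SECONDS
        if stale:
            self._data = self.fetch()
            self._fetched_at = now
        return self._data
```

A caller would construct `MetadataCache(fetch=load_source_metadata)` once, where `load_source_metadata` is whatever function reads the source metadata, and call `get()` on every request. Reads within the interval reuse the cached copy.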

Technologies & Tools

Backend
Minerva API
Serves as the interface for consistent data consumption across applications.
Frontend
Apache Superset
Used for data exploration and visualization.
Frontend
Tableau
Used for advanced data reporting and visualization.
Database
MySQL
Stores cached metadata for data sources.
Database
Druid
Used for querying and aggregating data.
Database
Presto
Used for querying large datasets across various sources.

Key Actionable Insights

1. Leverage the Minerva API to streamline data requests across various applications.
   By using the Minerva API, teams can ensure consistent data consumption without needing to manage the complexities of data sourcing and computation, which can save time and reduce errors.
2. Utilize the Split-Apply-Combine paradigm for effective data analysis.
   This approach allows for breaking down complex queries into manageable sub-queries, making it easier to handle large datasets and derive insights efficiently.
3. Implement a Metadata Fetcher to maintain up-to-date data source information.
   Regularly updating metadata ensures that users are querying the most accurate and complete datasets, which is crucial for reliable business insights.
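
The Split-Apply-Combine insight above can be illustrated with a small sketch. It assumes a derived metric defined as the ratio of two atomic metrics; the metric names, the `DERIVED_METRICS` table, and the `run_query` callback are hypothetical and do not reflect Minerva's actual definitions or API.

```python
from typing import Callable, Dict, List, Tuple

Series = Dict[str, float]  # e.g. date string -> metric value

# Hypothetical derived metric, expressed as (numerator, denominator).
DERIVED_METRICS: Dict[str, Tuple[str, str]] = {
    "avg_price_per_night": ("booking_revenue", "nights_booked"),
}


def split(metric: str) -> List[str]:
    """Split a request into the atomic metrics needed to answer it."""
    return list(DERIVED_METRICS.get(metric, (metric,)))


def apply(atomic: List[str], run_query: Callable[[str], Series]) -> Dict[str, Series]:
    """Run one sub-query per atomic metric."""
    return {name: run_query(name) for name in atomic}


def combine(metric: str, results: Dict[str, Series]) -> Series:
    """Recombine sub-query results into the requested metric."""
    if metric not in DERIVED_METRICS:
        return results[metric]
    num, den = DERIVED_METRICS[metric]
    # Skip keys whose denominator is missing or zero.
    return {
        key: results[num][key] / results[den][key]
        for key in results[num]
        if results[den].get(key)
    }
```

Calling `combine(m, apply(split(m), run_query))` answers either an atomic or a derived metric through the same path, so each sub-query stays small and can be executed independently.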

Common Pitfalls

1. Failing to ensure data completeness when selecting data sources.
   This can lead to inaccurate insights and decision-making. It's essential to implement a robust metadata management strategy to verify that all necessary data columns and time ranges are covered.
2. Overcomplicating data queries without leveraging the Minerva API's capabilities.
   Users may struggle with complex SQL queries instead of utilizing the Minerva API to simplify their data requests, which can lead to inefficiencies and errors.
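
The completeness pitfall above comes down to two checks: does a source contain every requested column, and does it span the requested time range? The sketch below shows one way to express them. `SourceMeta`, `covers`, and `pick_source` are hypothetical names, and preferring the source with the fewest columns is an assumed tie-breaker, not a documented Minerva rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class SourceMeta:
    """Hypothetical cached description of one data source."""

    name: str
    columns: FrozenSet[str]
    start: date  # first date with complete data
    end: date    # last date with complete data (inclusive)


def covers(source: SourceMeta, columns: Iterable[str], start: date, end: date) -> bool:
    """True if the source has every requested column for the whole date range."""
    return set(columns) <= source.columns and source.start <= start and end <= source.end


def pick_source(
    sources: Iterable[SourceMeta], columns: Iterable[str], start: date, end: date
) -> Optional[SourceMeta]:
    """Return a complete source, or None when no source covers the request."""
    wanted = list(columns)
    candidates = [s for s in sources if covers(s, wanted, start, end)]
    # Assumption: the narrowest complete source is the cheapest to query.
    return min(candidates, key=lambda s: len(s.columns), default=None)
```

Returning `None` rather than a partially covering source makes the incomplete case explicit, so the caller can fail loudly instead of serving a silently truncated result.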

Related Concepts

Data Metrics And Dimensions
Business Intelligence Tools
Data Consistency And Accuracy
Data-driven Decision Making