Turning Metadata Into Insights with Databook

Sunheng Taing, Atul Gupte

Uber


Cassandra, GraphQL, HTML, JSON, Microservices, MySQL, React, SQL, Thrift

Overview

The article discusses Uber's Databook, an in-house platform designed to manage and surface metadata related to various data entities. It outlines the evolution of Databook, its architecture, and the principles guiding its redesign to enhance data discovery and insights for users across the organization.

What You'll Learn

1. How to leverage Databook for effective metadata management

2. Why a centralized metadata system is crucial for data discovery

3. How to implement a flexible data model for diverse data entities

Prerequisites & Requirements

Understanding of metadata management concepts
Familiarity with APIs and data ingestion processes (optional)

Key Questions Answered

What is the purpose of Databook at Uber?

Databook serves as a centralized platform for managing and surfacing metadata related to various data entities within Uber. It enables users to discover, understand, and manage data assets effectively, thereby enhancing decision-making processes across the organization.

How does Databook improve data discovery for users?

Databook enhances data discovery by providing a user-friendly interface and robust search capabilities that allow users to easily find and access datasets, business metrics, dashboards, and more. The system aggregates metadata and offers insights into data quality and relationships, streamlining the data exploration process.

What are the key components of Databook's architecture?

Databook's architecture includes several components such as metadata sources, a metadata ingestion API, a persistent storage system, and a metadata event log. These components work together to ensure efficient data management and retrieval, supporting various use cases across Uber.
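As a rough illustration of how these components could fit together, the sketch below models an ingestion API that writes to a store and appends to an event log. It is a toy, in-memory stand-in and not Databook's actual implementation: the class names, method names, source names, and entity identifiers are all hypothetical, and a dictionary and a list take the place of the persistent storage and event log.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetadataEvent:
    """One entry in the append-only event log (hypothetical shape)."""
    sequence: int
    entity_id: str
    source: str
    changes: dict[str, Any]


@dataclass
class MetadataService:
    """Toy stand-in for the ingestion API, storage, and event log."""
    storage: dict[str, dict[str, Any]] = field(default_factory=dict)
    event_log: list[MetadataEvent] = field(default_factory=list)

    def ingest(self, source: str, entity_id: str, metadata: dict[str, Any]) -> MetadataEvent:
        # Merge the new metadata into the stored record for this entity.
        record = self.storage.setdefault(entity_id, {})
        record.update(metadata)
        # Append an event so the change can be audited later.
        event = MetadataEvent(len(self.event_log) + 1, entity_id, source, dict(metadata))
        self.event_log.append(event)
        return event

    def get(self, entity_id: str) -> dict[str, Any]:
        # Return a copy so callers cannot mutate stored state.
        return dict(self.storage.get(entity_id, {}))


# Two hypothetical metadata sources describe the same entity.
service = MetadataService()
service.ingest("hive_crawler", "dataset:trips", {"owner": "marketplace-team"})
service.ingest("quality_monitor", "dataset:trips", {"freshness_hours": 2})
print(service.get("dataset:trips"))
print(len(service.event_log))
```

In this sketch every write goes through a single `ingest` call, so the stored record and the event log cannot drift apart. A real system would need durable storage and a durable log in place of the in-memory structures.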

What challenges did Uber face with the original Databook?

The original Databook struggled to scale with the increasing complexity and volume of data at Uber. It was limited in its ability to manage diverse data entities and lacked the flexibility needed to support evolving user requirements, prompting a complete redesign.

Key Statistics & Figures

Number of cities Uber operates in

over 10,000

This statistic highlights the scale at which Uber operates and the vast amount of data generated daily.

Number of countries Uber services

69

This figure underscores the global reach of Uber and the complexity of its data landscape.

Time to onboard new data entities

less than one hour

This improvement reflects the efficiency gained from the new Databook architecture compared to the previous weeks-long process.

Technologies & Tools

Database

MySQL

Used as the persistent storage for Databook, supporting flexible and scalable metadata management.

API

GraphQL

Facilitates effective data retrieval through the Databook UI and other tools.

Messaging

Kafka

Handles metadata ingestion and event logging for real-time updates and auditing.

Key Actionable Insights

1. Implement a centralized metadata management system to streamline data discovery processes.
Centralizing metadata helps eliminate duplication and inconsistencies, making it easier for users to find and utilize data assets effectively.

2. Utilize a flexible data model to accommodate various data entity types.
A flexible model allows for better representation of different data entities, ensuring that metadata is accurately captured and easily accessible.

3. Incorporate user feedback into the design of data discovery tools.
Engaging with users during the development process ensures that the tools meet their needs and improve overall user experience.
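One way to picture the flexible data model from the second insight is a registry in which each entity type declares the attributes it requires, while entities carry their metadata as open-ended key-value pairs. The sketch below is an assumption-laden illustration, not Databook's data model: `EntityType`, `Entity`, `Registry`, and the attribute names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class EntityType:
    """Declares an entity type and the attributes it must carry (hypothetical)."""
    name: str
    required_attributes: frozenset[str]


@dataclass
class Entity:
    """A concrete data entity with free-form attributes."""
    type_name: str
    identifier: str
    attributes: dict[str, Any] = field(default_factory=dict)


class Registry:
    def __init__(self) -> None:
        self._types: dict[str, EntityType] = {}

    def register(self, name: str, required: set[str]) -> None:
        # Adding an entity type is a data change here, not a schema migration.
        self._types[name] = EntityType(name, frozenset(required))

    def missing_attributes(self, entity: Entity) -> set[str]:
        # Unknown types raise KeyError; extra attributes are allowed.
        entity_type = self._types[entity.type_name]
        return set(entity_type.required_attributes - set(entity.attributes))


registry = Registry()
registry.register("dataset", {"owner", "schema"})
registry.register("dashboard", {"owner", "url"})

trips = Entity("dataset", "trips", {"owner": "marketplace-team", "schema": ["trip_id", "city_id"]})
weekly = Entity("dashboard", "weekly-trips", {"owner": "analytics"})

print(registry.missing_attributes(trips))   # nothing missing
print(registry.missing_attributes(weekly))  # the dashboard lacks "url"
```

Because types are registered at runtime in this sketch, supporting a new kind of entity means one `register` call rather than a change to the storage layout.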

Common Pitfalls

1

Failing to establish a centralized metadata management system can lead to data duplication and inconsistencies.

Without a unified approach, different teams may create separate metadata stores, resulting in confusion and inefficiencies in data discovery.

Related Concepts

Metadata Management Best Practices

Data Quality Monitoring Techniques

Graph Data Structures For Representing Relationships