Operating Apache Pinot @ Uber Scale

Yupeng Fu, Girish Baliga, Ting Chen, Chinmay Soman

Uber

•

Yupeng Fu, Girish Baliga, Ting Chen, Chinmay Soman

•23 min read•advanced•

--

•View Original

ApacheApache KafkaJavaSQL

Overview

The article discusses Uber's experience operating Apache Pinot at scale, detailing its role in enabling real-time analytics across various use cases. It highlights the architecture, ingestion processes, and the lessons learned from scaling Pinot to handle terabyte-scale data with millisecond latencies.

What You'll Learn

1

How to implement real-time analytics using Apache Pinot

2

Why multi-tenancy is crucial for scaling data platforms

3

How to optimize data ingestion processes for low latency

Prerequisites & Requirements

Understanding of OLAP systems and real-time data processing
Familiarity with Apache Kafka and HDFS(optional)

Key Questions Answered

How does Uber scale Apache Pinot for real-time analytics?

Uber scales Apache Pinot by deploying a multi-cluster architecture that supports hundreds of use cases for querying terabyte-scale data with millisecond latencies. The platform evolved from a small ten-node cluster to hundreds of nodes, managing tens of terabytes of data and achieving thousands of queries per second in production.

What are the main use cases for Apache Pinot at Uber?

The main use cases for Apache Pinot at Uber include building customized dashboards for products like Uber Eats, executing analytical queries in backend services, and enabling near-real-time exploration of data for operational teams. These use cases help in monitoring demand-supply metrics and identifying anomalies.

What contributions has Uber made to Apache Pinot?

Uber has made several contributions to Apache Pinot, including enhancing self-service onboarding processes, integrating full SQL support through Presto, and improving the deep store functionality for segment management. These contributions aim to increase reliability and query flexibility.

What are common pitfalls when using Apache Pinot at scale?

Common pitfalls include memory overhead due to large scans and excessive segments per server, which can lead to garbage collection pauses and performance degradation. Uber addresses these issues by isolating ad hoc querying use cases to separate tenants and implementing message throttling.

Key Statistics & Figures

Total data footprint managed by Pinot

tens of terabytes

This reflects the scale at which Uber operates Pinot, having increased from tens of gigabytes in its early days.

Queries per second (QPS) in production

thousands of QPS

This demonstrates the performance capabilities of the Pinot platform as it scales to meet Uber's operational demands.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Database

Apache Pinot

Used for low-latency analytical queries on large datasets.

Streaming

Apache Kafka

Serves as the data source for real-time ingestion into Pinot.

Storage

Hdfs

Used as a deep store for archiving Pinot segments.

Query Engine

Presto

Integrated with Pinot to enable full SQL support for querying data.

Key Actionable Insights

1
Implement multi-tenancy in your data architecture to isolate workloads and improve performance.
By logically grouping tables under tenant names, you can mitigate the impact of resource-heavy queries on other workloads, ensuring better service level agreements (SLAs) across different applications.

2
Utilize Apache Kafka for real-time data ingestion to enhance the freshness of analytics.
Real-time ingestion from Kafka allows for immediate data availability, which is crucial for applications that require up-to-date insights, such as monitoring user demand and operational metrics.

3
Regularly monitor and optimize segment management to prevent performance bottlenecks.
As data scales, managing the number of segments per server becomes critical. Implementing strategies like message throttling can help maintain performance and avoid crashes during state transitions.

Common Pitfalls

1

Overloading Pinot servers with large scans can lead to performance issues.

When users execute broad queries without time predicates, it can cause excessive memory usage and garbage collection pauses, impacting overall system performance.

2

Too many segments per server can overwhelm the Pinot cluster.

Having an excessive number of segments can lead to spikes in state transition messages, potentially crashing servers and controllers. Implementing message throttling can help mitigate this risk.

Related Concepts

Real-time Analytics

Olap Systems

Data Ingestion Strategies

Multi-tenancy In Databases