Rebuilding Uber’s Apache Pinot™ Query Architecture

Ankit Sultana, Christina Li, Shaurya Chaturvedi, Tarun Mavani, Shreyaa Sharma
11 min read · advanced

Overview

This article discusses the rebuilding of Uber's Apache Pinot™ query architecture, focusing on the transition from Neutrino to a new query system that utilizes Pinot's Multi-Stage Engine Lite Mode. The new architecture aims to enhance performance, simplify query execution, and improve reliability across various use cases.

What You'll Learn

1. How to transition from Neutrino to Pinot's Multi-Stage Engine Lite Mode
2. Why query pushdown is essential for optimizing performance in OLAP systems
3. When to use Cellar for direct connections to Pinot brokers

Prerequisites & Requirements

  • Understanding of OLAP systems and query architectures
  • Familiarity with Apache Pinot and its query languages (optional)

Key Questions Answered

What challenges did Uber face with the Neutrino query architecture?
Uber faced significant challenges with the Neutrino query architecture, including complicated semantics that made it hard to understand the query execution process. This layered architecture could lead to unpredictable behavior, where minor changes in queries resulted in vastly different execution plans.
How does the new Multi-Stage Engine Lite Mode improve query execution?
The Multi-Stage Engine Lite Mode simplifies query execution by adding a configurable max record limit for the leaf stage and executing queries using a scatter-gather paradigm. This approach enhances performance while maintaining the reliability of query execution.
What is the role of Cellar in Uber's new query architecture?
Cellar acts as a lightweight passthrough proxy that allows users to query data using PinotSQL or M3QL without modifying the original query. This design ensures that users only deal with the semantics defined by Pinot, providing a more straightforward querying experience.
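
The scatter-gather execution with a leaf-stage record limit described above can be illustrated with a small simulation. This is a minimal sketch, not Pinot's implementation: the names `leaf_stage`, `scatter_gather`, and `max_records` are hypothetical, and each "server" is just an in-memory list of rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration only: these names are not Pinot APIs.

def leaf_stage(rows, predicate, max_records):
    """Runs on one 'server': filter local rows, stop once max_records match."""
    matched = []
    for row in rows:
        if predicate(row):
            matched.append(row)
            if len(matched) >= max_records:
                break  # the record limit bounds the work done per server
    return matched

def scatter_gather(servers, predicate, max_records):
    """Scatter the leaf stage to every server, then gather the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        partials = list(
            pool.map(lambda rows: leaf_stage(rows, predicate, max_records), servers)
        )
    return [row for part in partials for row in part]

# Two simulated servers, each holding a slice of a trips table.
servers = [
    [{"city": "SF", "fare": 10}, {"city": "NYC", "fare": 20}, {"city": "SF", "fare": 30}],
    [{"city": "SF", "fare": 5}, {"city": "SF", "fare": 15}],
]

rows = scatter_gather(servers, lambda r: r["city"] == "SF", max_records=2)
total = sum(r["fare"] for r in rows)
```

In this sketch the limit caps how many records each server returns, so a limit that is too low truncates the result. That trade-off between bounded resource usage and completeness is the point of making the limit configurable.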

Key Statistics & Figures

Total QPS served by Cellar
nearly 20%
This percentage reflects Cellar's share of the total QPS served by Neutrino at the time of writing.

Technologies & Tools

Database
Apache Pinot
Used as the underlying analytics platform for real-time data processing.
Query Engine
Neutrino
An internal fork of Presto optimized for low latency and high QPS.
Query Language
M3QL
Used for querying data through the Cellar interface.

Key Actionable Insights

1. Transitioning to the Multi-Stage Engine Lite Mode can significantly enhance query performance and reliability.
   This mode is particularly beneficial for complex queries that require high throughput and low latency, making it essential for real-time analytics applications.
2. Utilizing query pushdown effectively can optimize resource usage and improve response times in OLAP systems.
   By pushing down sub-plans to Pinot, you can reduce the amount of data processed at higher levels, leading to faster query execution and lower resource consumption.
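
To make the pushdown benefit concrete, the following sketch compares the same group-by count computed with and without pushdown. It is a toy model under stated assumptions: `TRIPS`, `without_pushdown`, and `with_pushdown` are hypothetical names, and "rows transferred" stands in for the data that would cross from the storage layer to the layer above it.

```python
from collections import Counter

# Hypothetical in-memory table standing in for data held by the storage layer.
TRIPS = [
    {"city": "SF"}, {"city": "SF"}, {"city": "NYC"},
    {"city": "SF"}, {"city": "NYC"},
]

def without_pushdown(table):
    """Upper layer pulls every raw row, then aggregates by itself."""
    transferred = list(table)  # all rows leave the storage layer
    counts = Counter(row["city"] for row in transferred)
    return dict(counts), len(transferred)

def with_pushdown(table):
    """The group-by runs inside the storage layer; only groups are returned."""
    counts = Counter(row["city"] for row in table)  # executed at the source
    transferred = list(counts.items())  # one record per group
    return dict(transferred), len(transferred)

result_a, rows_a = without_pushdown(TRIPS)
result_b, rows_b = with_pushdown(TRIPS)
print(result_a == result_b, rows_a, rows_b)
```

Both paths produce the same answer, but the pushdown path moves one record per group instead of one per row. The gap widens as the table grows while the number of groups stays small.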

Common Pitfalls

1. Underestimating the complexity of query execution in layered architectures.
   This can lead to unpredictable behavior, where minor query changes result in significant performance variations. It's crucial to thoroughly test queries to understand their execution plans.
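
One way to catch plan changes caused by small query edits is to diff the plan text before and after the edit. The sketch below assumes the plans have already been captured as strings (for example from an `EXPLAIN PLAN FOR` query); the helper `plan_diff` is hypothetical, and the two plans are hand-written placeholders, not actual Pinot output.

```python
import difflib

def plan_diff(before: str, after: str) -> list[str]:
    """Return unified-diff lines between two plan texts (empty if identical)."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )

# Placeholder plans written by hand for illustration; not real engine output.
plan_v1 = "Aggregate(count)\n  Filter(city = 'SF')\n    Scan(trips)"
plan_v2 = "Aggregate(count)\n  Filter(city = 'SF')\n    Join(trips, cities)"

changes = plan_diff(plan_v1, plan_v2)
for line in changes:
    print(line)
```

Running a check like this in a test suite makes a plan change visible at review time rather than as a latency regression in production.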

Related Concepts

OLAP Systems
Query Optimization Techniques
Apache Pinot Architecture