Engineering SQL Support on Apache Pinot at Uber

Haibo Wang
16 min read | advanced

Overview

The article discusses how Uber engineered SQL support on Apache Pinot, enhancing real-time analytics capabilities for their Big Data stack. It details the integration of Presto and Pinot, allowing users to perform complex SQL queries and improve operational efficiency.

What You'll Learn

1. How to integrate Presto with Apache Pinot for real-time analytics
2. Why SQL support is crucial for ad-hoc data queries in Big Data environments
3. How to implement aggregate pushdown to enhance query performance

Prerequisites & Requirements

  • Understanding of SQL and data analytics concepts
  • Familiarity with Presto and Apache Pinot (optional)

Key Questions Answered

How does Uber enhance SQL support on Apache Pinot?
Uber enhances SQL support on Apache Pinot by integrating it with Presto, allowing users to perform complex SQL queries and join data from different sources. This integration enables real-time analytics and empowers operations teams to build dashboards without extensive engineering support.
What are the benefits of using aggregate pushdown in queries?
Aggregate pushdown lets Presto request aggregated values directly from Pinot instead of fetching raw rows and aggregating them itself, which significantly reduces the amount of data transferred. Minimizing unnecessary data fetching cuts query latency by more than 10x.
What challenges did Uber face with their initial SQL support?
Uber faced challenges with timely ad-hoc analytics, as engineers had to manually build dashboards for complex queries that required data across multiple tables. The lack of support for nested queries and joins in their existing setup hindered operational efficiency.
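
To make the "nested queries and joins" gap concrete, here is a minimal sketch of the kind of query involved: an aggregating subquery joined to a second table. It uses Python's built-in sqlite3 purely as a stand-in SQL engine; it is not Presto or Pinot. The `trips` and `cities` tables and their columns are hypothetical. In a Presto and Pinot setup, one of these tables would presumably live in Pinot and the other in a different source that Presto can reach.

```python
import sqlite3

# In-memory database standing in for two separate data sources (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (trip_id INTEGER, city_id INTEGER, fare REAL);
    CREATE TABLE cities (city_id INTEGER, name TEXT);
    INSERT INTO trips VALUES (1, 10, 12.0), (2, 10, 8.0), (3, 20, 20.0);
    INSERT INTO cities VALUES (10, 'San Francisco'), (20, 'New York');
""")

# A nested query (per-city aggregation) joined to a dimension table.
query = """
    SELECT c.name, t.total_fare
    FROM (
        SELECT city_id, SUM(fare) AS total_fare
        FROM trips
        GROUP BY city_id
    ) AS t
    JOIN cities AS c ON c.city_id = t.city_id
    ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Without join and subquery support, each such question would need its own hand-built pipeline or dashboard, which is the bottleneck the summary describes.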

Key Statistics & Figures

Query performance improvement: more than 10x
This improvement is due to the reduction of unnecessary data transfer between Presto workers and Pinot servers.
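
The effect of aggregate pushdown on data transfer can be illustrated with a small simulation. This is a sketch under stated assumptions, not Uber's connector code: the `ROWS` data and both function names are hypothetical, and "rows transferred" is simply a count of what the simulated server hands back to the simulated query engine.

```python
from collections import defaultdict

# Hypothetical rows held by a Pinot-like server: (city, fare).
ROWS = [("sf", 12.0), ("sf", 8.0), ("nyc", 20.0), ("nyc", 5.0), ("sf", 10.0)]


def scan_without_pushdown(rows):
    """Server returns every row; the query engine aggregates after transfer."""
    transferred = list(rows)  # all raw rows cross the network
    totals = defaultdict(float)
    for city, fare in transferred:
        totals[city] += fare
    return dict(totals), len(transferred)


def scan_with_pushdown(rows):
    """Server aggregates locally and returns one row per group."""
    totals = defaultdict(float)
    for city, fare in rows:
        totals[city] += fare
    transferred = list(totals.items())  # only aggregated rows cross the network
    return dict(transferred), len(transferred)


plain_result, plain_rows = scan_without_pushdown(ROWS)
pushed_result, pushed_rows = scan_with_pushdown(ROWS)
print(plain_result == pushed_result, plain_rows, pushed_rows)
```

Both paths produce the same totals, but the pushdown path transfers one row per group rather than one row per record. On real tables with many records per group, that gap is where the latency savings come from.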

Technologies & Tools

Database: Apache Pinot
Used as a real-time OLAP datastore for analytics.
Query Engine: Presto
Facilitates SQL querying across different data sources, including Pinot.

Key Actionable Insights

1. Integrating Presto with Pinot can streamline analytics workflows, allowing teams to perform complex queries without heavy reliance on engineering resources.
This integration is particularly useful for operations teams needing quick access to insights for decision-making, enhancing overall efficiency.
2. Implementing aggregate pushdown can drastically reduce query latency, improving the responsiveness of analytics dashboards.
By minimizing data transfer, organizations can achieve faster insights, which is critical in environments requiring real-time data analysis.

Common Pitfalls

1. Relying solely on Presto for real-time analytics, without integrating it with a datastore like Pinot, can lead to stale results and slow query responses.
This occurs because Presto traditionally queries data from sources like Hadoop, which may not provide the freshness required for real-time analytics.

Related Concepts

Real-time Analytics
Big Data Architecture
SQL Query Optimization