Presto® Express: Speeding up Query Processing with Minimal Resources

Mingjia Hang, Gurmeet Singh

Uber


Overview

The article discusses Uber's implementation of Presto Express, an enhancement to the Presto SQL query engine aimed at shortening the end-to-end latency of short-running queries, and so tightening their Service Level Agreement (SLA). It details the query slowness that motivated the work, the design and implementation of express queries, and the significant performance improvements achieved.

What You'll Learn

1. How to identify express queries using historical data
2. Why separating express clusters can improve query performance
3. How to implement a new queue system for express queries

Prerequisites & Requirements

Understanding of SQL query execution and performance metrics
Familiarity with distributed systems and data analytics (optional)

Key Questions Answered

What is Presto Express and how does it improve query performance?

Presto Express is an enhancement to the Presto SQL query engine designed to reduce the end-to-end SLA for short-running queries. By implementing a separate queue for express queries and optimizing resource allocation, it significantly improves performance, allowing more queries to be processed quickly.

How does Uber identify express queries?

Uber identifies express queries by analyzing historical execution times using exact and abstract fingerprints. If the P90 or P95 execution time of a query in the last few days is under 2 minutes, it is classified as an express query, allowing for optimized processing.
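
The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Uber's implementation: the fingerprint definitions (a hash of the normalized text for "exact", a hash with literals replaced by `?` for "abstract"), the `ExpressClassifier` class, the `min_samples` parameter, and the nearest-rank percentile are all assumptions made for the example. It also assumes the recorded history has already been limited to the last few days.

```python
import hashlib
import math
import re
from collections import defaultdict

EXPRESS_THRESHOLD_SECONDS = 120  # "under 2 minutes", taken from the text


def _normalize(sql: str) -> str:
    # Collapse whitespace and lowercase so trivial formatting differences match.
    return " ".join(sql.split()).lower()


def exact_fingerprint(sql: str) -> str:
    """Hash of the normalized query text (assumed definition)."""
    return hashlib.sha256(_normalize(sql).encode("utf-8")).hexdigest()


def abstract_fingerprint(sql: str) -> str:
    """Hash of the query with literals replaced by '?' (assumed definition)."""
    text = _normalize(sql)
    text = re.sub(r"'[^']*'", "?", text)            # string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)    # standalone numeric literals
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer such as 90 or 95."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


class ExpressClassifier:
    """Hypothetical helper: predicts whether a query is express from past runtimes."""

    def __init__(self, pct=90, min_samples=3):
        self.pct = pct
        self.min_samples = min_samples          # assumption: require some history
        self.exact = defaultdict(list)          # exact fingerprint -> runtimes (s)
        self.abstract = defaultdict(list)       # abstract fingerprint -> runtimes (s)

    def record(self, sql, runtime_seconds):
        self.exact[exact_fingerprint(sql)].append(runtime_seconds)
        self.abstract[abstract_fingerprint(sql)].append(runtime_seconds)

    def is_express(self, sql):
        # Prefer the exact match; fall back to the abstract one.
        lookups = (
            (self.exact, exact_fingerprint(sql)),
            (self.abstract, abstract_fingerprint(sql)),
        )
        for table, key in lookups:
            runtimes = table.get(key, [])
            if len(runtimes) >= self.min_samples:
                return percentile(runtimes, self.pct) < EXPRESS_THRESHOLD_SECONDS
        return False  # no usable history: treat as non-express
```

Checking the exact fingerprint first means a query that has run verbatim before is judged on its own history, while a query that differs only in its literals can still borrow the history of its siblings.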

What challenges did Uber face with Presto before implementing express queries?

Uber faced significant query slowness due to throttling and concurrency limits in Presto clusters, which led to the need for additional capacity. This throttling resulted in long queuing times for users, particularly for non-express queries.

What impact did the express feature have on query performance metrics?

The express feature allowed low-tier express clusters to use only 10% of total resources while handling about 75% of the batch low-tier queries. It also reduced the P90 queuing time for express queries to about 10 seconds compared to hours for non-express queries.
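
To put the two percentages side by side, the short calculation below compares query count per unit of resources on the express clusters with the rest of the batch low-tier fleet. It is only a back-of-the-envelope sketch: it assumes the remaining 90% of resources serve the remaining 25% of queries, and it counts queries rather than the work each one does, so it says nothing about per-query cost.

```python
# Shares taken from the text, as whole percentages.
EXPRESS_RESOURCE_PCT = 10
EXPRESS_QUERY_PCT = 75


def queries_per_resource(query_pct, resource_pct):
    """Share of queries served per share of resources (a unitless density)."""
    return query_pct / resource_pct


express_density = queries_per_resource(EXPRESS_QUERY_PCT, EXPRESS_RESOURCE_PCT)
other_density = queries_per_resource(100 - EXPRESS_QUERY_PCT, 100 - EXPRESS_RESOURCE_PCT)
ratio = express_density / other_density

print(f"express: {express_density:.2f} query-share per resource-share")
print(f"other:   {other_density:.2f} query-share per resource-share")
print(f"ratio:   about {ratio:.0f}x")
```

Under these assumptions the express clusters serve roughly 27 times as many queries per unit of resources, which is what one would expect when the queries routed there are, by construction, the short ones.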

Key Statistics & Figures

Weekly active users

12,000

These users run approximately 500,000 queries daily, reading around 100 PB from HDFS.

P90 queuing time for express queries

10 seconds

This is significantly lower compared to hours for non-express queries.

Percentage of batch low-tier queries handled by express clusters

75%

Express clusters utilize only 10% of the total batch low-tier Presto resources.

Technologies & Tools


Backend

Presto

Used as the SQL query engine for interactive analytic queries.

Data Source

Apache Hive

One of the data sources queried through Presto.

Data Source

Apache Pinot

Used for real-time querying of data.

Data Source

MySQL

Another data source queried through Presto.

Data Source

Apache Kafka

A streaming data source queried through Presto.

Key Actionable Insights

1. Implement a separate queue for express queries to enhance performance.
By routing express queries to a dedicated queue, organizations can significantly reduce queuing times and improve overall SLA for users, as seen in Uber's implementation.

2. Utilize historical data to predict query performance and classify express queries.
Leveraging historical execution times can help in accurately identifying express queries, thus optimizing resource allocation and reducing latency.

3. Consider separating express clusters from existing batch clusters.
This separation can lead to better resource utilization and performance, as express clusters can handle more queries without being constrained by the limits of batch processing.
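
The first and third insights, a dedicated queue and a dedicated cluster for express queries, can be illustrated with a toy router. This is a sketch under stated assumptions rather than a description of Uber's gateway: the `Cluster` and `Router` classes, the concurrency limits, and the cluster names are all hypothetical, and the express/non-express decision is passed in as a flag.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster with a concurrency limit and its own FIFO queue."""
    name: str
    max_concurrent: int
    running: int = 0
    queue: deque = field(default_factory=deque)

    def submit(self, query_id):
        # Start immediately if a slot is free, otherwise wait in this cluster's queue.
        if self.running < self.max_concurrent:
            self.running += 1
            return "RUNNING"
        self.queue.append(query_id)
        return "QUEUED"

    def finish_one(self):
        """Free one slot; if a query is waiting, start it and return its id."""
        if self.running == 0:
            return None
        self.running -= 1
        if self.queue:
            self.running += 1
            return self.queue.popleft()
        return None


class Router:
    """Sends express queries to the express cluster, everything else to batch."""

    def __init__(self, express, batch):
        self.express = express
        self.batch = batch

    def route(self, query_id, is_express):
        target = self.express if is_express else self.batch
        return target.name, target.submit(query_id)


# Usage: a backlog on the batch cluster does not delay an express query.
router = Router(Cluster("express", max_concurrent=2), Cluster("batch", max_concurrent=1))
print(router.route("q1", is_express=False))
print(router.route("q2", is_express=False))
print(router.route("q3", is_express=True))
```

Because each cluster keeps its own queue, a long-running query holding the only batch slot leaves `q2` waiting there while `q3` starts at once on the express cluster.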

Common Pitfalls

1. Overlooking query throttling in the execution process.
This can lead to express queries being idle despite being designed for quick execution, resulting in increased load on other clusters.

2. Failing to match resource allocation with query requirements.
Express queries require adequate resources to ensure they do not get blocked by non-express queries, which can lead to performance bottlenecks.

Related Concepts

Distributed SQL Query Engines

Performance Optimization Techniques

Data Analytics Best Practices