Preon: Presto Query Analysis for Intelligent and Efficient Analytics

Gurmeet Singh

Overview

The article discusses Preon, a microservice developed by Uber for analyzing queries executed on the Presto SQL engine. It covers Preon's architecture, use cases, and benefits for query performance and resource utilization within Uber's data infrastructure.

What You'll Learn

1. How to analyze SQL queries for performance optimization using Preon
2. Why query analysis is crucial for efficient data processing in large-scale systems
3. When to implement caching strategies to reduce redundant queries

Key Questions Answered

What is Preon and how does it enhance query analysis?
Preon is a microservice developed by Uber that analyzes SQL queries executed in Presto to provide actionable insights. It enhances query performance by enabling features like predicate analysis, query validation, and caching, ultimately leading to more efficient data processing.
How does Preon improve query execution efficiency at Uber?
Preon improves query execution efficiency by analyzing query patterns and surfacing insights that guide data layout and caching decisions. The article reports two effects: deduplication reduces query traffic by 5-7%, and data read from optimized tables drops from 4PB to 2PB per week.
What challenges does Preon face in development and deployment?
Preon faces challenges such as maintaining synchronization with the Uber Presto repository and ensuring that updates to Presto modules do not disrupt analysis functionalities. Regular upgrades and careful management of dependencies are crucial to avoid failures in query validation.
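
To make the idea of predicate analysis concrete, here is a minimal sketch that counts which columns appear in WHERE-clause comparisons across a set of queries. It is not Preon's implementation: the article summary does not describe one, and a real service would work on the parsed query rather than on text. The regular expression, the function names, and the sample `trips` queries are all assumptions made for illustration, and the pattern will miscount comparisons that appear inside string literals.

```python
import re
from collections import Counter

# Simplified, hypothetical pattern: an identifier followed by a comparison operator.
# A production analyzer would walk the parsed query tree instead.
COMPARISON_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_.]*)\s*(<=|>=|=|<|>)")


def predicate_columns(sql):
    """Return the columns compared after the first WHERE keyword, in order."""
    parts = re.split(r"\bWHERE\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return []  # no WHERE clause, so nothing to analyze
    return [match.group(1).lower() for match in COMPARISON_RE.finditer(parts[1])]


def rank_filter_columns(queries):
    """Count, per column, how many queries filter on it (most frequent first)."""
    counts = Counter()
    for sql in queries:
        # set() so a column repeated inside one query is counted once
        counts.update(set(predicate_columns(sql)))
    return counts.most_common()


# Hypothetical sample workload
sample_queries = [
    "SELECT * FROM trips WHERE city_id = 5 AND datestr >= '2023-01-01'",
    "SELECT count(*) FROM trips WHERE city_id = 7",
    "SELECT * FROM trips",
]

print(rank_filter_columns(sample_queries))
# [('city_id', 2), ('datestr', 1)]
```

Columns that rank highest are the natural candidates for sorting or partitioning, since filters on them are the ones most likely to benefit from predicate pushdown.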

Key Statistics & Figures

Daily queries executed at Uber: 500,000
This is the total number of queries processed daily using the Presto engine.
Reduction in data read from optimized tables: from 4PB to 2PB per week
This statistic reflects the impact of data layout formatting recommendations made by Preon.

Key Actionable Insights

1. Implementing query validation early in the submission process can significantly reduce the time wasted on executing erroneous queries.
By catching syntax errors before execution, Preon ensures that only valid queries are processed, which is crucial in a high-throughput environment like Uber.
2. Utilizing predicate analysis can help in determining the optimal data layout for tables, leading to reduced data read times.
Preon analyzes common query patterns and suggests sorting or partitioning strategies that can enhance performance through predicate pushdown.
3. Leveraging query result caching can lead to substantial reductions in redundant query executions.
With around 500,000 queries run daily at Uber, deduplicating 5-7% of them avoids roughly 25,000 to 35,000 query executions per day.
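
The caching insight can be illustrated with a small sketch of query deduplication. The summary does not say how Preon identifies duplicate queries or how long results stay valid, so everything here is an assumption: queries are treated as duplicates when their whitespace-normalized text is identical, results expire after a fixed time-to-live, and `ResultCache`, `fingerprint`, and `run_query` are hypothetical names.

```python
import hashlib
import re
import time


def fingerprint(sql):
    """Hash the query text after trimming a trailing semicolon and collapsing whitespace.

    Case is left untouched on purpose: lowercasing would also change string literals.
    """
    trimmed = sql.strip().rstrip(";").strip()
    normalized = re.sub(r"\s+", " ", trimmed)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ResultCache:
    """Hypothetical in-memory result cache keyed by query fingerprint."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # fingerprint -> (stored_at, result)
        self.hits = 0
        self.misses = 0

    def get_or_run(self, sql, run_query):
        """Return a fresh cached result, or execute the query and cache it."""
        key = fingerprint(sql)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = run_query(sql)      # run_query stands in for the real engine call
        self._entries[key] = (now, result)
        return result
```

A fixed time-to-live is the simplest possible freshness policy. A cache in front of tables that change would also need to invalidate entries when the underlying data is updated.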

Common Pitfalls

1. Failing to keep Preon in sync with the Uber Presto repository can lead to analysis failures.
This happens because new SQL functions or UDFs added to Presto may not be recognized by Preon unless the modules are updated, emphasizing the need for regular maintenance.
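
A rough sketch of why a stale function list causes failures: if the analyzer only knows the functions that existed when its modules were last updated, any query calling a newer function cannot be resolved. The code below is illustrative only and does not reflect how Preon or Presto resolve functions. `unknown_functions`, the keyword list, and the `geo_bucket` UDF are hypothetical, and the text-based matching is a stand-in for real parsing.

```python
import re

# Keywords that can precede "(" without being function calls (not exhaustive).
NON_FUNCTION_KEYWORDS = {
    "in", "from", "exists", "values", "and", "or", "not",
    "as", "on", "join", "over", "where", "select",
}

# An identifier immediately followed by an opening parenthesis.
CALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def unknown_functions(sql, known_functions):
    """Return the called function names that are missing from the known set."""
    known = {name.lower() for name in known_functions}
    called = {name.lower() for name in CALL_RE.findall(sql)}
    return sorted(called - known - NON_FUNCTION_KEYWORDS)


# geo_bucket is a made-up UDF used only for this example.
query = (
    "SELECT approx_distinct(rider_id), geo_bucket(lat, lng) "
    "FROM trips WHERE city_id IN (1, 2)"
)

stale_registry = {"approx_distinct"}
updated_registry = {"approx_distinct", "geo_bucket"}

print(unknown_functions(query, stale_registry))    # ['geo_bucket']
print(unknown_functions(query, updated_registry))  # []
```

With the stale registry the query would be rejected even though the engine itself can run it, which is the failure mode that regular upgrades are meant to prevent.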

Related Concepts

SQL Query Optimization
Data Processing Strategies
Microservices Architecture