Serving Millions of Apache Pinot™ Queries with Neutrino

Ankit Sultana, Pratik Tibrewal, Christina Li, Shreyaa Sharma, Ujwala Tulshigiri
12 min read, advanced

Overview

The article discusses how Uber leverages Neutrino, an internal fork of Presto, to efficiently serve millions of queries to Apache Pinot, a real-time OLAP database. It highlights the architectural design, performance optimizations, and challenges faced in implementing this system to handle high query rates with low latency.

What You'll Learn

1. How to optimize query execution in Apache Pinot using Neutrino
2. Why using a single-threaded execution model can improve performance
3. When to implement rate limiting to manage query spikes

Prerequisites & Requirements

  • Understanding of OLAP databases and query execution
  • Familiarity with Apache Pinot and Presto (optional)

Key Questions Answered

How does Neutrino improve query performance for Apache Pinot?
Neutrino enhances query performance by optimizing the execution engine, allowing it to handle over 10,000 queries per second with sub-second latencies. By running the coordinator and worker in a single JVM, it eliminates the overhead of HTTP calls, resulting in faster query processing.
What challenges does Neutrino face in query execution?
Neutrino's main challenges are the complexity of SQL-to-SQL translation, the lack of join support, and broken tenant isolation. These follow from its design and from serving traffic from over 100 unique callers through a shared service, which can lead to resource contention.
What is the role of rate limiting in Neutrino?
Rate limiting in Neutrino is implemented through an internal load balancing system called Muttley, which configures the queries per second (QPS) for each caller. This prevents any single caller from overwhelming the system, ensuring stable performance across all users.
How does Neutrino handle query fingerprints?
Neutrino computes a fingerprint for each query so that expensive queries can be identified. In the rare case where one query degrades others, its fingerprint can be added to a deny list configured in Flipr, which limits the impact and helps keep performance stable across services.
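
The fingerprint-and-deny-list idea can be sketched as follows. This is a minimal illustration, not Neutrino's implementation: the summary does not say how fingerprints are computed, so the normalization rules, the `fingerprint` and `is_denied` helpers, and the in-memory `DENIED_FINGERPRINTS` set (standing in for a deny list held in Flipr) are all assumptions.

```python
import hashlib
import re

# Hypothetical stand-in for a deny list that would live in a config system.
DENIED_FINGERPRINTS = set()


def fingerprint(sql):
    """Hash a query after masking literals, so queries that differ only
    in constants share one fingerprint (assumed normalization rules)."""
    normalized = re.sub(r"'(?:[^']|'')*'", "?", sql)            # string literals
    normalized = re.sub(r"\b\d+(?:\.\d+)?\b", "?", normalized)  # numeric literals
    normalized = re.sub(r"\s+", " ", normalized).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def is_denied(sql):
    """Return True if the query's fingerprint is on the deny list."""
    return fingerprint(sql) in DENIED_FINGERPRINTS
```

With this shape, adding one fingerprint to the deny list blocks every variant of the same query regardless of the constants a caller passes in.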

Key Statistics & Figures

Queries per second handled by Neutrino: over 10,000 QPS
This performance metric highlights Neutrino's capability to efficiently manage high query loads.
Daily Apache Pinot queries served: more than half a billion
This statistic underscores the scale at which Neutrino operates within Uber's infrastructure.
Share of queries leveraging Neutrino's execution engine: more than a third
This indicates the significant reliance on Neutrino for query processing within Uber.

Technologies & Tools

Database: Apache Pinot
Used as a real-time OLAP database for handling queries.
Query Engine: Presto
Serves as the foundation for Neutrino, enabling SQL querying capabilities.

Key Actionable Insights

1. Implementing a single-threaded execution model can significantly enhance performance in high-load scenarios.
   This approach minimizes context switching and overhead associated with multi-threaded environments, making it ideal for systems like Neutrino that require rapid query processing.
2. Utilizing rate limiting can protect backend services from spikes in traffic.
   By configuring QPS limits for different callers, you can ensure that no single service monopolizes resources, which is crucial in a multi-tenant environment like Uber's.
3. Incorporating query fingerprinting can help identify and manage resource-intensive queries effectively.
   This strategy allows for better monitoring and optimization of query performance, ensuring that expensive queries do not degrade the overall system performance.
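
Per-caller QPS limits, as described in the second insight, can be sketched with a token bucket per caller. This is an illustrative sketch only: it is not how Muttley works internally, and the `CallerRateLimiter` class, its `allow` method, and the injectable `clock` parameter are hypothetical names chosen for the example.

```python
import time


class CallerRateLimiter:
    """One token bucket per caller: a caller may burst up to its QPS
    limit, and its bucket refills at that same rate per second."""

    def __init__(self, qps_by_caller, clock=time.monotonic):
        self._qps = dict(qps_by_caller)
        self._clock = clock
        now = clock()
        # Every bucket starts full.
        self._tokens = {c: float(q) for c, q in self._qps.items()}
        self._last = {c: now for c in self._qps}

    def allow(self, caller):
        """Return True if the caller may run a query right now."""
        if caller not in self._qps:
            return False  # assumption: unknown callers are rejected
        now = self._clock()
        elapsed = now - self._last[caller]
        self._last[caller] = now
        limit = self._qps[caller]
        # Refill in proportion to elapsed time, capped at the limit.
        self._tokens[caller] = min(limit, self._tokens[caller] + elapsed * limit)
        if self._tokens[caller] >= 1.0:
            self._tokens[caller] -= 1.0
            return True
        return False
```

Because each caller has its own bucket, a spike from one caller exhausts only that caller's tokens and leaves the others unaffected.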

Common Pitfalls

1. Failing to implement proper rate limiting can lead to service degradation during peak usage.
   Without rate limiting, a single caller can overwhelm the system, causing slowdowns or outages for all users. It's essential to configure limits based on expected usage patterns.
2. Not accounting for the limitations of SQL-to-SQL translation can lead to incorrect query results.
   When a query is pushed down to Pinot and the pushed-down plan lacks a limit, Neutrino may impose a default limit, so the query can return fewer rows than the user expects. This can confuse users and produce results that appear inconsistent.
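
The default-limit pitfall can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `with_default_limit` helper, the string-level check, and the `DEFAULT_LIMIT` value are hypothetical, and Neutrino operates on query plans rather than raw SQL strings.

```python
import re

DEFAULT_LIMIT = 10  # hypothetical value, chosen only for this example


def with_default_limit(pinot_sql, default_limit=DEFAULT_LIMIT):
    """Return (sql, limit_was_added). If the pushed-down query has no
    trailing LIMIT clause, append a default one and flag that it was added."""
    if re.search(r"\blimit\s+\d+\s*$", pinot_sql, flags=re.IGNORECASE):
        return pinot_sql, False
    return f"{pinot_sql.rstrip()} LIMIT {default_limit}", True
```

Returning the flag alongside the rewritten query is one way to let a caller detect, and warn about, a limit it never asked for instead of silently receiving a truncated result.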

Related Concepts

Apache Pinot
Presto
OLAP Databases
Query Optimization Techniques