Real-Time Analytics for Mobile App Crashes using Apache Pinot

Kriti Dangi, Anil Purohit, Parijat Bansal, Rohit Yadav

Uber



Overview

The article discusses how Uber utilizes Apache Pinot for real-time analytics of mobile app crashes, enhancing their ability to detect and resolve issues quickly. It highlights the architecture, implementation strategies, and performance improvements achieved through this system.

What You'll Learn

1. How to implement real-time crash analytics using Apache Pinot
2. Why data retention policies are crucial for analytics performance
3. When to use hybrid table setups for data storage
4. How to optimize query patterns for better performance

Prerequisites & Requirements

Understanding of real-time data processing concepts
Familiarity with Apache Pinot and Kafka (optional)

Key Questions Answered

How does Uber use Apache Pinot for crash analytics?

Uber employs Apache Pinot to process and analyze crash data from mobile applications in real time. This allows them to quickly identify and resolve issues, enhancing user experience and maintaining trust. The system classifies crashes, aggregates data, and provides insights to developers and release managers.

What are the data retention policies for crash analytics at Uber?

Uber retains crash data for 45 days to analyze historical trends, with most use cases accessing data from the last 30 days. This retention policy helps in understanding patterns and improving the overall reliability of the application.
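As a rough illustration of how a 45-day retention window might be declared, the sketch below builds a fragment of a Pinot table config in Python. The `retentionTimeUnit` and `retentionTimeValue` fields belong to Pinot's `segmentsConfig` section; the table name `crash_events` and the time column `crash_time_ms` are hypothetical, and a real table config needs further sections (tenants, indexing, replication) that are omitted here. This is not Uber's actual configuration.

```python
import json


def build_segments_config(retention_days: int, time_column: str) -> dict:
    """Return the retention-related part of a Pinot segmentsConfig section."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "timeColumnName": time_column,
        "retentionTimeUnit": "DAYS",
        # Pinot expects the retention value as a string.
        "retentionTimeValue": str(retention_days),
    }


# Hypothetical table and column names, used only for illustration.
table_config = {
    "tableName": "crash_events",
    "tableType": "OFFLINE",
    "segmentsConfig": build_segments_config(45, "crash_time_ms"),
}

if __name__ == "__main__":
    print(json.dumps(table_config, indent=2))
```

Segments whose time range falls entirely outside the retention window become eligible for deletion, which keeps the queried data set bounded even as new crash data arrives.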

What are the performance improvements after migrating from Elasticsearch to Pinot?

The migration from Elasticsearch to Pinot significantly improved query performance, especially for queries spanning extended time periods. Pinot's performance degraded less than Elasticsearch's as the queried time range grew, making it the more efficient choice for real-time analytics.

What challenges does Uber face with Pinot for crash analytics?

Uber encounters challenges such as the inability to perform complex aggregations on multiple dimensions and the fixed number of segments for offline jobs, which can lead to reliability issues. They have implemented workarounds, such as firing multiple queries in parallel to mitigate these limitations.

Key Statistics & Figures

Changes rolled out weekly at Uber

11,000

This high frequency of changes necessitates a robust system for real-time crash analytics.

Average daily data size for crash logs

36 TB

This significant volume of data underscores the importance of efficient data processing and retention strategies.

Peak crash classification rate

1,500 crashes per second

This rate highlights the need for a scalable analytics solution to handle real-time data influx.

Technologies & Tools


Database

Apache Pinot

Used for real-time analytics and crash data processing.

Streaming

Apache Kafka

Facilitates data ingestion from various sources into Pinot.

Stream Processing

Apache Flink

Handles data processing and aggregation tasks.

Key Actionable Insights

1. Implement a hybrid table setup to balance real-time and offline data processing needs.
This approach allows for efficient data ingestion and querying, ensuring that even if offline jobs fail, real-time data remains accessible.

2. Utilize data compression techniques to manage large payloads effectively.
By compressing crash event data, Uber can reduce storage costs and improve query performance, which is crucial for maintaining system efficiency.

3. Regularly review and adjust data retention policies to optimize performance.
Maintaining a 45-day retention policy helps in analyzing trends without overwhelming the system, ensuring that only relevant data is kept for analysis.
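To make the hybrid table setup from the first insight more concrete, here is a simplified Python model of how a query on a hybrid table can be split at a time boundary: older data is served from the offline table and newer data from the real-time table. The `_OFFLINE` and `_REALTIME` suffixes follow Pinot's naming for the two physical tables; the one-day offset, the function name `split_hybrid_query`, and the column name are assumptions made for this sketch, and the actual broker logic is more involved.

```python
from dataclasses import dataclass
from typing import Tuple

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class SubQuery:
    table: str        # physical table the sub-query targets
    time_filter: str  # extra predicate added to the user's WHERE clause


def split_hybrid_query(
    table: str, time_column: str, offline_max_end_ms: int
) -> Tuple[SubQuery, SubQuery]:
    """Split a hybrid-table query into offline and real-time parts.

    Assumption: the boundary sits one day before the newest offline data,
    so a partially loaded last day is answered by the real-time table.
    """
    boundary = offline_max_end_ms - DAY_MS
    offline = SubQuery(f"{table}_OFFLINE", f"{time_column} <= {boundary}")
    realtime = SubQuery(f"{table}_REALTIME", f"{time_column} > {boundary}")
    return offline, realtime


if __name__ == "__main__":
    for sub in split_hybrid_query("crash_events", "crash_time_ms", 10 * DAY_MS):
        print(sub.table, "WHERE", sub.time_filter)
```

In this model, a failed offline job simply leaves the boundary where it was, so everything newer is still answered from the real-time table for as long as that table retains it.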

Common Pitfalls

1. Relying solely on a single query for complex data retrieval can lead to performance bottlenecks.
To avoid this, consider firing multiple parallel queries to distribute the load and enhance response times.
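The parallel-query workaround can be sketched in Python as one single-dimension aggregation per dimension, issued concurrently, in place of one query that aggregates across all dimensions. The table name, column names, and the `execute` callable are hypothetical; `execute` stands in for whatever client actually sends SQL to the Pinot broker, and `fake_execute` exists only so that the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def build_queries(table: str, dimensions: List[str], since_ms: int) -> Dict[str, str]:
    """Build one top-10 crash-count query per dimension."""
    return {
        dim: (
            f"SELECT {dim}, COUNT(*) FROM {table} "
            f"WHERE crash_time_ms >= {since_ms} "
            f"GROUP BY {dim} ORDER BY COUNT(*) DESC LIMIT 10"
        )
        for dim in dimensions
    }


def run_parallel(
    queries: Dict[str, str],
    execute: Callable[[str], list],
    max_workers: int = 4,
) -> Dict[str, list]:
    """Run every query concurrently and collect the results per dimension."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {dim: pool.submit(execute, sql) for dim, sql in queries.items()}
        # result() re-raises any exception thrown inside a worker thread.
        return {dim: future.result() for dim, future in futures.items()}


def fake_execute(sql: str) -> list:
    """Stand-in for a real broker call; returns a placeholder row."""
    return [("stub", len(sql))]


if __name__ == "__main__":
    queries = build_queries("crash_events", ["os_version", "app_version"], 0)
    for dim, rows in run_parallel(queries, fake_execute).items():
        print(dim, rows)
```

Threads suit this pattern because each call spends its time waiting on the network, so overall latency approaches that of the slowest single query, not the sum of all of them.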

Related Concepts

Real-time Data Processing

Crash Analytics

Data Retention Policies

Performance Optimization Techniques