Improving Efficiency Of Goku Time Series Database at Pinterest (Part 2)

Pinterest Engineering
13 min read · intermediate

Overview

This article discusses the improvements made to the Goku time series database at Pinterest, focusing on enhancing query efficiency through features like rollup, pre-aggregation, and pagination. It highlights the architecture of Goku and the strategies implemented to optimize query performance and reduce latency.

What You'll Learn

1. How to improve query performance in time series databases using rollup techniques
2. Why pre-aggregation can reduce query latency and improve efficiency
3. How to implement pagination for managing expensive queries in Goku

Prerequisites & Requirements

  • Understanding of time series data models and query optimization techniques
  • Familiarity with Goku and its architecture (optional)

Key Questions Answered

How does Goku improve query performance for time series data?
Goku enhances query performance through features like rollup, which reduces data granularity and improves latency, and pre-aggregation, which minimizes cardinality by removing unnecessary tags. These strategies lead to faster query responses and lower resource consumption.
What is the role of pagination in Goku's query handling?
Pagination allows Goku to manage expensive queries by breaking them into smaller parts, ensuring that users receive results without timeouts. This approach helps control server resource usage while still providing access to large datasets.
What are the benefits of using rollup in Goku?
Rollup in Goku reduces the storage cost of raw data and decreases CPU aggregation costs, leading to significantly lower query latencies. In production, the p99 latency of queries using rolled-up data is almost 1000x lower than that of queries using raw data.
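
To make the rollup idea concrete, here is a minimal Python sketch of downsampling raw points into fixed-width time buckets. It is an illustration only, not Goku's implementation: the `rollup` function, the 15-minute interval, and the sample data are all assumptions chosen for the example.

```python
from collections import defaultdict


def rollup(points, interval_s, agg=sum):
    """Downsample (timestamp, value) points into fixed-width buckets.

    Hypothetical helper for illustration; Goku's real rollup logic
    and storage format are not described in this summary.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % interval_s)
        buckets[bucket_start].append(value)
    # One aggregated point per bucket, ordered by time.
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


# Five raw points at one-minute spacing collapse into two 15-minute points.
raw = [(0, 1.0), (60, 2.0), (120, 3.0), (900, 4.0), (960, 6.0)]
rolled = rollup(raw, interval_s=900, agg=sum)
print(rolled)
```

A query over the rolled-up series reads and aggregates fewer points than the same query over raw data, which is where the storage and CPU savings described above come from.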

Key Statistics & Figures

p99 latency for queries using rolled-up data
almost 1000x less than queries using raw data
This performance metric highlights the significant efficiency gains achieved through the rollup feature in Goku.
Cardinality of a metric name
32M during peak hours
This statistic illustrates the challenges faced with high cardinality metrics, which can lead to performance issues.

Technologies & Tools

Database
Goku
An in-house time series database used for storing and querying metrics data at Pinterest.
Communication Protocol
Apache Thrift
Used for Query RPC in Goku.

Key Actionable Insights

1. Implement rollup for older time series data to enhance query performance.
   By aggregating older data into rolled-up formats, you can reduce the amount of data processed during queries, leading to faster response times and lower resource usage.
2. Utilize pre-aggregation to manage high cardinality metrics effectively.
   Pre-aggregation reduces the load on the system and improves query response times, especially for metrics that have a large number of unique tag combinations.
3. Adopt pagination for handling expensive queries to avoid timeouts.
   Pagination allows for controlled resource usage while still delivering results, making it a practical approach for queries that would otherwise overwhelm the system.
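
The pagination insight can be sketched as a simple cursor loop. This is a hedged illustration, not Goku's Thrift API: `fetch_page`, `fetch_all`, the integer cursor, and the idea of paging over a list of series IDs are assumptions made for the example.

```python
def fetch_page(series_ids, page_size, cursor=0):
    """Return one page of series IDs plus the cursor for the next page.

    Hypothetical stand-in for a paginated query RPC. A next cursor of
    None means there are no more pages.
    """
    page = series_ids[cursor:cursor + page_size]
    next_cursor = cursor + page_size
    if next_cursor >= len(series_ids):
        next_cursor = None
    return page, next_cursor


def fetch_all(series_ids, page_size):
    """Client-side loop: request pages one at a time until exhausted."""
    cursor = 0
    while cursor is not None:
        page, cursor = fetch_page(series_ids, page_size, cursor)
        yield page


# Each request touches at most page_size series, bounding per-call cost.
pages = list(fetch_all(["a", "b", "c", "d", "e"], page_size=2))
print(pages)
```

Because each call is bounded by the page size, the server does a limited amount of work per request, and the client can still walk through the full result set.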

Common Pitfalls

1. Failing to implement pre-aggregation can lead to high cardinality issues and query timeouts.
   Without pre-aggregation, queries may attempt to access too many time series, leading to resource exhaustion and degraded performance. It's essential to analyze query patterns and enable pre-aggregation where necessary.
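
As a rough illustration of how pre-aggregation cuts cardinality, the sketch below drops a tag that queries never filter on and sums the series that become identical. The `pre_aggregate` function, the tag names (`dc`, `host`), and the choice of sum as the aggregator are assumptions for the example, not Goku's actual design.

```python
from collections import defaultdict


def pre_aggregate(series, drop_tags):
    """Merge series that differ only in the dropped tags.

    series: list of (tags_dict, [(timestamp, value), ...]).
    Hypothetical helper; values sharing a timestamp are summed.
    """
    merged = defaultdict(lambda: defaultdict(float))
    for tags, points in series:
        # The remaining tags identify the pre-aggregated series.
        key = tuple(sorted((k, v) for k, v in tags.items() if k not in drop_tags))
        for ts, value in points:
            merged[key][ts] += value
    return {key: sorted(by_ts.items()) for key, by_ts in merged.items()}


# Three per-host series collapse into two per-datacenter series.
series = [
    ({"dc": "a", "host": "h1"}, [(0, 1.0), (60, 2.0)]),
    ({"dc": "a", "host": "h2"}, [(0, 3.0), (60, 4.0)]),
    ({"dc": "b", "host": "h3"}, [(0, 5.0)]),
]
result = pre_aggregate(series, drop_tags={"host"})
print(result)
```

A query that only groups by `dc` can then read the merged series instead of fanning out to every per-host series, which is the situation the pitfall above warns about.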

Related Concepts

Time Series Data Models
Query Optimization Techniques
High Cardinality Metrics Management