Goku: Building a scalable and high performant time series database system

Pinterest Engineering

Overview

The article discusses Goku, an in-house time series database developed by Pinterest to address the limitations of OpenTSDB as the company scales. It highlights Goku's architecture, performance improvements, and its ability to efficiently handle high volumes of metrics data.

What You'll Learn

1. How Goku implements a time series database
2. Why Goku's inverted index improves query performance
3. How Goku uses Gorilla compression for data storage

Prerequisites & Requirements

  • Understanding of time series data concepts
  • Familiarity with C++ programming

Key Questions Answered

What are the main challenges Goku addresses compared to OpenTSDB?
Goku addresses several limitations of OpenTSDB, including unnecessary scans by implementing an inverted index engine, reducing data size through Gorilla compression, and improving aggregation by moving computation closer to the storage layer. These changes significantly enhance performance and reduce operational overhead.
How does Goku's sharding strategy work?
Goku employs a two-layer sharding strategy where the first layer hashes the metric name to determine the shard group, and the second layer hashes the metric name along with tag key-value pairs to identify the specific shard. This ensures balanced data distribution and low network overhead.
What performance improvements does Goku offer over OpenTSDB?
Goku shows significant performance improvements over OpenTSDB, particularly in handling high cardinality queries and reducing latency. The article provides graphs illustrating latency reductions, demonstrating Goku's efficiency in processing time series data.
What is the purpose of the inverted index in Goku?
The inverted index in Goku allows for efficient querying by mapping search terms to a bitset, enabling quick execution of AND, OR, NOT, WILDCARD, and REGEX operations. This reduces unnecessary lookups compared to OpenTSDB's scan-based querying, enhancing overall performance.
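The bitset idea in the last answer can be illustrated with a minimal sketch. It is not Goku's implementation (which is written in C++); the class name `InvertedIndex`, the `key=value` term format, and the helper `series_ids` are hypothetical, and Python integers stand in for bitsets. Each term maps to a bitset in which bit i is set when series i carries that term, so AND, OR, and NOT become bitwise operations, and WILDCARD and REGEX reduce to OR-ing the bitsets of all matching terms.

```python
import fnmatch
import re
from collections import defaultdict


class InvertedIndex:
    """Hypothetical index: maps each "key=value" term to a bitset (a Python int)."""

    def __init__(self):
        self.postings = defaultdict(int)  # term -> bitset of series ids
        self.all_series = 0               # bitset of every known series id

    def add(self, series_id, tags):
        bit = 1 << series_id
        self.all_series |= bit
        for key, value in tags.items():
            self.postings[f"{key}={value}"] |= bit

    def term(self, term):
        # Unknown terms match no series.
        return self.postings.get(term, 0)

    def wildcard(self, pattern):
        # OR together the bitsets of every term matching a shell-style pattern.
        result = 0
        for term, bits in self.postings.items():
            if fnmatch.fnmatchcase(term, pattern):
                result |= bits
        return result

    def regex(self, pattern):
        # OR together the bitsets of every term fully matching the regex.
        compiled = re.compile(pattern)
        result = 0
        for term, bits in self.postings.items():
            if compiled.fullmatch(term):
                result |= bits
        return result

    def negate(self, bits):
        # NOT is taken relative to the set of known series.
        return self.all_series & ~bits


def series_ids(bits):
    """Expand a bitset into a sorted list of series ids."""
    ids, i = [], 0
    while bits:
        if bits & 1:
            ids.append(i)
        bits >>= 1
        i += 1
    return ids


index = InvertedIndex()
index.add(0, {"host": "web-1", "dc": "east"})
index.add(1, {"host": "web-2", "dc": "east"})
index.add(2, {"host": "db-1", "dc": "west"})

east_web = index.term("dc=east") & index.wildcard("host=web-*")  # AND
either = index.term("host=web-1") | index.term("host=db-1")      # OR
not_east = index.negate(index.term("dc=east"))                   # NOT
```

Because a query resolves to a set of series ids before any data is read, only the matching series need to be fetched, which is the contrast with scan-based querying drawn above.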

Key Statistics & Figures

Data compression ratio
12x
Goku uses Gorilla compression to reduce the size of data points, which take 20 bytes each in OpenTSDB, by 12x.
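To give a feel for where such savings come from, here is a minimal sketch of the delta-of-delta timestamp encoding described in the Gorilla paper. It only counts bits rather than producing a bitstream, it does not reproduce the 12x figure, and it is not taken from Goku's code. The function names are hypothetical, and the header sizes (64 bits for the first timestamp, 14 bits for the first delta) are assumptions made for illustration. Gorilla compresses values separately with an XOR-based scheme, which is not shown here.

```python
def delta_of_deltas(timestamps):
    """Return the delta-of-delta sequence for a list of integer timestamps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def dod_bit_cost(dod):
    """Bits used for one delta-of-delta under Gorilla-style bucket widths."""
    if dod == 0:
        return 1        # a single '0' bit
    if -63 <= dod <= 64:
        return 2 + 7    # '10' prefix + 7-bit value
    if -255 <= dod <= 256:
        return 3 + 9    # '110' prefix + 9-bit value
    if -2047 <= dod <= 2048:
        return 4 + 12   # '1110' prefix + 12-bit value
    return 4 + 32       # '1111' prefix + 32-bit value


def timestamp_bits(timestamps, first_ts_bits=64, first_delta_bits=14):
    """Estimate the encoded size in bits (header sizes are assumptions)."""
    if not timestamps:
        return 0
    bits = first_ts_bits
    if len(timestamps) > 1:
        bits += first_delta_bits
    return bits + sum(dod_bit_cost(d) for d in delta_of_deltas(timestamps))


# Points arriving roughly every 60 seconds, with slight jitter.
ts = [1000, 1060, 1120, 1180, 1241, 1300]
encoded = timestamp_bits(ts)  # 98 bits under the assumptions above
raw = len(ts) * 64            # 384 bits if each timestamp were stored as 64 bits
```

Metrics are usually reported at a fixed interval, so most delta-of-deltas are zero and cost a single bit each; the fixed header cost is amortized as a block grows.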

Technologies & Tools


Database
Goku
An in-house time series database developed by Pinterest.
Programming Language
C++
Used to write Goku and implement its APIs.
Storage Engine
Gorilla
Facebook's in-memory time series storage design, which Goku follows for in-memory storage of recent time series data.

Key Actionable Insights

1. Implementing Goku can significantly reduce the operational overhead associated with traditional time series databases like OpenTSDB.
   As Pinterest scales, Goku's architecture allows for efficient data ingestion and querying, making it a viable solution for high-volume metrics data.
2. Utilizing Gorilla compression can drastically reduce data storage requirements, achieving up to 12x compression.
   This is particularly beneficial for organizations dealing with large amounts of time series data, as it can lead to cost savings in storage infrastructure.
3. Adopting a two-layer sharding strategy can help maintain balanced data distribution across shards.
   This approach minimizes network overhead and tail latency, which is crucial for applications requiring real-time data access.
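The two-layer sharding strategy can be sketched as follows. This is an illustration under stated assumptions, not Goku's actual scheme: the function names, the use of MD5 as a stable hash, the series key format, and the fixed number of shards per group are all hypothetical. The first hash, over the metric name only, picks a shard group; the second, over the metric name plus its tag key-value pairs, picks a shard inside that group.

```python
import hashlib


def stable_hash(text):
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def series_key(metric, tags):
    # Sort tags so the same series always produces the same key.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}{{{tag_part}}}"


def locate_shard(metric, tags, num_groups, shards_per_group):
    """Return (group, global shard id) for one time series."""
    # Layer 1: the metric name alone selects the shard group.
    group = stable_hash(metric) % num_groups
    # Layer 2: metric name plus tags selects a shard within that group.
    shard_in_group = stable_hash(series_key(metric, tags)) % shards_per_group
    return group, group * shards_per_group + shard_in_group


group_a, shard_a = locate_shard("cpu.usage", {"host": "web-1", "dc": "east"}, 4, 8)
group_b, shard_b = locate_shard("cpu.usage", {"host": "web-2", "dc": "east"}, 4, 8)
```

In this sketch every series of a metric lands in the same group, so a query on that metric only needs to contact that group's shards, while the second hash spreads the metric's individual series across the group. That is presumably how the approach keeps both network overhead and imbalance low.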

Common Pitfalls

1. Failing to ensure data consistency across multiple Goku clusters can lead to discrepancies in metrics.
   This issue arises when writes succeed in one cluster but fail in another, resulting in inconsistent data. Implementing log-based intra-cluster replication can help mitigate this risk.
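As a rough illustration of why a log helps here, the sketch below models replication as replaying an append-only log. It is a toy under loud assumptions and not Goku's design: `WriteLog`, `Replica`, and `catch_up` are hypothetical names, and there is no networking, durability, or leader election. The point is only that a replica that misses writes can resume from its last applied offset instead of silently diverging.

```python
class WriteLog:
    """Hypothetical append-only log of writes, shared by all replicas."""

    def __init__(self):
        self.entries = []

    def append(self, series, timestamp, value):
        self.entries.append((series, timestamp, value))
        return len(self.entries) - 1  # offset of the new entry


class Replica:
    """Hypothetical replica that applies log entries in order."""

    def __init__(self):
        self.data = {}
        self.next_offset = 0  # first log entry not yet applied

    def catch_up(self, log, available=True):
        # While unavailable the replica applies nothing, but keeps its offset.
        if not available:
            return
        for series, timestamp, value in log.entries[self.next_offset:]:
            self.data[(series, timestamp)] = value
            self.next_offset += 1


log = WriteLog()
a, b = Replica(), Replica()

log.append("cpu.usage{host=web-1}", 1000, 0.42)
a.catch_up(log)
b.catch_up(log, available=False)  # b misses this round
diverged = a.data != b.data

log.append("cpu.usage{host=web-1}", 1060, 0.40)
a.catch_up(log)
b.catch_up(log)                   # b replays both entries from offset 0
```

With independent dual writes, the failed write to the second cluster would have to be detected and repaired separately; with a log, catching up is the same operation as normal replication.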

Related Concepts

Time Series Data Modeling
Data Compression Techniques
Sharding Strategies
Real-time Data Processing