Overview
The article discusses cost reduction strategies implemented for Goku, a scalable time series database system developed by Pinterest. It highlights the analysis of query patterns, tiered data storage, and various solutions that led to significant cost savings and improved efficiency.
What You'll Learn
1. How to analyze query patterns to optimize database performance
2. Why tiered storage can reduce costs in time series databases
3. How to implement RocksDB tuning for better data compression
Prerequisites & Requirements
- Understanding of time series databases and data storage concepts
- Familiarity with AWS EC2 instances and RocksDB (optional)
Key Questions Answered
What strategies were implemented to reduce costs in Goku?
The article outlines several strategies including adjusting the rollup interval for tier 5 data, moving tier 5 data to a separate HDD-based cluster, and tuning RocksDB settings. These changes collectively resulted in a 30-35% reduction in costs by replacing expensive EC2 instances with more cost-effective options.
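To make the idea of a rollup interval concrete, here is a minimal sketch of downsampling raw points into fixed-width buckets. The function name `rollup`, the sample data, and the choice of mean as the aggregate are assumptions for illustration; the article does not describe Goku's rollup code.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, interval_s, agg=mean):
    """Downsample (timestamp_s, value) pairs into fixed-width buckets.

    Each bucket is keyed by its start timestamp, and `agg` reduces the
    values that fall into the bucket to a single number.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % interval_s)
        buckets[bucket_start].append(value)
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


# Hypothetical data: one point every 15 minutes for two hours.
raw = [(i * 900, float(i)) for i in range(8)]

# A one-hour rollup keeps 2 points instead of 8.
hourly = rollup(raw, interval_s=3600)
print(hourly)
```

A longer interval stores fewer points per metric, which is why widening the rollup interval for the oldest tier reduces the amount of data that has to be kept.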
How does the tiered approach to data storage work in Goku?
Goku uses a tiered approach to segregate long-term data into different buckets, each with specific retention policies and rollup intervals. This method allows for efficient storage management and cost reduction by optimizing how data is queried and stored over time.
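The tiering described above can be sketched as a small lookup table that maps data age to a bucket. Every retention period and rollup interval below is a hypothetical placeholder, as are the names `Tier`, `TIERS`, and `tier_for_age`; the summary does not give Goku's actual tier settings.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 86_400  # seconds


@dataclass(frozen=True)
class Tier:
    name: str
    max_age_s: int  # data older than this belongs to a later tier
    rollup_s: int   # granularity at which points are stored


# Hypothetical values, ordered from newest to oldest data.
TIERS = [
    Tier("tier1", 1 * DAY, 60),
    Tier("tier2", 7 * DAY, 15 * 60),
    Tier("tier3", 30 * DAY, 3600),
    Tier("tier4", 90 * DAY, 4 * 3600),
    Tier("tier5", 365 * DAY, DAY),
]


def tier_for_age(age_s: int) -> Optional[Tier]:
    """Return the first tier whose retention window covers the given age."""
    for tier in TIERS:
        if age_s <= tier.max_age_s:
            return tier
    return None  # older than every retention window, so the data is dropped


print(tier_for_age(100 * DAY))
```

Because each tier carries its own retention and rollup settings, the oldest tier can be tuned or moved to cheaper hardware without touching how recent data is served.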
What were the findings from the query analysis on GokuL?
The query analysis revealed that only about 6,000 metrics out of 10 billion were queried for data older than three months, indicating that most older data was rarely accessed. Additionally, over half of the queries specified rollup intervals of one day or more, suggesting that older data could be stored at a coarser granularity.
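An analysis of this kind can be sketched over a query log as follows. The `QueryRecord` fields, the thresholds, and the sample log are assumptions made for illustration; the summary does not describe the format of GokuL's query logs.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds


@dataclass(frozen=True)
class QueryRecord:
    metric: str
    oldest_data_age_s: int  # age of the oldest data point the query touched
    rollup_s: int           # rollup interval requested by the query


def summarize(log, old_threshold_s=90 * DAY, coarse_rollup_s=DAY):
    """Return (distinct metrics queried for old data, share of coarse-rollup queries)."""
    old_metrics = {q.metric for q in log if q.oldest_data_age_s > old_threshold_s}
    coarse = sum(1 for q in log if q.rollup_s >= coarse_rollup_s)
    share = coarse / len(log) if log else 0.0
    return len(old_metrics), share


# Hypothetical log entries.
log = [
    QueryRecord("cpu.user", 2 * DAY, 60),
    QueryRecord("cpu.user", 120 * DAY, DAY),
    QueryRecord("disk.free", 200 * DAY, 7 * DAY),
    QueryRecord("net.rx", 10 * DAY, DAY),
]

old_count, coarse_share = summarize(log)
print(old_count, coarse_share)
```

The first number shows how few metrics actually need old data to be served quickly, and the second shows how often a coarse rollup would have been enough to answer the query.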
What was the impact of RocksDB tuning on GokuL's performance?
Tuning RocksDB by switching to a stronger compression algorithm (ZSTD at level 5) and enabling partitioned index filters led to a 40% reduction in disk usage. The change did not significantly increase latency, so GokuL could store the same data on less disk.
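The effect of a compression level on stored size can be measured before changing a production setting. The sketch below uses Python's standard-library `zlib` purely as a stand-in, because ZSTD is not available in every Python installation; it is not the codec the article used, and the synthetic payload and the `compressed_sizes` helper are assumptions for illustration.

```python
import struct
import zlib


def compressed_sizes(payload: bytes, levels=(1, 6, 9)):
    """Compress the payload at each zlib level and return the resulting sizes."""
    return {level: len(zlib.compress(payload, level)) for level in levels}


# Synthetic time series: (int64 timestamp, float64 value) pairs, one per minute.
payload = b"".join(
    struct.pack("<qd", 1_700_000_000 + 60 * i, 20.0 + (i % 10) * 0.5)
    for i in range(10_000)
)

sizes = compressed_sizes(payload)
for level, size in sizes.items():
    print(f"level {level}: {size / len(payload):.1%} of original size")
```

Running the same comparison on a sample of real data, with the codec and level under consideration, gives a rough estimate of the disk savings; the latency cost still has to be measured separately.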
Key Statistics & Figures
Cost reduction achieved
30-35%
This reduction was realized by replacing 325 i3.4xlarge instances with 111 d2.2xlarge instances.
Number of metrics queried for data older than three months
Approximately 6,000
Out of a total of 10 billion metrics, indicating low access frequency for older data.
Disk usage reduction from RocksDB tuning
40%
This was achieved by using a stronger compression algorithm and optimizing index filtering.
Technologies & Tools
Database
RocksDB
Used for storing long-term data in GokuL.
Cloud Infrastructure
AWS EC2
Utilized for hosting GokuL instances.
Key Actionable Insights
1. Implement a tiered storage strategy to optimize data management and reduce costs.
By categorizing data into different tiers based on access frequency and retention needs, organizations can significantly lower storage costs and improve query performance.
2. Regularly analyze query patterns to identify underutilized data.
Understanding which metrics are frequently queried can help in making informed decisions about data retention policies and storage solutions, ultimately leading to cost savings.
3. Experiment with database tuning options to enhance performance.
Adjusting settings such as compression algorithms and caching strategies can lead to substantial improvements in resource efficiency and cost reduction.
Common Pitfalls
1. Neglecting to analyze query patterns can lead to inefficient data storage.
Without understanding which data is frequently accessed, organizations may end up over-provisioning resources for rarely used data, resulting in unnecessary costs.
2. Failing to optimize database configurations can hinder performance.
Using default settings without tuning for specific workloads can lead to suboptimal performance and increased operational costs.
Related Concepts
Time Series Databases
Data Storage Optimization
Database Performance Tuning