Overview
This article discusses the development of Pinterest's new wide column database, Rockstorewidecolumn, built on RocksDB. It covers the motivations behind the database's creation, its data model, APIs, and key features that support Pinterest's growing storage needs.
What You'll Learn
1
How to implement a wide column database using RocksDB
2
Why versioned values are important for data history tracking
3
How to manage TTL for data expiration in a NoSQL database
Prerequisites & Requirements
- Understanding of NoSQL databases and their data models
- Familiarity with RocksDB and its APIs(optional)
Key Questions Answered
What is a wide column database and how does it differ from a key-value store?
A wide column database is a two-dimensional key-value store that allows for flexible columnar structures, meaning columns can vary from row to row without a fixed schema. This contrasts with a simple key-value store, which functions more like a persistent hash map.
How does Rockstorewidecolumn handle versioning of values?
Rockstorewidecolumn supports versioning by storing multiple versions of data for the same key and column. Each update is modeled as a new entry with a unique timestamp, allowing for historical tracking of values.
What APIs does Rockstorewidecolumn provide for data manipulation?
The database provides APIs for getting, putting, and deleting rows. Each API allows for operations on datasets, including specifying row keys, column names, and the number of versions to handle, ensuring flexibility in data management.
What is the purpose of TTL in Rockstorewidecolumn?
TTL (Time to Live) is used to set expiration for data in a dataset. Once a value expires, it cannot be retrieved, helping to manage storage costs and maintain data relevance by automatically cleaning up older data.
Key Statistics & Figures
Monthly users served by Pinterest
480M
This statistic highlights the scale at which Rockstorewidecolumn operates, managing substantial data for a significant user base.
Production use cases onboarded
300
Over two years, Rockstorewidecolumn has successfully integrated with numerous use cases, demonstrating its versatility and reliability.
Requests handled per second
millions
This performance metric underscores the database's capability to support high-traffic applications with low latency.
Data stored
petabytes
The database manages vast amounts of data, reflecting its scalability and efficiency in handling large datasets.
Technologies & Tools
Some links below are affiliate links. We may earn a commission if you make a purchase.
Key Actionable Insights
1Implement versioned values in your database to maintain a history of changes.This is critical for applications that require auditing or tracking changes over time, as it allows you to revert to previous states or analyze historical data effectively.
2Utilize TTL settings to manage data lifecycle and reduce storage costs.By configuring TTL, you can ensure that outdated data is automatically purged, which helps in maintaining optimal database performance and managing storage resources efficiently.
3Leverage pagination in API responses to handle large datasets efficiently.When dealing with extensive data, pagination prevents overwhelming the database and ensures that your application remains responsive, especially when retrieving large numbers of columns.
Common Pitfalls
1
Failing to implement proper TTL settings can lead to excessive data accumulation.
Without TTL, outdated data may linger in the database, consuming storage resources and potentially degrading performance. It's crucial to configure TTL to maintain data relevance and optimize storage.
2
Not utilizing pagination for large datasets can overwhelm the database.
When retrieving large numbers of columns, failing to paginate can lead to performance bottlenecks. Implementing pagination helps manage load and ensures efficient data retrieval.
Related Concepts
Nosql Databases
Data Modeling In Databases
Distributed Systems
Database Versioning Techniques