Structured DataStore (SDS): Multi-model Data Management With a Unified Serving Stack

Pinterest Engineering

Apache, Caching, Rate Limiting, SQL, Thrift, YAML

Overview

The article discusses the Structured DataStore (SDS), a unified multi-model data management platform developed by Pinterest. It highlights the transition from multiple independent query serving stacks to a single service that supports various data models with high availability and low latency.

What You'll Learn

1. How to implement a unified query language for data access
2. Why separating virtual and physical tables enhances data management
3. How to leverage backends for modular query processing

Prerequisites & Requirements

Understanding of data management concepts and query languages
Familiarity with Apache Thrift (optional)

Key Questions Answered

What is the Structured DataStore (SDS) and its purpose?

The Structured DataStore (SDS) is a unified multi-model data management platform designed to streamline query serving across various data models at Pinterest. It replaces multiple independent services with a single service that offers high availability and low latency for data access.

How does SDS improve data lifecycle management?

SDS enhances data lifecycle management by integrating online and offline query serving, streaming capabilities, cost attribution, and metadata management into a single platform. This consolidation reduces maintenance overhead and allows for more efficient data handling.

What are the key components of the SDS architecture?

The SDS architecture consists of a Unified Query Language (UQL), a Unified Result Format (URF), frontends for the different data models, middleware for query execution, and a metadata service for managing table metadata.
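To make the relationship between these components concrete, here is a minimal sketch of how a request might flow from a frontend through the middleware. Every class and field name below (`UQLQuery`, `URFResult`, `MetadataService`, `Middleware`, `KeyValueFrontend`, `store_a`) is a hypothetical stand-in: the source does not list the fields of UQL or URF, and the in-memory dictionaries only imitate real datastores.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class UQLQuery:
    """Hypothetical stand-in for a UQL request."""
    table: str
    columns: Tuple[str, ...]
    key: Tuple[Tuple[str, Any], ...]  # equality predicates as (column, value) pairs


@dataclass
class URFResult:
    """Hypothetical stand-in for a URF response."""
    columns: Tuple[str, ...]
    rows: List[Tuple[Any, ...]] = field(default_factory=list)


class MetadataService:
    """Toy metadata service: maps a table name to the name of its datastore."""

    def __init__(self, table_to_store: Dict[str, str]) -> None:
        self._table_to_store = table_to_store

    def store_for(self, table: str) -> str:
        return self._table_to_store[table]


class Middleware:
    """Looks up table metadata, runs the query, and returns a URF result."""

    def __init__(self, metadata: MetadataService,
                 stores: Dict[str, Dict[str, List[Row]]]) -> None:
        self._metadata = metadata
        self._stores = stores  # store name -> table name -> rows

    def execute(self, query: UQLQuery) -> URFResult:
        store = self._stores[self._metadata.store_for(query.table)]
        result = URFResult(columns=query.columns)
        for row in store[query.table]:
            if all(row.get(col) == val for col, val in query.key):
                result.rows.append(tuple(row.get(c) for c in query.columns))
        return result


class KeyValueFrontend:
    """Frontend for one data model: turns a key-value get() into UQL."""

    def __init__(self, middleware: Middleware) -> None:
        self._middleware = middleware

    def get(self, table: str, key: str) -> Any:
        query = UQLQuery(table=table, columns=("value",), key=(("key", key),))
        result = self._middleware.execute(query)
        return result.rows[0][0] if result.rows else None


metadata = MetadataService({"user_prefs": "store_a"})
stores = {"store_a": {"user_prefs": [{"key": "u1", "value": "dark"}]}}
frontend = KeyValueFrontend(Middleware(metadata, stores))
print(frontend.get("user_prefs", "u1"))  # prints: dark
```

In this sketch the frontend only knows its own data model and UQL, while the middleware only knows UQL, URF, and the metadata lookup, so a second frontend for another data model could reuse the same middleware unchanged.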

What challenges does SDS address in data management?

SDS addresses challenges such as maintenance complexity, tight coupling of services to specific datastore technologies, and the need for reimplementation of common functionalities across different services. By unifying these elements, SDS simplifies data access and enhances flexibility.

Key Statistics & Figures

p99 latency

Single- to double-digit milliseconds

This applies to the online query serving capabilities of SDS, ensuring high performance for data access.

High availability

99.99%+

This indicates the reliability of the SDS platform in serving data requests.

Technologies & Tools

Backend

Apache Thrift

Used for defining the Unified Query Language (UQL) and facilitating communication between components.

Key Actionable Insights

1. Implementing a unified query language can significantly reduce the complexity of query transformations and improve performance.
By using a structured language like UQL, developers can streamline the process of query handling, making it easier to manage and optimize queries across different data models.

2. Separating virtual tables from physical tables allows for seamless data migrations without impacting client operations.
This approach not only enhances flexibility in managing data but also provides opportunities for performance optimizations, as physical tables can be moved or modified without client awareness.

3. Utilizing backends for modular processing can encapsulate functionalities and simplify maintenance.
By defining clear interfaces for each backend, teams can work on specific functionalities without worrying about the overall system, leading to better collaboration and reduced cognitive overhead.
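The second and third insights can be illustrated together with a small sketch, assuming a backend is simply a function with a fixed interface that wraps the next backend in a chain. The names `VirtualTableCatalog`, `routing_backend`, `storage_backend`, and the physical table names are all hypothetical and are not taken from SDS.

```python
from typing import Callable, Dict

Query = Dict[str, str]
Handler = Callable[[Query], str]  # the shared interface every backend exposes


class VirtualTableCatalog:
    """Maps the virtual table name clients use to its current physical table."""

    def __init__(self) -> None:
        self._mapping: Dict[str, str] = {}

    def bind(self, virtual: str, physical: str) -> None:
        self._mapping[virtual] = physical

    def resolve(self, virtual: str) -> str:
        return self._mapping[virtual]


def routing_backend(catalog: VirtualTableCatalog, next_handler: Handler) -> Handler:
    """Backend that rewrites the virtual table name, then delegates onward."""

    def handle(query: Query) -> str:
        resolved = dict(query, table=catalog.resolve(query["table"]))
        return next_handler(resolved)

    return handle


def storage_backend(query: Query) -> str:
    """Last backend in the chain; a real one would call a datastore client."""
    return f"read {query['key']} from {query['table']}"


catalog = VirtualTableCatalog()
catalog.bind("pins", "store_a.pins_v1")
serve = routing_backend(catalog, storage_backend)

before = serve({"table": "pins", "key": "42"})
catalog.bind("pins", "store_b.pins_v2")  # simulated migration; client query is unchanged
after = serve({"table": "pins", "key": "42"})

print(before)  # read 42 from store_a.pins_v1
print(after)   # read 42 from store_b.pins_v2
```

The client issues the same query before and after the rebind, which is the property the virtual/physical split is meant to provide. Because each backend only depends on the `Handler` interface, further backends (caching, rate limiting, and so on) could be added to the chain without touching the others.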

Common Pitfalls

1. Overlooking the complexity of managing connections to various datastores can lead to performance issues.

This can happen if connection management is not carefully optimized, potentially causing bottlenecks in data access and retrieval.

2. Failing to achieve workload isolation in a multi-tenant service can result in resource contention.

This is a common challenge in multi-tenant architectures, where improper isolation can lead to performance degradation across different workloads.
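One common way to approach workload isolation is per-tenant rate limiting with a token bucket. The sketch below is a generic illustration of that technique, not a description of how SDS implements isolation; `TokenBucket`, `TenantLimiter`, and the tenant names are hypothetical, and the clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill based on elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class TenantLimiter:
    """Keeps one independent bucket per tenant so tenants cannot starve each other."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant: str) -> bool:
        bucket = self._buckets.get(tenant)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[tenant] = bucket
        return bucket.try_acquire()


# Deterministic demo with a fake clock.
now = [0.0]
limiter = TenantLimiter(rate=1.0, capacity=2.0, clock=lambda: now[0])

noisy = [limiter.allow("noisy") for _ in range(3)]  # third request exceeds the burst
quiet = limiter.allow("quiet")                      # unaffected by the noisy tenant
now[0] = 1.0                                        # one second later, one token refilled
refilled = limiter.allow("noisy")

print(noisy, quiet, refilled)  # [True, True, False] True True
```

The noisy tenant exhausts only its own bucket, so the quiet tenant's request is still admitted. A production service would also need thread safety and bounds on the number of tracked tenants, which this sketch leaves out.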

Related Concepts

Multi-model Data Management

Unified Query Languages

Data Lifecycle Management