How Netflix microservices tackle dataset pub-sub

Netflix Technology Blog

Netflix

•

Netflix Technology Blog

•16 min read•intermediate•

--

•View Original

AWSCassandraElasticsearchgRPCJavaNode.jsREST API

Overview

The article discusses how Netflix utilizes a microservices architecture to manage dataset propagation through an in-house pub/sub system called Gutenberg. It highlights the challenges of disseminating datasets across multiple services and teams, and explains the design, architecture, and use cases of Gutenberg in ensuring efficient and reliable data versioning.

What You'll Learn

1

How to implement a dataset pub/sub system using versioned datasets

2

Why centralized dataset management can improve system reliability

3

When to use immutability in dataset versioning

4

How to handle data resiliency with version pinning

Prerequisites & Requirements

Understanding of microservices architecture and pub/sub systems
Familiarity with gRPC and REST APIs(optional)

Key Questions Answered

How does Gutenberg manage dataset versioning?

Gutenberg manages dataset versioning by allowing publishers to create topics where each publish generates a new immutable version of the dataset. Consumers subscribe to these topics and receive updates to the latest versions, while also being able to access older versions for debugging or re-training purposes.

What are the main use cases for Gutenberg at Netflix?

The main use cases for Gutenberg include propagating configuration data for services, managing A/B test configurations, and serving as a versioned data store for machine-learning applications. It allows teams to leverage a single source of truth while maintaining their own service autonomy.

What challenges does Gutenberg address in microservices architecture?

Gutenberg addresses the challenges of dataset propagation across multiple services by providing a centralized system for versioned datasets, which reduces the need for individual teams to create their own solutions and ensures consistency and reliability in data access.

How does Gutenberg ensure data resiliency?

Gutenberg ensures data resiliency through features like version pinning, which allows operators to quickly revert to a known good version of data during failures. This capability is crucial for maintaining system stability in the face of bad deployments.

Key Statistics & Figures

Number of topics in production

tens-of-thousands

Gutenberg has been in use at Netflix for three years, managing a large number of topics.

Average publishes per second

1-2

This reflects the typical data publishing activity within the Gutenberg system.

Number of nodes subscribed to at least one topic

low six figures

This indicates the extensive reach of the Gutenberg system across Netflix's microservices.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Backend

Gutenberg

A dataset pub/sub system used for managing versioned datasets.

Protocol

Grpc

Used for communication between the Gutenberg service and client libraries.

Database

Cassandra

Used for persistence and to propagate publish metadata across regions.

Storage

S3

Used for storing large dataset payloads.

Key Actionable Insights

1
Implement versioned dataset management to enhance data reliability across services.
Using a pub/sub system like Gutenberg allows teams to manage datasets centrally, ensuring that all services have access to the latest and most reliable data without duplicating efforts.

2
Utilize immutability in dataset versions to simplify debugging and rollback processes.
By making each version of a dataset immutable, teams can easily revert to previous versions without the risk of unintended side effects, thus improving overall system reliability.

3
Incorporate pinning mechanisms for critical datasets to mitigate risks during deployments.
Pinning allows teams to enforce the use of a stable dataset version during uncertain deployments, reducing the likelihood of introducing errors into production.

Common Pitfalls

1

Failing to implement a centralized dataset management system can lead to inconsistent data across services.

When each team builds their own solutions, it often results in varying degrees of success and can complicate debugging and maintenance.

2

Overlooking the importance of data immutability can complicate rollback processes.

Without immutability, reverting to a previous state may introduce errors or inconsistencies, making it difficult to ensure system stability.

Related Concepts

Microservices Architecture

Pub/Sub Systems

Data Versioning

Machine Learning Data Management