Stateful workload operator: stateful systems on Kubernetes at LinkedIn

Michael Youssef
14 min read, intermediate

Overview

The article discusses LinkedIn's implementation of a Stateful Workload Operator for managing stateful systems on Kubernetes. It highlights the challenges faced with traditional StatefulSet management and introduces the Application Cluster Manager (ACM) as a solution for lifecycle management and operational efficiency.

What You'll Learn

1. How to manage stateful applications on Kubernetes using a custom operator
2. Why traditional StatefulSet management may not meet complex application needs
3. How to implement an Application Cluster Manager for lifecycle management

Prerequisites & Requirements

  • Understanding of Kubernetes and stateful applications
  • Familiarity with Kubernetes Operators and Custom Resource Definitions (CRDs) (optional)

Key Questions Answered

What are the limitations of Kubernetes StatefulSet for managing stateful applications?
Kubernetes StatefulSet has limitations such as lack of sharding awareness, inability to manage planned or unplanned host maintenance, and restrictions on running multiple canary versions within the same set of pods. These constraints necessitate additional layers of management that complicate the deployment and maintenance of stateful applications.
How does the Application Cluster Manager (ACM) enhance stateful application management?
The Application Cluster Manager (ACM) coordinates with the Stateful Workload Operator to evaluate the safety of deployment and maintenance operations. It ensures that applications remain healthy during changes by managing shard replication and maintaining order across maintenance zones, thus simplifying lifecycle management.
What is the architecture of the Stateful Workload Operator?
The Stateful Workload Operator is built around five core Custom Resource Definitions (CRDs): LiStatefulSet, Revision, PodIndex, Operation, and StatefulPod. Each CRD serves a specific purpose in managing the lifecycle of stateful applications, allowing for clear separation of concerns and efficient operation.
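
To make the five resource types concrete, here is a minimal sketch of how they could be registered with the Kubernetes API as CustomResourceDefinition manifests. Only the kind names come from the summary above. The API group (`example.com`), the version (`v1alpha1`), the plural names, and the open schema are assumptions made for illustration, not LinkedIn's actual definitions.

```python
import json

# Hypothetical API group and version; the source does not name them.
GROUP = "example.com"
VERSION = "v1alpha1"

# Kind names come from the article summary; the plural forms are assumed.
KINDS = {
    "LiStatefulSet": "listatefulsets",
    "Revision": "revisions",
    "PodIndex": "podindexes",
    "Operation": "operations",
    "StatefulPod": "statefulpods",
}


def crd_manifest(kind, plural):
    """Build a minimal apiextensions.k8s.io/v1 CRD manifest for one kind."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # A CRD's name must be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{GROUP}"},
        "spec": {
            "group": GROUP,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [
                {
                    "name": VERSION,
                    "served": True,
                    "storage": True,
                    # Placeholder schema: accepts any object. A real CRD
                    # would declare its spec and status fields here.
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "x-kubernetes-preserve-unknown-fields": True,
                        }
                    },
                }
            ],
        },
    }


manifests = [crd_manifest(kind, plural) for kind, plural in KINDS.items()]

if __name__ == "__main__":
    print(json.dumps(manifests[0], indent=2))
```

Each manifest is plain data, so it can be serialized to JSON or YAML and applied to a cluster. Keeping one kind per concern is what gives the operator the separation of concerns described above.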

Technologies & Tools

Orchestration
Kubernetes
Used for managing stateful applications and implementing the Stateful Workload Operator.
Kubernetes Extension
Custom Resource Definitions (CRDs)
Used to define new resource types for managing the lifecycle of stateful applications.

Key Actionable Insights

1. Implement a centralized management system for stateful applications to reduce operational overhead.
   By using the Stateful Workload Operator and ACM, teams can focus on application-specific logic rather than infrastructure management, leading to improved efficiency and reduced complexity.
2. Adopt cooperative scheduling to streamline deployment and maintenance processes.
   This approach allows for better coordination of resources and reduces conflicts during operations, ultimately leading to a more stable environment for stateful applications.
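
The cooperative handshake between the operator and the ACM can be sketched as a simple approval check. In this sketch the operator proposes a set of pods to disrupt, and an ACM-style evaluator approves only if every shard keeps enough healthy replicas. The `Proposal` type, the `evaluate` function, the shard map layout, and the `min_healthy` threshold are all hypothetical names chosen for illustration. The real protocol between the Stateful Workload Operator and the ACM is not described in this summary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical request from the operator: pods it wants to disrupt."""
    pods: frozenset[str]


def evaluate(proposal, shard_replicas, unavailable, min_healthy=1):
    """Decide whether a proposal is safe; returns (approved, reason).

    shard_replicas maps a shard name to the set of pods hosting a replica.
    unavailable is the set of pods that are already down.
    """
    # Pods that would be down if the proposal went ahead.
    down = set(unavailable) | proposal.pods
    for shard, replicas in sorted(shard_replicas.items()):
        healthy = len(set(replicas) - down)
        if healthy < min_healthy:
            return False, f"shard {shard} would keep {healthy} healthy replica(s)"
    return True, "ok"


# Assumed example layout: two shards, each replicated on two pods.
shards = {
    "s0": {"pod-0", "pod-1"},
    "s1": {"pod-1", "pod-2"},
}

# Restarting pod-0 alone leaves a replica of every shard.
single = evaluate(Proposal(frozenset({"pod-0"})), shards, unavailable=set())

# Restarting pod-0 and pod-1 together would take shard s0 fully offline.
both = evaluate(Proposal(frozenset({"pod-0", "pod-1"})), shards, unavailable=set())
```

The point of the design is that the operator never needs to understand shards itself. It only asks, and the application-specific safety logic stays in the ACM.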

Common Pitfalls

1. Relying solely on StatefulSet for managing complex stateful applications can lead to operational challenges.
   This occurs because StatefulSet lacks features like sharding awareness and maintenance management, necessitating additional layers of complexity that can overwhelm teams.

Related Concepts

Kubernetes Operators
Stateful Applications
Custom Resource Definitions (CRDs)