Fleet Management at Spotify (Part 2): The Path to Declarative Infrastructure

Overview

This article discusses Spotify's transition to a declarative infrastructure model using Kubernetes, enabling efficient management of cloud resources across numerous services. It highlights the challenges faced during the migration from on-premise data centers to Google Cloud Platform (GCP) and the solutions implemented to streamline infrastructure management.

What You'll Learn

1. How to implement declarative infrastructure using Kubernetes
2. Why a GitOps workflow is essential for infrastructure management
3. How to leverage custom resources in Kubernetes for cloud resource management
4. When to use a break-glass mechanism for emergency changes in infrastructure

Prerequisites & Requirements

  • Understanding of cloud infrastructure concepts and Kubernetes
  • Familiarity with GitOps practices and CI/CD pipelines (optional)

Key Questions Answered

What challenges did Spotify face when transitioning to cloud infrastructure?
Spotify's software and infrastructure grew exponentially relative to its number of developers. This led to fragmented infrastructure choices, and there was no mechanism for bringing existing resources up to current standards.
How does Spotify implement declarative infrastructure using Kubernetes?
Spotify uses custom resource definitions (CRDs) in Kubernetes to model infrastructure resources, which are reconciled by operators to maintain the desired state of cloud resources, facilitating a more manageable and scalable infrastructure.
What is the purpose of the break-glass mechanism in Spotify's infrastructure?
The break-glass mechanism allows teams to make emergency changes to infrastructure without going through the full code review process, ensuring that urgent updates can be applied quickly while maintaining overall control and governance.
Why did Spotify choose Kubernetes for their declarative infrastructure?
Kubernetes was selected because it met Spotify's requirements for a GitOps workflow, runtime introspection, and the ability to manage existing infrastructure resources, allowing for a more streamlined and efficient management process.
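
The reconciliation idea in the answers above (operators keeping cloud resources at the desired state declared in custom resources) can be illustrated with a minimal sketch. This is not Spotify's implementation or a real Kubernetes operator. `BucketSpec`, `FakeCloud`, and `reconcile` are hypothetical names introduced for illustration, and the cloud API is replaced by an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class BucketSpec:
    """Hypothetical desired state, as a custom resource's spec might declare it."""
    name: str
    location: str
    labels: dict = field(default_factory=dict)


class FakeCloud:
    """In-memory stand-in for a cloud API (an assumption, not a real GCP client)."""

    def __init__(self):
        self.buckets = {}

    def get(self, name):
        return self.buckets.get(name)

    def apply(self, spec):
        self.buckets[spec.name] = {
            "location": spec.location,
            "labels": dict(spec.labels),
        }


def reconcile(spec, cloud):
    """Compare desired and actual state, converge if needed, and report the action."""
    desired = {"location": spec.location, "labels": dict(spec.labels)}
    actual = cloud.get(spec.name)
    if actual is None:
        cloud.apply(spec)
        return "created"
    if actual != desired:
        cloud.apply(spec)  # overwrite drifted state with the declared state
        return "updated"
    return "unchanged"


cloud = FakeCloud()
spec = BucketSpec(name="playlist-exports", location="EU", labels={"owner": "team-a"})

first = reconcile(spec, cloud)    # nothing exists yet
second = reconcile(spec, cloud)   # already matches the declared state
cloud.buckets["playlist-exports"]["labels"] = {}  # simulate a manual change (drift)
third = reconcile(spec, cloud)    # drift is detected and corrected

print(first, second, third)  # created unchanged updated
```

Running the same function repeatedly is safe because it acts only when actual state differs from declared state. That property lets an operator re-run reconciliation continuously.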

Key Statistics & Figures

Number of GCP projects managed
3,000-plus
This reflects the scale of Spotify's cloud infrastructure management efforts.
Approximate number of GCP resources
50K
This indicates the extensive cloud resource footprint that Spotify manages through its declarative infrastructure.

Technologies & Tools

Orchestration
Kubernetes
Used for modeling and managing infrastructure resources through custom resource definitions and operators.
Cloud Service
Google Cloud Platform
Serves as the primary cloud infrastructure for Spotify's services.
Tool
Config Connector
Facilitates the import of existing cloud resources into the declarative infrastructure platform.
Developer Portal
Backstage
Provides plugins for generating infrastructure manifests and managing cloud resources.

Key Actionable Insights

1. Adopting a GitOps workflow can significantly enhance infrastructure management efficiency.
   By integrating infrastructure configuration with source code, teams can ensure better version control, peer reviews, and audit trails, which are critical for maintaining compliance and operational integrity.
2. Utilizing custom resources in Kubernetes allows for tailored infrastructure management solutions.
   This approach enables teams to encapsulate complex configurations and automate resource management, reducing manual overhead and potential errors in cloud resource provisioning.
3. Implementing a break-glass mechanism can provide necessary flexibility during emergencies.
   This allows teams to bypass standard processes for urgent changes, ensuring that critical infrastructure can be adjusted quickly without compromising overall governance.
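
One way to picture the break-glass idea from the last insight is as an admission check. A change normally needs review approval, but an emergency change is let through when it carries an explicit reason. The sketch below is an assumption made for illustration, not a description of how Spotify implements it. The `Change` type, the `break_glass_reason` field, and the audit log are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical infrastructure change submitted for rollout."""
    author: str
    approved: bool = False                    # passed the normal code review
    break_glass_reason: Optional[str] = None  # set only for emergency changes


def admit(change, audit_log):
    """Allow reviewed changes; allow unreviewed ones only with a recorded reason."""
    if change.approved:
        return True
    if change.break_glass_reason:
        # Record who bypassed review and why, so it can be followed up later.
        audit_log.append((change.author, change.break_glass_reason))
        return True
    return False


audit_log = []
normal = admit(Change("alice", approved=True), audit_log)
blocked = admit(Change("bob"), audit_log)
urgent = admit(Change("carol", break_glass_reason="incident: bucket ACL outage"), audit_log)

print(normal, blocked, urgent)  # True False True
print(audit_log)                # [('carol', 'incident: bucket ACL outage')]
```

Recording every bypass is a design choice in this sketch. It keeps the emergency path fast while leaving a trail that supports the governance goal mentioned above.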

Common Pitfalls

1. Failing to maintain documentation and knowledge transfer during team changes can lead to fragmented infrastructure management.
   As teams grow and change, it's crucial to ensure that architectural decisions and infrastructure choices are documented and communicated to prevent loss of context and ownership.

Related Concepts

GitOps Practices for Infrastructure Management
Custom Resource Definitions in Kubernetes
Cloud Resource Management Strategies