Spinnaker Orchestration

Netflix Technology Blog

Netflix


Overview

The article describes the evolution of Orca, Spinnaker's orchestration engine, from Spring Batch to a custom command queue. It covers the constraints of the original implementation and how the new architecture improves resiliency and reliability while handling thousands of pipelines a day.

What You'll Learn

1. How to implement a queue-based execution engine in Spinnaker
2. Why transitioning from Spring Batch to a custom solution improves pipeline management
3. How to configure pipelines to use the new execution engine

Prerequisites & Requirements

Understanding of orchestration engines and distributed systems
Familiarity with Redis and queue management concepts (optional)

Key Questions Answered

What are the main constraints of the original Orca implementation?

The original Orca implementation was stateful, locking pipelines to a single instance, which caused issues during instance failures or deployments. It also required planning the entire execution in advance, making it difficult to implement features like rolling deployments or restarting failed pipelines.

How does the new command queue system improve Orca's functionality?

The new command queue system allows for distributed processing across Orca instances, enabling better resiliency and reliability. Messages can be re-delivered if unacknowledged, allowing the system to tolerate instance failures and facilitating smoother deployments without draining work from older servers.
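The acknowledge-or-redeliver behaviour described above can be illustrated with a small in-memory sketch. This is not Orca's code: the class name `RedeliveringQueue`, its methods, the timeout value, and the message text are all hypothetical, and a plain Python deque stands in for the Redis-backed queue the article mentions.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


class RedeliveringQueue:
    """Hypothetical in-memory stand-in for a shared queue with ack-or-redeliver semantics."""

    def __init__(self, ack_timeout: float, clock: Callable[[], float] = time.monotonic):
        self._ack_timeout = ack_timeout
        self._clock = clock
        self._ready: Deque[Tuple[int, str]] = deque()
        # Messages handed to a worker but not yet acknowledged: id -> (message, deadline).
        self._unacked: Dict[int, Tuple[str, float]] = {}
        self._next_id = 0

    def push(self, message: str) -> None:
        """Add a new message to the back of the queue."""
        self._ready.append((self._next_id, message))
        self._next_id += 1

    def poll(self) -> Optional[Tuple[int, str]]:
        """Hand out the next message and start its acknowledgement timer."""
        if not self._ready:
            return None
        msg_id, message = self._ready.popleft()
        self._unacked[msg_id] = (message, self._clock() + self._ack_timeout)
        return msg_id, message

    def ack(self, msg_id: int) -> None:
        """Mark a message as fully handled so it is never redelivered."""
        self._unacked.pop(msg_id, None)

    def redeliver_expired(self) -> int:
        """Put back every message whose worker did not acknowledge in time."""
        now = self._clock()
        expired = [i for i, (_, deadline) in self._unacked.items() if deadline <= now]
        for msg_id in expired:
            message, _ = self._unacked.pop(msg_id)
            self._ready.append((msg_id, message))
        return len(expired)


# Usage with a fake clock: instance A takes the message and "crashes" without
# acknowledging; after the timeout, instance B receives the same message.
fake_now = [0.0]
queue = RedeliveringQueue(ack_timeout=10.0, clock=lambda: fake_now[0])
queue.push("run-task: deploy")
taken_by_a = queue.poll()
fake_now[0] = 11.0
queue.redeliver_expired()
taken_by_b = queue.poll()
queue.ack(taken_by_b[0])
```

Because the message only disappears once it is acknowledged, any instance can pick up work that another instance dropped. The trade-off is that a message may be handled more than once, so handlers in a design like this generally need to tolerate repeated delivery.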

What operational capabilities have been added to Orca?

New operational capabilities include rate limiting and traffic shaping, allowing for better management of pipeline executions. This helps prioritize urgent tasks and manage workloads more effectively across the system.
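The summary does not say which algorithm Orca uses for rate limiting. As an illustration of the general idea only, here is a minimal token-bucket sketch, a common rate-limiting technique swapped in for the example. The class `TokenBucket` and its parameters are hypothetical names, not part of Orca.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: allows bursts up to `capacity`,
    then a sustained rate of `rate_per_sec` operations per second."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the operation may run now."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Usage: allow bursts of 2 pipeline starts, refilling one token per second.
limiter = TokenBucket(rate_per_sec=1.0, capacity=2.0)
may_start = limiter.try_acquire()
```

In a queue-based engine, a message refused by a limiter like this could be put back on the queue to be retried later rather than dropped, which is one way lower-priority work could be slowed down without being lost. That pairing is an assumption for illustration, not a description of Orca's implementation.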

Key Statistics & Figures

Pipelines handled daily

2000 to 5000

Orca manages between 2000 and 5000 pipelines per day, peaking at over 400,000 individual task executions on some days.

Technologies & Tools


Backend

Spring

Used for scheduling and managing the message processing in the new queue system.

Database

Redis

Utilized for the queue implementation within Orca to enhance performance and reliability.

Key Actionable Insights

1. Transitioning to a queue-based execution model can significantly enhance the reliability of your orchestration engine.
   By implementing a shared command queue, you can ensure that tasks are distributed across multiple instances, reducing the risk of pipeline failures during instance outages.

2. Utilize the new configuration options to gradually migrate existing pipelines to the new execution engine.
   This approach allows for a smoother transition without disrupting ongoing operations, ensuring that you maintain backward compatibility while adopting new features.
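A gradual migration of this kind can be sketched as a per-pipeline opt-in with a configurable default. The summary does not give the actual configuration keys, so the `executionEngine` key, the `legacy` and `queue` values, and the `choose_engine` function below are illustrative assumptions only.

```python
from typing import Any, Dict

# Hypothetical engine names; the real identifiers are not given in the source.
KNOWN_ENGINES = {"legacy", "queue"}


def choose_engine(pipeline_config: Dict[str, Any], default: str = "legacy") -> str:
    """Pick the engine for one pipeline.

    A pipeline that sets the (assumed) `executionEngine` key opts in or out
    explicitly; every other pipeline follows the system-wide default.
    """
    engine = pipeline_config.get("executionEngine", default)
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown execution engine: {engine!r}")
    return engine


# Phase 1: the default stays on the old engine and individual pipelines opt in.
early_adopter = choose_engine({"name": "deploy-api", "executionEngine": "queue"})
untouched = choose_engine({"name": "bake-image"})

# Phase 2: flip the default; pipelines that have not opted out move over.
after_flip = choose_engine({"name": "bake-image"}, default="queue")
```

Keeping the choice in each pipeline's own configuration means a problem found during the rollout affects only the pipelines that opted in, and rolling back is a configuration change rather than a redeployment.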

Common Pitfalls

1. Over-reliance on a single instance for executing pipelines can lead to significant downtime during deployments.

This occurs when the orchestration engine is stateful, causing all running pipelines to halt if the instance fails. Transitioning to a distributed queue system mitigates this risk.

Related Concepts

Orchestration Engines

Distributed Systems

Pipeline Management

Resiliency In Cloud Applications