The Making of VES: the Cosmos Microservice for Netflix Video Encoding

Netflix Technology Blog
13 min read · intermediate

Overview

The article describes how Netflix built the Video Encoding Service (VES) as a microservice on its Cosmos platform, covering the service's design, its implementation, and the lessons learned along the way. It argues that moving media processing pipelines to microservices improves flexibility, efficiency, and developer productivity.

What You'll Learn

1. How to build a microservice using the Cosmos platform
2. Why continuous release is crucial for modern software development
3. How to implement a Directed Acyclic Graph (DAG) for workflow management
4. When to use container shaping for resource allocation in microservices

Prerequisites & Requirements

  • Understanding of microservices architecture
  • Familiarity with Docker and cloud deployment (optional)

Key Questions Answered

What are the key components of the Video Encoding Service (VES)?
The Video Encoding Service (VES) consists of three main components: Optimus, the API layer; Plato, the workflow layer; and Stratum, the computing layer. These components work together to handle video encoding tasks efficiently and asynchronously, utilizing a messaging system called Timestone for communication.
How does the Cosmos platform support continuous release?
The Cosmos platform supports continuous release by letting small, cohesive code changes be merged, tested, and deployed automatically. A change now reaches production about 30 minutes after merge, compared with 2–4 weeks on the previous-generation platform.
What is the role of the Directed Acyclic Graph (DAG) in VES?
The Directed Acyclic Graph (DAG) in VES represents the workflow for media processing, where nodes signify stages and edges denote dependencies. This structure allows for parallel processing of video chunks, enhancing performance and meeting latency requirements.
What challenges did the team face when defining the service scope for VES?
The team initially created separate encoding services for each codec format, leading to development overhead and code repetition. They later consolidated these into a single service with shared APIs, allowing for easier maintenance and feature updates while still supporting independent evolution of codec-specific functionalities.
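
As a rough illustration of the DAG answer above, here is a minimal sketch in Python of a workflow whose nodes are stages and whose edges are dependencies, with independent chunk stages run in parallel. It is not the VES or Plato implementation: the stage names (`split`, `encode_chunk_N`, `assemble`) and the helper functions are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real workflow engine.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def build_workflow(num_chunks):
    """Return a DAG mapping each stage to the set of stages it depends on.

    Stage names are hypothetical and chosen only for illustration.
    """
    graph = {"split": set()}
    for i in range(num_chunks):
        # Every chunk encode depends only on the split stage.
        graph[f"encode_chunk_{i}"] = {"split"}
    # Assembly must wait for all chunk encodes.
    graph["assemble"] = {f"encode_chunk_{i}" for i in range(num_chunks)}
    return graph


def run_workflow(graph, run_stage, max_workers=4):
    """Run each stage after its dependencies; stages that are ready together run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # All stages returned here have no unmet dependencies.
            ready = list(sorter.get_ready())
            # Run the whole batch in parallel, then mark each stage done.
            for stage, _ in zip(ready, pool.map(run_stage, ready)):
                sorter.done(stage)
                completed.append(stage)
    return completed


if __name__ == "__main__":
    order = run_workflow(build_workflow(3), lambda stage: stage)
    print(order)
```

This sketch processes the graph in waves: it waits for every ready stage to finish before asking for the next batch. A production scheduler would more likely release each downstream stage as soon as its own dependencies complete.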

Key Statistics & Figures

Time from code merge to feature landing
30 minutes
This is a significant improvement from the previous generation platform, which took 2–4 weeks for the same process.

Technologies & Tools

Platform
Cosmos
Next generation media computing platform at Netflix for video processing.
Messaging System
Timestone
Priority-based messaging system used for communication between layers in VES.
Containerization
Docker
Used to package media processing tools into Stratum Functions.
Media Processing
FFmpeg
Used for encoding videos within Stratum Functions.
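
To make the idea of priority-based messaging between layers concrete, the sketch below shows a generic in-memory priority queue in Python. Timestone's actual API is not described here, so everything in this example is an assumption: the class name, the `publish` and `consume` methods, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools


class PriorityMessageQueue:
    """Hypothetical in-memory stand-in for a priority-based message queue."""

    def __init__(self):
        self._heap = []
        # A running counter keeps messages of equal priority in arrival order.
        self._counter = itertools.count()

    def publish(self, priority, payload):
        """Enqueue a message; lower numbers are delivered first (an assumption)."""
        heapq.heappush(self._heap, (priority, next(self._counter), payload))

    def consume(self):
        """Return the highest-priority payload, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, payload = heapq.heappop(self._heap)
        return payload


if __name__ == "__main__":
    queue = PriorityMessageQueue()
    queue.publish(5, "batch re-encode")
    queue.publish(1, "urgent title encode")
    print(queue.consume())
```

With a queue like this, an urgent encoding request can overtake lower-priority backlog work without the producer and consumer needing to call each other directly.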

Key Actionable Insights

1
Consolidate microservices with shared APIs to reduce development overhead.
By consolidating multiple encoding services into a single service, the team minimized repetitive code and streamlined feature updates, which is essential for maintaining efficiency in development.
2
Implement a robust testing framework to support continuous integration and deployment.
A pyramid-based testing framework helps ensure that changes are thoroughly tested at various levels, reducing the risk of issues in production and facilitating faster feedback for developers.
3
Utilize container shaping to optimize resource allocation based on workload requirements.
By defining different 'container shapes' for various encoding tasks, the service can maximize resource utilization, which is crucial for handling the high volume of encoding jobs efficiently.
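
One way to picture container shaping is as a small catalogue of resource profiles from which each task picks the smallest one that fits. The Python sketch below assumes exactly that; the shape names, CPU counts, and memory sizes are invented for illustration and are not the shapes VES uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerShape:
    """A named resource profile for a container (hypothetical fields)."""
    name: str
    cpus: int
    memory_gib: int


# Illustrative catalogue only; the real shapes and sizes are not given in the article summary.
SHAPES = [
    ContainerShape("small", cpus=2, memory_gib=4),
    ContainerShape("medium", cpus=8, memory_gib=16),
    ContainerShape("large", cpus=32, memory_gib=64),
]


def pick_shape(cpus_needed, memory_gib_needed, shapes=SHAPES):
    """Return the smallest shape that satisfies both the CPU and memory needs."""
    fitting = [
        shape for shape in shapes
        if shape.cpus >= cpus_needed and shape.memory_gib >= memory_gib_needed
    ]
    if not fitting:
        raise ValueError("no container shape is large enough for this task")
    # Prefer fewer CPUs first, then less memory, to avoid over-provisioning.
    return min(fitting, key=lambda shape: (shape.cpus, shape.memory_gib))


if __name__ == "__main__":
    print(pick_shape(cpus_needed=4, memory_gib_needed=8).name)
```

Choosing the smallest fitting shape keeps lightweight encodes from occupying large containers, which is the resource-utilization benefit the insight describes.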

Common Pitfalls

1
Overcomplicating service scope by creating too many microservices.
Initially, the team created separate services for each codec format, leading to increased complexity and maintenance challenges. Consolidating services can simplify development and reduce redundancy.

Related Concepts

Microservices Architecture
Continuous Integration and Deployment
Video Encoding Techniques