Creating Your Own EC2 Spot Market

Netflix Technology Blog
4 min read, intermediate

Overview

The article describes how Netflix built an internal EC2 spot market to improve resource utilization and efficiency. Because Netflix combines auto scaling with reserved instances, reservations go unused whenever services scale down during off-peak hours; the article explains the automated system that lends this unused EC2 capacity to other workloads.

What You'll Learn

1. How to implement auto scaling and leverage reserved instances for resource optimization
2. Why creating an internal spot market can improve resource utilization
3. When to borrow instances based on telemetry data

Prerequisites & Requirements

  • Understanding of AWS EC2 and auto scaling concepts
  • Familiarity with AWS management tools and APIs (optional)

Key Questions Answered

How did Netflix create its internal EC2 Spot Market?
Netflix built its services on auto scaling and purchased reserved instances, which left a daily peak of over 12,000 reserved instances unused when services scaled down. The internal EC2 Spot Market lends that unused capacity to other workloads, helping Netflix balance innovation, reliability, and efficiency as it scales globally.
What are the key requirements for automated borrowing in Netflix's system?
The key requirements are building telemetry that exposes unused reservation counts, identifying batch jobs that are short in duration or interruptible, and making sure teams can consume the telemetry data and set borrowing rules that do not jeopardize capacity for critical services.
What challenges did Netflix face in borrowing unused EC2 capacity?
Netflix faced challenges due to a lack of real-time data on unused capacity across accounts. They needed to create tooling and processes to automate borrowing on a larger scale, ensuring that borrowing did not impact critical services.
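
The borrowing requirements above can be illustrated with a minimal sketch. It is not Netflix's implementation: the `PoolSnapshot` schema, the `safety_buffer` parameter, and the sample numbers are all assumptions made for illustration. The idea is only that unused reservations are the gap between reserved and running counts, and that a borrowing rule holds some of that gap back for critical services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSnapshot:
    """One telemetry reading for a reservation pool (hypothetical schema)."""
    instance_type: str
    zone: str
    reserved: int  # reservations purchased for this pool
    running: int   # instances currently running against the pool


def unused_reservations(snapshot: PoolSnapshot) -> int:
    """Reservations nobody is using right now (never negative)."""
    return max(snapshot.reserved - snapshot.running, 0)


def borrowable(snapshot: PoolSnapshot, safety_buffer: int) -> int:
    """Instances a batch job may borrow after holding back a buffer
    for critical services that might need to scale up suddenly."""
    return max(unused_reservations(snapshot) - safety_buffer, 0)


# Illustrative numbers only, not taken from the article.
snap = PoolSnapshot("m3.2xlarge", "us-east-1a", reserved=1000, running=700)
print(unused_reservations(snap))            # 300
print(borrowable(snap, safety_buffer=100))  # 200
```

Keeping the buffer as an explicit parameter lets each borrowing team choose how conservative its rule should be.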

Key Statistics & Figures

Percentage of EC2 footprint that autoscales: 15%
This indicates the scale at which Netflix is able to dynamically adjust its resources based on demand.
Daily peak of unused instances: 12,000
This figure represents the potential capacity that can be leveraged through the internal spot market.
Minimum duration SLA for batch Encoding jobs: 5 minutes to 1 hour
This duration makes these jobs suitable candidates for borrowing during off-peak hours.
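
To make the duration point concrete, here is a small sketch of how a scheduler might decide whether a batch job suits borrowed capacity. The function name, the job names, and the 45-minute window are hypothetical; only the 5-minute-to-1-hour SLA range comes from the figures above.

```python
from datetime import timedelta


def fits_borrow_window(job_sla: timedelta,
                       window_remaining: timedelta,
                       interruptible: bool = False) -> bool:
    """A job suits borrowed capacity if it can finish before the
    capacity is expected to be reclaimed, or if it can be interrupted."""
    return interruptible or job_sla <= window_remaining


# Hypothetical jobs; the SLAs span the 5-minute-to-1-hour range.
jobs = {
    "encode-short": timedelta(minutes=5),
    "encode-long": timedelta(hours=1),
}
window = timedelta(minutes=45)  # assumed time left in the off-peak trough

eligible = [name for name, sla in jobs.items()
            if fits_borrow_window(sla, window)]
print(eligible)  # ['encode-short']
```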

Technologies & Tools

Cloud Computing
AWS EC2
Used for hosting services and managing compute resources.
Cloud Computing
Amazon Auto Scaling
Facilitates automatic scaling of resources based on demand.

Key Actionable Insights

1. Implement auto scaling and reserved instances to create an internal spot market for unused resources.
   This approach can significantly improve resource utilization and cost efficiency, especially in large-scale environments like Netflix.
2. Develop telemetry systems to expose real-time data on unused EC2 reservations.
   Having accurate data allows teams to make informed decisions about resource borrowing, optimizing job scheduling and resource allocation.
3. Encourage cross-team communication for resource sharing to enhance efficiency.
   By fostering collaboration between teams, organizations can better utilize available resources and reduce waste.

Common Pitfalls

1. Failing to communicate effectively between teams can lead to inefficient resource utilization.
   Without proper communication, teams may not be aware of available resources or the need for borrowing, leading to wasted capacity.
2. Relying solely on historical data for resource allocation can result in missed opportunities.
   Real-time data is crucial for making informed decisions about resource usage, especially in dynamic environments.
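
The second pitfall can be shown with simple arithmetic. The sketch below compares a history-based estimate of unused reservations with a live reading; both helper names and the numbers are hypothetical, chosen only to show how a history-only rule can either leave capacity idle or borrow more than is actually free.

```python
def missed_capacity(historical_estimate: int, live_unused: int) -> int:
    """Unused instances a history-only rule would leave idle right now."""
    return max(live_unused - historical_estimate, 0)


def overborrow_risk(historical_estimate: int, live_unused: int) -> int:
    """Instances a history-only rule would borrow that are not actually free."""
    return max(historical_estimate - live_unused, 0)


# Quieter than usual: history says 800 unused, live telemetry says 1,100.
print(missed_capacity(800, 1100))  # 300
print(overborrow_risk(800, 1100))  # 0

# Busier than usual: history says 800 unused, live telemetry says 500.
print(missed_capacity(800, 500))   # 0
print(overborrow_risk(800, 500))   # 300
```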