Open Sourcing Peloton, Uber’s Unified Resource Scheduler

Min Cai, Mayank Bansal

Overview

Uber has open sourced Peloton, its unified resource scheduler for managing resources efficiently across diverse workloads. Peloton provides advanced resource management features and is built for web-scale companies. By releasing it, Uber aims to foster collaboration with the broader cluster management community and help other organizations improve their resource utilization.

What You'll Learn

1. How to leverage Peloton for unified resource scheduling in cloud environments
2. Why co-locating mixed workloads improves resource utilization
3. How to implement resource overcommitment and job preemption effectively

Key Questions Answered

What is Peloton and how does it function as a resource scheduler?
Peloton is a unified resource scheduler from Uber that manages resources across distinct workloads so that they can share resources and be co-located efficiently. It is designed for web-scale companies and can run both in on-premises data centers and in cloud environments.
What are the benefits of using Peloton for resource management?
Peloton improves resource utilization by letting mixed workloads co-locate on shared clusters, which avoids over-provisioning a separate cluster for each workload type. This leads to better capacity planning and greater operational efficiency.
What features are included in the current release of Peloton?
The current release of Peloton includes elastic resource sharing, resource overcommit and task preemption, support for big data workloads, GPU and gang scheduling support for machine learning, and a Protobuf/gRPC-based API with client bindings for multiple programming languages.
How does Peloton support machine learning workloads?
Peloton supports machine learning workloads by managing GPU resources and providing gang scheduling (starting all tasks of a distributed job together) for frameworks such as TensorFlow and Horovod, which lets it handle thousands of GPUs efficiently in production.
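Gang scheduling matters because synchronous distributed training jobs, such as those run with Horovod, typically cannot make progress until every worker is running; placing only some of the workers just ties up GPUs. The following is a minimal all-or-nothing placement sketch in Python, assuming each task needs a whole number of GPUs and each host reports a free GPU count. The function `place_gang` and its data layout are hypothetical illustrations, not Peloton's API or its actual placement algorithm.

```python
from typing import Dict, List, Optional


def place_gang(task_gpus: List[int], free_gpus: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Hypothetical all-or-nothing placement: return {task index: host} for
    every task in the gang, or None if the whole gang cannot start together."""
    tentative = dict(free_gpus)  # work on a copy so a failed attempt changes nothing
    placement: Dict[int, str] = {}
    # Place the largest tasks first (first-fit decreasing) to reduce fragmentation.
    order = sorted(range(len(task_gpus)), key=lambda i: task_gpus[i], reverse=True)
    for i in order:
        host = next((h for h, free in tentative.items() if free >= task_gpus[i]), None)
        if host is None:
            return None  # one member cannot be placed, so no member is placed
        tentative[host] -= task_gpus[i]
        placement[i] = host
    # Commit the reservations only after every member of the gang has a slot.
    free_gpus.update(tentative)
    return placement
```

First-fit decreasing is just one simple heuristic; the property that makes this gang scheduling is that the free-GPU map is updated only after every member of the gang has been placed.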


Key Actionable Insights

1. Running Peloton lets you co-locate batch and online jobs on shared clusters, which can significantly improve resource utilization.
   This is particularly beneficial for organizations with fluctuating workloads, because it reduces the need for additional hardware and makes better use of existing capacity.
2. Use Peloton's elastic resource sharing to allocate resources dynamically as workload demand changes.
   This lets teams respond quickly to changing requirements instead of leaving reserved capacity idle.
3. Plan for job preemption in your scheduling strategy so that latency-sensitive jobs are not disrupted.
   By deciding carefully which jobs can be preempted, organizations can maintain performance while maximizing resource utilization.
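The elastic sharing and preemption ideas above can be sketched together. The Python below assumes a single resource dimension and flat (non-hierarchical) resource pools, each with a reservation, a limit, and a share weight; the `Pool` and `Task` types and both functions are hypothetical and are not taken from Peloton's code. `compute_entitlements` gives each pool its reservation (capped by demand and limit), then repeatedly splits spare capacity by share among pools that still want more, a simple water-filling form of weighted max-min fairness. `select_victims` then picks preemptible tasks, lowest priority first, from any pool running above its entitlement.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pool:
    # Hypothetical resource pool with one resource dimension (e.g. CPU cores).
    name: str
    reservation: float  # guaranteed amount while the pool has demand
    limit: float        # upper bound the pool may never exceed
    share: float        # relative weight when dividing spare capacity (> 0)
    demand: float       # what the pool's running and pending tasks want


@dataclass
class Task:
    # Hypothetical running task that belongs to a pool.
    name: str
    pool: str
    priority: int       # higher value = more important
    usage: float
    preemptible: bool = True


def compute_entitlements(pools: List[Pool], capacity: float, eps: float = 1e-9) -> Dict[str, float]:
    """Water-filling sketch; assumes the reservations fit within capacity."""
    # Step 1: grant reservations, capped by demand and limit; unused reservation becomes spare.
    ent = {p.name: min(p.reservation, p.demand, p.limit) for p in pools}
    spare = capacity - sum(ent.values())
    # Step 2: split spare capacity by share among pools that still want more.
    while spare > eps:
        hungry = [p for p in pools if ent[p.name] < min(p.demand, p.limit) - eps]
        total_share = sum(p.share for p in hungry)
        if not hungry or total_share <= 0:
            break
        given = 0.0
        for p in hungry:
            offer = spare * p.share / total_share
            take = min(offer, min(p.demand, p.limit) - ent[p.name])
            ent[p.name] += take
            given += take
        # Pools that hit their cap leave the hungry set; leftover spare is re-split next round.
        spare -= given
    return ent


def select_victims(tasks: List[Task], entitlements: Dict[str, float], eps: float = 1e-9) -> List[Task]:
    """Preempt the lowest-priority preemptible tasks in pools above their entitlement."""
    used: Dict[str, float] = {}
    for t in tasks:
        used[t.pool] = used.get(t.pool, 0.0) + t.usage
    victims: List[Task] = []
    for pool, amount in used.items():
        excess = amount - entitlements.get(pool, 0.0)
        candidates = sorted((t for t in tasks if t.pool == pool and t.preemptible),
                            key=lambda t: t.priority)
        for t in candidates:
            if excess <= eps:
                break
            victims.append(t)
            excess -= t.usage
    return victims
```

For example, with 100 CPU cores and pools A (reservation 30, limit 100, share 1, demand 80), B (30, 100, 1, 10), and C (20, 40, 2, 60), B's unused reservation is lent out and the entitlements come to roughly A = 50, B = 10, C = 40. If A is currently running 70 cores of tasks, `select_victims` frees at least 20 cores by preempting A's lowest-priority preemptible tasks, while non-preemptible tasks are never chosen.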

Common Pitfalls

1. Mismanaging resource overcommitment can degrade performance, especially for latency-sensitive jobs.
   It's crucial to identify which lower-priority jobs can be preempted and to co-locate only those jobs with critical services, so that utilization improves while the critical services can always reclaim their resources through preemption.
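One common way to overcommit safely is to lend only slack (resources that latency-sensitive tasks have reserved but are not currently using) to revocable, lower-priority tasks, and to evict those tasks as soon as the reserved tasks need the resources back. The sketch below assumes a single resource dimension and a hypothetical `Host` model with a measured usage figure; it illustrates the general slack-reclamation idea and is not Peloton's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    # Hypothetical host model with one resource dimension (e.g. CPU cores).
    allocated: float  # reserved by latency-sensitive, non-preemptible tasks
    used: float       # what those tasks are measured to use right now
    revocable: List[float] = field(default_factory=list)  # sizes of lent-out tasks


def slack(host: Host) -> float:
    # Allocated-but-idle resources that are not yet lent to revocable tasks.
    return host.allocated - host.used - sum(host.revocable)


def admit_revocable(host: Host, size: float) -> bool:
    """Admit a lower-priority, revocable task only if it fits in current slack."""
    if size <= slack(host):
        host.revocable.append(size)
        return True
    return False


def on_usage_change(host: Host, new_used: float) -> List[float]:
    """Record new usage by the guaranteed tasks, then evict revocable tasks
    (most recently admitted first) until slack is no longer negative."""
    host.used = new_used
    evicted: List[float] = []
    while host.revocable and slack(host) < 0:
        evicted.append(host.revocable.pop())
    return evicted
```

Evicting the most recently admitted task first is an arbitrary choice in this sketch; a real scheduler would more likely weigh task priority and the cost of restarting each evicted task.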