Overview
The article discusses Peloton, Uber's unified resource scheduler designed to manage diverse cluster workloads efficiently. It highlights the challenges of underutilization in Uber's compute stack and explains how Peloton consolidates various workloads into a single platform to optimize resource allocation and improve operational efficiency.
What You'll Learn
1
How to implement a unified resource scheduler for diverse workloads
2
Why resource co-location improves cluster utilization
3
When to apply preemption strategies in resource management
Prerequisites & Requirements
- Understanding of cluster management concepts
- Familiarity with Apache Mesos and Docker (optional)
Key Questions Answered
What are the main categories of compute cluster workloads used at Uber?
Uber categorizes its compute cluster workloads into four main types: stateless, stateful, batch, and daemon jobs. Stateless jobs are long-running services without persistent state, while stateful jobs keep persistent state on local disks. Batch jobs are typically preemptible and less sensitive to performance fluctuations, and daemon jobs run infrastructure components such as Apache Kafka.
How does Peloton improve resource utilization at Uber?
Peloton improves resource utilization by co-locating diverse workloads on a single compute platform. This approach reduces the need for over-provisioning hardware and allows for better sharing of resources during peak demand periods, ultimately leading to lower operational costs.
What challenges did Uber face with its previous compute stack?
Uber's previous compute stack ran different workloads on dedicated clusters, which left the stack underutilized overall. Because rideshare demand fluctuates, hardware had to be over-provisioned for peak workloads, and some clusters starved for resources while others had excess capacity.
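To make the over-provisioning argument concrete, here is a minimal sketch that compares the hosts needed when each workload gets a dedicated cluster sized for its own peak with the hosts needed by one shared cluster sized for the peak of the combined demand. The demand figures and function names are invented for illustration and are not Uber data.

```python
# Hypothetical hourly demand (in hosts) for two workloads.
# These numbers are illustrative assumptions, not figures from the article.
online_demand = [40, 60, 100, 90, 50, 30]  # peaks during busy rider hours
batch_demand = [80, 50, 10, 20, 60, 90]    # runs mostly off-peak


def dedicated_capacity(*demands):
    """Each dedicated cluster is provisioned for its own peak."""
    return sum(max(d) for d in demands)


def shared_capacity(*demands):
    """One shared cluster is provisioned for the peak of the summed demand."""
    return max(sum(hour) for hour in zip(*demands))


dedicated = dedicated_capacity(online_demand, batch_demand)
shared = shared_capacity(online_demand, batch_demand)
print(f"dedicated clusters: {dedicated} hosts, shared cluster: {shared} hosts")
```

With these made-up inputs the dedicated setup needs 190 hosts and the shared cluster needs 120, because the two peaks do not coincide. The saving shrinks as the peaks overlap more.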
What types of preemption does Peloton use?
Peloton employs two types of preemption: inter-resource pool preemption, which enforces max-min fairness across resource pools, and intra-resource pool preemption, which prioritizes jobs within a resource pool based on their assigned priorities. This allows for efficient resource sharing while maintaining service level agreements.
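As an illustration of the max-min fairness idea behind inter-resource pool preemption, the sketch below uses a textbook water-filling allocation over a single resource dimension. It is not Peloton's implementation; the pool names, the `max_min_fair` and `reclaimable` helpers, and the single-resource simplification are all assumptions made for the example.

```python
def max_min_fair(capacity, demands):
    """Return a max-min fair split of `capacity` across pool demands.

    Textbook water-filling over one resource; a simplification, not
    Peloton's actual entitlement calculation.
    """
    allocation = {pool: 0.0 for pool in demands}
    unsatisfied = dict(demands)
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Pools that want no more than the equal share get their full demand.
        satisfied = {p: d for p, d in unsatisfied.items() if d <= share}
        if not satisfied:
            # Every remaining pool wants more than the equal share:
            # split what is left evenly and stop.
            for pool in unsatisfied:
                allocation[pool] += share
            break
        for pool, demand in satisfied.items():
            allocation[pool] += demand
            remaining -= demand
            del unsatisfied[pool]
    return allocation


def reclaimable(usage, fair):
    """Amount each pool currently holds beyond its fair share."""
    return {p: usage[p] - fair[p] for p in usage if usage[p] > fair[p]}


fair = max_min_fair(100, {"a": 20, "b": 50, "c": 70})
over = reclaimable({"a": 10, "b": 60, "c": 30}, fair)
print(fair, over)
```

In this example pool `a` receives its full demand of 20, while `b` and `c` split the remaining 80 evenly. A pool holding more than its fair share, such as `b` at 60, is where a scheduler would look for tasks to preempt when other pools need their share back.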
Key Statistics & Figures
Number of jobs run per month
3 million
Peloton runs this volume of jobs across clusters comprising more than 8,000 machines.
Number of containers run per month
36 million
This statistic reflects the scale at which Peloton operates in managing workloads.
Number of hosts in the cluster
3,000
Peloton consolidated workloads from multiple small clusters into one large cluster for better efficiency.
Technologies & Tools
Cluster Management
Apache Mesos
Peloton is built on top of Mesos to aggregate resources and manage tasks.
Containerization
Docker
Peloton uses Docker containers to manage workloads across the cluster.
Data Processing
Apache Spark
Peloton runs Spark jobs for various data analytics workloads.
Machine Learning
TensorFlow
Peloton supports distributed TensorFlow jobs for deep learning tasks.
Key Actionable Insights
1
Implementing a unified resource scheduler like Peloton can significantly enhance resource utilization across diverse workloads.
By consolidating workloads, organizations can reduce hardware costs and improve operational efficiency, especially during peak demand periods.
2
Utilizing preemption strategies allows for dynamic resource allocation based on workload priority.
This ensures that high-priority jobs receive the necessary resources while maintaining overall system performance.
3
Co-locating batch jobs with stateless services can lead to better resource management.
This approach leverages the complementary resource profiles of different workloads, optimizing the use of available compute resources.
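The priority-based preemption described in these insights can be sketched as a small victim-selection routine: when a higher-priority task does not fit, the scheduler evicts the lowest-priority preemptible tasks until enough capacity is free. This is a hedged illustration only. The `Task` fields, the "higher number means higher priority" convention, and the `pick_victims` function are hypothetical and do not come from Peloton's API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int             # assumption: higher number = more important
    cpus: float
    preemptible: bool = True  # e.g. batch tasks; stateless services set False


def pick_victims(running, incoming, free_cpus):
    """Pick lower-priority preemptible tasks to evict so `incoming` fits.

    Returns a list of victims (possibly empty), or None if the incoming
    task cannot be placed even after preemption.
    """
    needed = incoming.cpus - free_cpus
    if needed <= 0:
        return []
    # Consider only preemptible tasks with strictly lower priority,
    # evicting the least important ones first.
    candidates = sorted(
        (t for t in running if t.preemptible and t.priority < incoming.priority),
        key=lambda t: t.priority,
    )
    victims = []
    for task in candidates:
        victims.append(task)
        needed -= task.cpus
        if needed <= 0:
            return victims
    return None


running = [
    Task("web", priority=10, cpus=4, preemptible=False),
    Task("etl", priority=1, cpus=2),
    Task("report", priority=3, cpus=3),
]
incoming = Task("train", priority=5, cpus=4)
victims = pick_victims(running, incoming, free_cpus=1)
print([t.name for t in victims])
```

Here the non-preemptible `web` service is never considered, which mirrors the idea of co-locating preemptible batch work next to stateless services: batch tasks absorb spare capacity and give it back when more important work arrives.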
Common Pitfalls
1
Underestimating the complexity of resource management in large-scale systems can lead to inefficiencies.
Organizations may struggle with resource contention and underutilization if they do not implement effective scheduling and resource sharing strategies.
2
Failing to account for workload variability can result in over-provisioning or under-provisioning resources.
Dynamic workloads require flexible resource management solutions to adapt to changing demands.
Related Concepts
Cluster Management Strategies
Resource Scheduling Techniques
Performance Optimization In Distributed Systems