Overview
The article describes how Uber worked with Oracle Cloud Infrastructure (OCI) and Ampere Computing to optimize OCI AmpereOne M A4 Compute instances. It covers Uber's move from on-premises data centers to Arm-based cloud infrastructure, with a focus on performance gains and energy efficiency.
What You'll Learn
1. How to optimize cloud infrastructure for Arm-based workloads
2. Why collaboration between software and hardware teams is crucial for performance
3. When to transition from x86 to Arm architecture in cloud environments
Prerequisites & Requirements
- Understanding of cloud infrastructure and Arm architecture
- Experience with performance benchmarking and optimization (optional)
Key Questions Answered
What are the benefits of using OCI AmpereOne M A4 Compute instances?
OCI AmpereOne M A4 Compute instances offer higher performance per watt, lower energy consumption, and greater compute density. These benefits support sustainability goals and operational flexibility, which makes the instances suitable for both hyperscale providers and enterprise customers.
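Performance per watt is useful work divided by the power drawn while doing it. A minimal sketch, assuming you already have a throughput figure (for example requests per second) and an average power reading for each instance type; the function names and the sample numbers are hypothetical, not measurements from the article.

```python
def perf_per_watt(throughput_rps: float, avg_power_watts: float) -> float:
    """Requests per second delivered per watt of average power draw."""
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput_rps / avg_power_watts


def relative_efficiency(candidate: tuple[float, float], baseline: tuple[float, float]) -> float:
    """Ratio above 1.0 means the candidate does more work per watt than the baseline."""
    return perf_per_watt(*candidate) / perf_per_watt(*baseline)


# Hypothetical (requests/s, watts) pairs for two instance types, illustration only.
instance_a = (12_000.0, 300.0)
instance_b = (10_000.0, 320.0)
print(f"{relative_efficiency(instance_a, instance_b):.2f}x")  # 1.28x
```

Comparing a ratio rather than raw throughput keeps the result meaningful when the two instance types draw different amounts of power.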
How did Uber transition to Arm-based cloud infrastructure?
Uber moved from on-premises data centers to OCI and Google Cloud Platform. The migration involved shifting massive workloads and introducing Arm-powered compute instances into a previously x86-dominated environment.
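Running a mixed x86 and Arm fleet means each service needs a build per architecture, and deployment tooling must pick the right one for the host. The article does not describe how Uber handled this; the sketch below is hypothetical, including the alias table, the artifact naming scheme, and the service name. It relies only on Python's standard `platform.machine()`, which reports names such as `x86_64` or `aarch64` on Linux, `arm64` on Apple Silicon macOS, and `AMD64` on Windows.

```python
import platform
from typing import Optional

# The same instruction set shows up under different names depending on the OS and tool.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map a raw machine string to one canonical label per architecture."""
    try:
        return _ARCH_ALIASES[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine!r}") from None


def artifact_name(service: str, machine: Optional[str] = None) -> str:
    """Hypothetical naming scheme: one Linux binary per service per architecture."""
    arch = normalize_arch(machine or platform.machine())
    return f"{service}-linux-{arch}"


print(artifact_name("dispatch", "aarch64"))  # dispatch-linux-arm64
```

Failing loudly on an unknown architecture is deliberate: silently falling back to an x86 build on an Arm host would only surface later as a startup failure.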
What key challenges did Uber face during the transition to Arm architecture?
Uber ran into performance problems in latency-sensitive operations and out-of-memory errors in Go services, which were traced to smaller Translation Lookaside Buffers (TLBs) and caches. Collaborative debugging across the teams helped pinpoint these issues and tune performance.
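A smaller TLB matters because TLB reach (entries multiplied by page size) bounds how much memory a core can touch before it needs a page-table walk. A minimal sketch of that arithmetic; the entry counts are hypothetical and are not the actual TLB sizes of any processor mentioned in the article.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory the TLB can map at once."""
    return entries * page_size


def pages_beyond_reach(working_set: int, entries: int, page_size: int) -> int:
    """Pages of the working set that the TLB cannot cover simultaneously.

    Anything above zero means accesses spread across the working set keep
    evicting translations, so some of them pay for a page-table walk.
    """
    pages_needed = -(-working_set // page_size)  # ceiling division
    return max(0, pages_needed - entries)


# Hypothetical TLB sizes with 4 KiB pages.
for entries in (1024, 2048):
    reach = tlb_reach(entries, 4 * KIB)
    print(f"{entries} entries x 4 KiB pages -> {reach // MIB} MiB reach")
```

Because reach scales linearly with page size, larger pages (for example 64 KiB or 2 MiB) are a common way to offset a smaller TLB.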
What insights did Uber provide to Ampere for designing the AmpereOne M silicon?
Uber Engineering shared detailed workload characteristics and performance targets with Ampere's architects. This input helped shape the design and optimization of the AmpereOne M silicon and improved the price-performance of OCI Ampere instances.
Key Statistics & Figures
Core Count: 96 (flex SKU for OCI)
Clock Speed: up to 3.6 GHz (96-core model)
L2 Cache per Core: 2 MB
A significant increase over previous generations. A larger private L2 keeps more of each core's working set close at hand, cutting trips to the shared system cache and main memory.
System Cache: 64 MB
The larger shared cache benefits memory-intensive applications.
Memory: 12x DDR5-5600, up to 1.5 TB
This configuration provides high memory bandwidth and capacity for demanding workloads.
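The theoretical peak memory bandwidth follows from simple arithmetic. A sketch assuming "12x" means twelve DDR5 channels, each with a 64-bit (8-byte) data path; the helper name is hypothetical, and sustained bandwidth in practice is lower than this ceiling.

```python
def peak_dram_bandwidth_gbs(channels: int, transfer_rate_mts: int, bus_width_bytes: int = 8) -> float:
    """Theoretical peak in GB/s (10^9 bytes): channels x transfers per second x bytes per transfer."""
    return channels * transfer_rate_mts * 1e6 * bus_width_bytes / 1e9


peak = peak_dram_bandwidth_gbs(channels=12, transfer_rate_mts=5600)
print(f"{peak:.1f} GB/s theoretical peak")  # 537.6 GB/s theoretical peak
print(f"{peak / 96:.1f} GB/s per core if all 96 cores stream at once")  # 5.6 GB/s per core ...
```

The per-core figure is a reminder that at high core counts, memory bandwidth per core can become the limit for streaming workloads even when total bandwidth is large.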
Technologies & Tools
Cloud Platform: Oracle Cloud Infrastructure
Hosts Uber's workloads in the cloud.
Processor: AmpereOne M
The processor behind OCI Ampere instances, optimized for improved performance.
Architecture: Arm
The underlying architecture of the new OCI Ampere instances.
Key Actionable Insights
1. Collaborate closely with hardware vendors to optimize cloud infrastructure for specific workloads. This collaboration can produce tailored solutions that improve performance and energy efficiency, as Uber's work with OCI and Ampere demonstrates.
2. Use processors with high core counts per socket to avoid cross-socket latency. With enough cores on a single socket, organizations can skip multi-socket servers and the inter-socket communication latency they introduce, improving overall system performance.
3. Understand workload characteristics and use them to inform system design. Detailed insight into workload behavior guides the tuning of instance configurations so that resources are allocated efficiently.
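For latency-sensitive services, characterizing a workload usually starts with the latency distribution rather than the mean, since tail latency is what users notice. A minimal sketch using the nearest-rank percentile definition; the function names and the sample latencies are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def latency_profile(samples_ms: Sequence[float]) -> dict[str, float]:
    """Summarize one benchmark run by its median, tail, and worst case."""
    return {
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }


# Hypothetical request latencies in milliseconds from one run.
run = [4.1, 3.9, 4.0, 4.3, 12.5, 4.2, 3.8, 4.0, 4.1, 30.0]
print(latency_profile(run))  # {'p50': 4.1, 'p99': 30.0, 'max': 30.0}
```

Here the median looks healthy while the tail is several times worse, which is exactly the kind of behavior a mean would hide when comparing instance types.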
Common Pitfalls
1. Underestimating the impact of cache and memory configurations on performance.
Smaller caches and TLBs cause more cache and TLB misses, which slow down memory-intensive applications in particular. Evaluate and tune these configurations against actual workload requirements.
2. Failing to account for single-threaded performance needs in latency-sensitive applications.
Applications that depend on high single-thread performance may suffer on processors that do not support turbo clock speeds, which leads to inefficient resource utilization.
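One way to start evaluating cache configurations is to compare a workload's working set against the cache capacities listed under Key Statistics & Figures (2 MB private L2 per core, 64 MB shared system cache). A rough sketch; the function name and the simple fit rule are assumptions for illustration, and real behavior also depends on associativity, data shared between cores, and prefetching.

```python
MIB = 1024 * 1024

# Capacities from the figures in this summary, treated as MiB for simplicity.
L2_PER_CORE = 2 * MIB
SYSTEM_CACHE = 64 * MIB


def smallest_fitting_level(per_core_bytes: int, active_cores: int) -> str:
    """First guess at which level serves most accesses, assuming no data is shared between cores."""
    if per_core_bytes <= L2_PER_CORE:
        return "L2"
    if per_core_bytes * active_cores <= SYSTEM_CACHE:
        return "system cache"
    return "DRAM"


print(smallest_fitting_level(1 * MIB, 96))  # L2
print(smallest_fitting_level(3 * MIB, 16))  # system cache (48 MiB in total)
print(smallest_fitting_level(3 * MIB, 96))  # DRAM (288 MiB in total)
```

The last two calls show why core count matters: the same per-core working set fits in the shared cache with 16 busy cores but spills to DRAM with 96.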
Related Concepts
Cloud Infrastructure Optimization
Arm Architecture Benefits
Performance Benchmarking Techniques