From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey

Kai Wang, Min Cai, Joseph Wang, Eric Chen

Uber

Apache, Apache Spark, AutoML, Deep Learning, Docker, Generative AI, Hugging Face, Keras, Kubernetes, PaLM, Prompt Engineering, PyTorch, TensorFlow, XGBoost

Overview

The article discusses Uber's evolution in machine learning (ML) through its centralized platform, Michelangelo, highlighting its transition from predictive to generative AI. It outlines the significant advancements in ML capabilities, developer experience, and the integration of deep learning techniques over the past eight years.

What You'll Learn

1. How to leverage Michelangelo for end-to-end machine learning lifecycle management

2. Why deep learning models can outperform traditional models in specific use cases

3. When to implement generative AI solutions for enhancing user experience

Prerequisites & Requirements

Understanding of machine learning concepts and workflows
Familiarity with ML frameworks like TensorFlow and PyTorch (optional)

Key Questions Answered

What role does Michelangelo play in Uber's machine learning strategy?

Michelangelo serves as Uber's centralized ML platform, enabling the development, deployment, and management of machine learning models at scale. It supports over 400 active ML projects, runs more than 20,000 model training jobs per month, and serves 10 million real-time predictions per second at peak.

How has Uber's approach to machine learning evolved over the years?

Uber's approach has transitioned from predictive ML using algorithms like XGBoost to adopting deep learning techniques and now exploring generative AI. This evolution reflects a focus on enhancing user experience and operational efficiency across its platforms.

What are the key phases in the evolution of Uber's ML platform?

The evolution of Uber's ML platform falls into three phases: predictive ML (2016-2019), deep learning (2019-2023), and generative AI (from 2023 onward), with the latest phase aimed at improving user experience and productivity.

What challenges did Uber face in its initial ML phases?

In the initial phases, Uber faced challenges such as a lack of comprehensive ML quality definitions, insufficient support for deep learning models, and fragmented tooling that hindered collaboration among teams. These issues led to inefficiencies in model development and deployment.

Key Statistics & Figures

Active ML projects managed on Michelangelo: 400
This number reflects the scale at which Uber operates its ML initiatives.

Monthly model training jobs: 20,000
This indicates the high volume of model development activity that Michelangelo supports.

Real-time predictions per second at peak: 10 million
This demonstrates the platform's capability to handle large-scale operational demands.
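To put these figures in more familiar units, the short calculation below converts them to daily numbers. It is only a back-of-the-envelope sketch: the 30-day month is an assumption, and the daily prediction figure is an upper bound that assumes the peak rate were sustained all day, which the source does not claim.

```python
# Figures taken from the statistics above.
TRAINING_JOBS_PER_MONTH = 20_000
PEAK_PREDICTIONS_PER_SECOND = 10_000_000

# Assumptions made for this illustration only.
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60

# Average number of training jobs launched per day.
jobs_per_day = TRAINING_JOBS_PER_MONTH / DAYS_PER_MONTH

# Upper bound on daily predictions if the peak rate held for 24 hours.
daily_prediction_ceiling = PEAK_PREDICTIONS_PER_SECOND * SECONDS_PER_DAY

print(f"{jobs_per_day:.0f} training jobs per day on average")
print(f"{daily_prediction_ceiling:,} predictions per day at a sustained peak rate")
```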

Technologies & Tools

ML Platform: Michelangelo
Centralized platform for managing the ML lifecycle at Uber.

ML Framework: TensorFlow
Used for training deep learning models.

ML Framework: PyTorch
Another framework supported for deep learning model training.

Distributed Computing Framework: Ray
Utilized for distributed training and resource management.

Distributed Training Framework: Horovod
Facilitates distributed training of deep learning models.

Inference Server: Triton
Next-generation model serving engine integrated into Michelangelo.

Key Actionable Insights

1. Centralizing ML infrastructure can significantly enhance development efficiency across teams.
By establishing a unified platform like Michelangelo, Uber has streamlined its ML workflows, allowing teams to focus on building high-quality models rather than managing disparate systems.

2. Implementing a clear ML project tiering system helps prioritize resources effectively.
This approach ensures that high-impact projects receive the necessary attention and investment, optimizing the overall impact of ML initiatives within the organization.

3. Adopting deep learning should be based on specific use case requirements rather than as a default solution.
Uber's experience shows that while deep learning can be powerful, traditional models like XGBoost may outperform it in certain scenarios, emphasizing the need for strategic decision-making.
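The project tiering idea in the second insight can be illustrated with a small sketch. Everything here is hypothetical: the tier scale (1 as highest impact), the project names, and the GPU pool are invented for illustration and do not describe how Michelangelo actually allocates resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLProject:
    name: str            # hypothetical project name
    tier: int            # assumed scale: 1 = highest business impact
    requested_gpus: int


def allocate_gpus(projects, available_gpus):
    """Grant GPU requests in tier order until the pool is exhausted."""
    allocation = {}
    remaining = available_gpus
    # Lower tier numbers are served first; names break ties deterministically.
    for project in sorted(projects, key=lambda p: (p.tier, p.name)):
        granted = min(project.requested_gpus, remaining)
        allocation[project.name] = granted
        remaining -= granted
    return allocation


projects = [
    MLProject("experimental-ranker", tier=4, requested_gpus=8),
    MLProject("core-pricing-model", tier=1, requested_gpus=16),
    MLProject("fraud-screening", tier=2, requested_gpus=8),
]

# With 28 GPUs, the tier 1 and tier 2 requests are met in full and the
# tier 4 project receives whatever is left.
print(allocate_gpus(projects, available_gpus=28))
```

A strict priority order like this is the simplest possible policy; a real scheduler would likely also reserve some capacity for lower tiers so exploratory work is never starved completely.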

Common Pitfalls

1. Failing to establish a centralized ML platform can lead to inefficiencies and duplicated efforts.
Without a unified system, individual teams may create their own ML infrastructures, resulting in inconsistencies and wasted resources.

2. Neglecting the importance of model quality metrics can result in poor performance and stale models in production.
Without comprehensive quality definitions, teams may overlook critical performance indicators, leading to suboptimal model effectiveness.
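One way to make the second pitfall concrete is an automated check that flags stale or underperforming models. The sketch below is an assumption-laden illustration: the `ModelRecord` fields, the 90-day age limit, and the 0.75 AUC floor are hypothetical choices, not quality definitions taken from Michelangelo.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelRecord:
    name: str
    last_trained: date
    offline_auc: float   # hypothetical offline evaluation metric


def quality_issues(model, today, max_age_days=90, min_auc=0.75):
    """Return a list of human-readable quality problems for one model."""
    issues = []
    age_days = (today - model.last_trained).days
    if age_days > max_age_days:
        issues.append(f"stale: last trained {age_days} days ago")
    if model.offline_auc < min_auc:
        issues.append(f"low offline AUC: {model.offline_auc:.2f}")
    return issues


model = ModelRecord("demo-model", last_trained=date(2024, 1, 1), offline_auc=0.70)
for issue in quality_issues(model, today=date(2024, 6, 1)):
    print(issue)
```

Running a check like this on a schedule across all production models would surface candidates for retraining before their performance degrades unnoticed.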

Related Concepts

Machine Learning Lifecycle

Deep Learning Techniques

Generative AI Applications