Meet Michelangelo: Uber’s Machine Learning Platform

Jeremy Hermann, Mike Del Balso

Uber

Tags: AutoML, Cassandra, engineering, Java, Machine Learning, REST API, Scala, scikit-learn, SQL, TensorFlow, XGBoost

Overview

The article introduces Michelangelo, Uber's internal machine learning platform designed to democratize machine learning and streamline the process of building, deploying, and operating ML solutions at scale. It discusses the platform's architecture, workflow, use cases, and future enhancements.

What You'll Learn

1. How to build and deploy machine learning models using Michelangelo

2. Why standardizing ML workflows is crucial for scalability

3. When to leverage batch vs. online prediction services

Prerequisites & Requirements

Understanding of machine learning concepts and workflows
Familiarity with data processing tools like Spark and Hadoop (optional)

Key Questions Answered

What is Michelangelo and how does it function?

Michelangelo is Uber's internal machine learning platform that simplifies the process of building, deploying, and managing machine learning models at scale. It integrates various components for data management, model training, evaluation, and deployment, enabling teams to leverage machine learning effectively across the organization.

How does UberEATS utilize Michelangelo for delivery time predictions?

UberEATS employs Michelangelo to predict meal delivery times using gradient boosted decision tree regression models. These models incorporate various features, including historical data and real-time metrics, to provide accurate delivery time estimates at different stages of the order process.

What are the key components of Michelangelo's architecture?

Michelangelo's architecture includes a mix of open source components like HDFS, Spark, and Cassandra, along with in-house tools for data management and model serving. This architecture supports both online and offline predictions, ensuring scalability and efficiency in processing data.

What is the machine learning workflow in Michelangelo?

The machine learning workflow in Michelangelo consists of six key steps: managing data, training models, evaluating models, deploying models, making predictions, and monitoring predictions. This structured approach ensures that machine learning processes are standardized and reproducible across teams.

Key Statistics & Figures

Predictions served per second

250,000

This is the rate at which the highest-traffic models deployed on Michelangelo serve predictions.

P95 latency for models not needing Cassandra features

less than 5 milliseconds

This latency applies to models that can operate without accessing additional features from the Cassandra store.

P95 latency for models requiring Cassandra features

less than 10 milliseconds

This latency is observed for models that depend on features stored in Cassandra.

Number of features in Feature Store

10,000

These features are utilized across various machine learning projects within Uber.
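
A P95 latency below 5 milliseconds means that 95 percent of requests complete in under 5 milliseconds. The following is a minimal sketch of how such a figure can be computed from recorded request latencies using the nearest-rank method. The `percentile` helper and the sample values are illustrative assumptions, not Uber data or Michelangelo code.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds (made up for illustration).
latencies_ms = [1.2, 1.4, 1.1, 2.0, 1.3, 1.8, 4.6, 1.5, 1.6, 9.7,
                1.2, 1.9, 2.2, 1.4, 1.7, 1.3, 2.8, 1.5, 1.6, 3.9]

p95 = percentile(latencies_ms, 95)
meets_target = p95 < 5.0  # the "less than 5 milliseconds" target quoted above
```

Note that a single slow request (9.7 ms here) does not break a P95 target, which is why percentile targets are commonly preferred over maximum latency for serving systems.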

Technologies & Tools

Data Storage

HDFS

Used for storing Uber's transactional and log data.

Data Processing

Spark

Facilitates batch processing and model training.

Database

Cassandra

Stores features for online model predictions.

Streaming

Kafka

Aggregates logged messages from Uber's services.

Machine Learning

TensorFlow

Utilized for building and training deep learning models.

Key Actionable Insights

1
Standardizing machine learning workflows can significantly enhance productivity and model performance across teams.
By implementing a unified workflow, teams can avoid common pitfalls such as inconsistent data handling and model deployment issues, leading to more reliable outcomes.

2
Utilizing a shared feature store can reduce redundancy and improve data quality in machine learning projects.
When teams share features, they can leverage existing data insights, which accelerates the development process and fosters collaboration.

3
Monitoring model predictions is essential for maintaining accuracy over time.
By logging predictions and comparing them to actual outcomes, teams can identify model drift and take corrective actions to ensure continued performance.
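
The monitoring idea in the third insight can be sketched as follows: log each prediction under a request ID, join the log with outcomes as they become known, and compare the resulting error with a baseline. This is only an illustration under stated assumptions. The record type, the choice of mean absolute error, and the `tolerance` threshold are hypothetical and are not taken from Michelangelo.

```python
from dataclasses import dataclass


@dataclass
class PredictionRecord:
    request_id: str
    predicted: float  # e.g. predicted delivery time in minutes


def mean_absolute_error(prediction_log, outcomes):
    """MAE over logged predictions whose actual outcome is already known.

    Returns None when no logged prediction has an outcome yet."""
    errors = [
        abs(rec.predicted - outcomes[rec.request_id])
        for rec in prediction_log
        if rec.request_id in outcomes
    ]
    if not errors:
        return None
    return sum(errors) / len(errors)


def needs_attention(current_mae, baseline_mae, tolerance=1.5):
    """Flag the model when the live error exceeds the baseline by the
    (assumed) tolerance factor."""
    return current_mae is not None and current_mae > baseline_mae * tolerance


# Hypothetical logged predictions and observed outcomes (minutes).
prediction_log = [
    PredictionRecord("a", 30.0),
    PredictionRecord("b", 25.0),
    PredictionRecord("c", 40.0),
    PredictionRecord("d", 35.0),  # outcome not yet observed
]
outcomes = {"a": 32.0, "b": 24.0, "c": 46.0}

live_mae = mean_absolute_error(prediction_log, outcomes)
drift_suspected = needs_attention(live_mae, baseline_mae=1.5)
```

In practice the baseline would typically come from the error measured at evaluation time, and the comparison would run on a schedule over a recent window of predictions.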

Common Pitfalls

1

Failing to monitor model predictions can lead to undetected model drift.

Without ongoing monitoring, models may become less accurate over time as the underlying data distribution changes, resulting in poor performance.

2

Not standardizing data pipelines can result in inconsistent feature generation.

When teams use different methods for feature generation, it can lead to discrepancies in model training and prediction, ultimately affecting the reliability of outcomes.
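
One way to avoid the second pitfall is to define each feature once and have both the training path and the serving path call the same definition. The sketch below illustrates that idea with a small in-memory registry. It is not Michelangelo's Feature Store API, and the class, feature names, and raw fields are all hypothetical.

```python
from typing import Callable, Dict, List

FeatureFn = Callable[[dict], float]


class FeatureRegistry:
    """Holds one definition per named feature so every caller computes it the same way."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str, fn: FeatureFn) -> None:
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def compute(self, names: List[str], raw: dict) -> List[float]:
        return [self._features[name](raw) for name in names]


registry = FeatureRegistry()
# Hypothetical features for a delivery-time model.
registry.register("prep_time_minutes", lambda raw: raw["prep_time_seconds"] / 60.0)
registry.register(
    "is_rush_hour",
    lambda raw: 1.0 if raw["hour_of_day"] in (12, 13, 18, 19) else 0.0,
)

MODEL_FEATURES = ["prep_time_minutes", "is_rush_hour"]


def build_training_rows(historical_orders: List[dict]) -> List[List[float]]:
    # Offline path: feature vectors for historical records.
    return [registry.compute(MODEL_FEATURES, order) for order in historical_orders]


def build_serving_vector(live_order: dict) -> List[float]:
    # Online path: the same definitions applied to a single live request.
    return registry.compute(MODEL_FEATURES, live_order)


order = {"prep_time_seconds": 900, "hour_of_day": 12}
training_rows = build_training_rows([order])
serving_vector = build_serving_vector(order)
```

Because both paths go through `registry.compute`, the same raw record always yields the same feature vector, whether it is seen during training or at prediction time.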