Michelangelo PyML: Introducing Uber’s Platform for Rapid Python ML Model Development

Kevin Stumpf, Stepan Bedratiuk, Olcay Cirit

Uber

Tags: Apache, Apache Spark, Docker, gRPC, Java, JSON, Machine Learning, PySpark, PyTorch, scikit-learn, SQL, TensorFlow, Thrift, XGBoost

Overview

The article introduces Michelangelo PyML, Uber's platform designed for rapid Python machine learning model development. It emphasizes the integration with existing tools, the flexibility for data scientists, and the streamlined process for deploying models at scale.

What You'll Learn

1. How to use Michelangelo PyML for rapid Python ML model development
2. Why integrating PyML with existing Uber infrastructure enhances model deployment
3. When to utilize Docker for deploying machine learning models

Prerequisites & Requirements

Familiarity with Python and machine learning concepts
Basic understanding of Docker and Apache Spark (optional)

Key Questions Answered

What is Michelangelo PyML and how does it improve ML model development?

Michelangelo PyML is a platform that enables rapid development of Python machine learning models by integrating with Uber's existing ML infrastructure. It allows data scientists to prototype, validate, and deploy models efficiently, ensuring low latency and high scalability.

How does PyML ensure consistency between online and offline predictions?

PyML uses the same Docker image for both online and offline predictions, so both paths run identical model code and dependencies. This eliminates the discrepancies that can arise from separate implementations and keeps model behavior consistent across environments.
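The idea of one implementation behind both prediction paths can be illustrated with a minimal sketch. This is not PyML's actual API: the function names `predict`, `serve_online`, and `score_offline`, the `trip_distance` feature, and the toy scoring formula are all hypothetical, and a shared Python function stands in for the shared Docker image.

```python
import json
from typing import Dict, List

def predict(rows: List[Dict[str, float]]) -> List[float]:
    """Single model implementation (hypothetical toy formula)."""
    return [round(0.5 * row["trip_distance"] + 1.0, 6) for row in rows]

def serve_online(request_body: str) -> str:
    """Online path: one JSON request in, one JSON response out."""
    row = json.loads(request_body)
    return json.dumps({"prediction": predict([row])[0]})

def score_offline(rows: List[Dict[str, float]], batch_size: int = 2) -> List[float]:
    """Offline path: score a dataset in batches with the same predict()."""
    results: List[float] = []
    for start in range(0, len(rows), batch_size):
        results.extend(predict(rows[start:start + batch_size]))
    return results

rows = [{"trip_distance": d} for d in (1.0, 2.5, 4.0)]
offline = score_offline(rows)
online = [json.loads(serve_online(json.dumps(r)))["prediction"] for r in rows]
print(offline == online)
```

Because both paths delegate to the same `predict`, there is no second implementation that could drift from the first.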

What are the key components of the PyML architecture?

The PyML architecture includes a data model that supports DataFrames and Tensors, a model definition interface that abstracts deployment details, and a Docker-based deployment system that ensures consistent outputs across different environments.
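A model definition interface of this kind might look roughly like the sketch below. The class names `TabularModel` and `LinearEtaModel`, the feature names, and the use of a list of dicts as a stand-in for a DataFrame are assumptions made for illustration, not PyML's real classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, float]  # stand-in for one DataFrame row (assumption)

class TabularModel(ABC):
    """Hypothetical contract: the serving layer only ever calls predict()."""

    @abstractmethod
    def predict(self, rows: List[Row]) -> List[Row]:
        """Map input rows to output rows."""

class LinearEtaModel(TabularModel):
    """Toy model: a weighted sum of named features plus a bias."""

    def __init__(self, weights: Dict[str, float], bias: float = 0.0) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, rows: List[Row]) -> List[Row]:
        outputs = []
        for row in rows:
            score = self.bias + sum(
                weight * row.get(name, 0.0) for name, weight in self.weights.items()
            )
            outputs.append({"prediction": score})
        return outputs

model = LinearEtaModel({"distance_km": 2.0, "stops": 1.5}, bias=3.0)
result = model.predict([{"distance_km": 4.0, "stops": 2.0}])
print(result)
```

Keeping the contract to a single `predict` method is what lets the deployment layer stay unaware of how the model was built or which libraries it uses.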

Key Statistics & Figures

Predictions per second powered by Michelangelo

1 million

This showcases the platform's capability to handle high-scale model serving efficiently.

Technologies & Tools

Containerization

Docker

Used for packaging and deploying machine learning models in a consistent environment.

Big Data Processing

Apache Spark

Utilized for training models on large datasets within the Michelangelo platform.

Key Actionable Insights

1. Leverage Michelangelo PyML to streamline your ML model development process.
By using PyML, data scientists can reduce the friction in prototyping and deploying models, allowing for faster iterations and more efficient workflows.

2. Utilize Docker for your ML models to ensure consistent deployment across environments.
Docker allows you to encapsulate your model and its dependencies, ensuring that it behaves the same way in development, testing, and production environments.

Common Pitfalls

1. Failing to validate models across different environments can lead to inconsistent performance.

This often happens when models are developed and tested in isolation. PyML's Docker integration helps mitigate this risk because the same image, with the same code and dependencies, is used for both online and offline predictions.
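One way to catch this pitfall early is a parity check that compares predictions produced in two environments on the same inputs. The sketch below is an illustration only: the helper `find_mismatches`, the tolerances, and the sample values are hypothetical and not part of PyML.

```python
import math
from typing import List, Sequence

def find_mismatches(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-12,
) -> List[int]:
    """Return the indices where the two prediction lists disagree."""
    if len(reference) != len(candidate):
        raise ValueError("prediction lists must have the same length")
    return [
        index
        for index, (expected, actual) in enumerate(zip(reference, candidate))
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)
    ]

# Made-up values: predictions from a notebook versus a deployed container.
notebook_predictions = [0.12, 0.87, 0.45]
deployed_predictions = [0.12, 0.87, 0.54]
mismatches = find_mismatches(notebook_predictions, deployed_predictions)
print(mismatches)
```

A tolerance-based comparison is used rather than strict equality because floating-point results can differ slightly across library versions or hardware even when the model logic is the same.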

Related Concepts

Machine Learning Model Deployment

Docker For ML

Integration With Feature Stores