Evolving Michelangelo Model Representation for Flexibility at Scale

Anne Holler, Michael Mui

Uber

Apache, Apache Spark, Docker, Java, Machine Learning, PySpark, SQL, TensorFlow, Transformer, Transformers

Overview

The article discusses the evolution of the Michelangelo model representation at Uber to enhance flexibility and scalability in machine learning model serving. It highlights the transition from a monolithic architecture to a more modular and interoperable design that leverages Spark MLlib for improved performance and extensibility.

What You'll Learn

1. How to leverage Spark MLlib for scalable machine learning model serving
2. Why transitioning to a standard Spark ML pipeline improves model interoperability
3. How to implement OnlineTransformer for low-latency predictions

Prerequisites & Requirements

Understanding of machine learning concepts and Spark MLlib
Familiarity with Jupyter Notebook and PySpark (optional)

Key Questions Answered

What are the key motivations for evolving Michelangelo’s architecture?

The evolution of Michelangelo's architecture was motivated by the need for scalable machine learning models that support various use cases, improve model representation, and enhance online serving capabilities. This transition allows for more complex model pipelines and better integration with external tools.

How does Michelangelo ensure consistency between online and offline model serving?

Michelangelo maintains consistency by using a standard PipelineModel-driven architecture, which eliminates custom pre-scoring and post-scoring implementations. This approach ensures that the same scoring methods can be applied both in offline and online contexts, enhancing accuracy and reliability.
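
The idea can be pictured with a small sketch in plain Python rather than Spark. Everything here is hypothetical and for illustration only: the `Stage` and `Pipeline` classes, the `score_instance` and `transform` method names, and the toy `impute`, `predict`, and `label` functions are assumptions, not Michelangelo's or Spark's API. The point is that pre-scoring and post-scoring live inside the pipeline as ordinary stages, and the batch path reuses the same per-row logic as the single-request path.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Stage:
    """Hypothetical stage: one per-row function backs both serving paths."""

    def __init__(self, fn: Callable[[Row], Row]) -> None:
        self._fn = fn

    def score_instance(self, row: Row) -> Row:
        # Online path: score a single request (copy so the input is not mutated).
        return self._fn(dict(row))

    def transform(self, rows: List[Row]) -> List[Row]:
        # Offline path: reuse exactly the same per-row logic over a batch.
        return [self.score_instance(row) for row in rows]


class Pipeline:
    """Hypothetical pipeline that chains stages for either path."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages

    def score_instance(self, row: Row) -> Row:
        for stage in self.stages:
            row = stage.score_instance(row)
        return row

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def impute(row: Row) -> Row:
    # Pre-scoring step expressed as a stage: fill a missing feature.
    if row.get("distance_km") is None:
        row["distance_km"] = 0.0
    return row


def predict(row: Row) -> Row:
    # Toy linear model standing in for a trained estimator.
    row["score"] = 2.0 * row["distance_km"] + 1.0
    return row


def label(row: Row) -> Row:
    # Post-scoring step expressed as a stage: threshold the score.
    row["label"] = "long" if row["score"] >= 5.0 else "short"
    return row


pipeline = Pipeline([Stage(impute), Stage(predict), Stage(label)])
batch = [{"distance_km": 3.0}, {"distance_km": None}]

offline = pipeline.transform(batch)                        # batch scoring
online = [pipeline.score_instance(row) for row in batch]   # per-request scoring
print(offline == online)
```

Because no scoring logic lives outside the pipeline, there is no second implementation that can drift from the first.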

What performance improvements were achieved with the new model representation?

Loading a native Spark model was initially 8x-44x slower than loading the custom protobuf representation. With the new model representation, that gap narrowed to 2x-3x, which corresponds to a 4x-15x speed-up over native Spark model loading and makes the approach more suitable for online serving.
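
As a quick sanity check, the quoted speed-up range follows from dividing the two slowdown ranges. The short calculation below assumes that the low ends of the ranges correspond to each other and likewise the high ends. The summary does not state that pairing explicitly.

```python
# Slowdown factors relative to custom protobuf loading, as quoted above.
before = (8.0, 44.0)  # native Spark load: 8x-44x slower
after = (2.0, 3.0)    # tuned load: 2x-3x slower

# Assumption: low end pairs with low end, high end with high end.
speedup_low = before[0] / after[0]
speedup_high = before[1] / after[1]

print(f"{speedup_low:.1f}x to {speedup_high:.1f}x")  # prints "4.0x to 14.7x"
```

Rounded, that matches the 4x-15x figure.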

Key Statistics & Figures

Reduction in model load time

4x-15x speed-up over Spark native models

This improvement was achieved by optimizing the model representation and loading processes.

Technologies & Tools

Backend

Apache Spark

Used for scalable machine learning model training and serving.

Tools

PySpark

Facilitates distributed training and deployment of machine learning models.

Key Actionable Insights

1. Implementing a standard Spark ML pipeline for model representation can significantly enhance interoperability with external tools.
   This approach allows data scientists to easily integrate models trained in Michelangelo with other Spark-based tools, facilitating a more flexible machine learning workflow.

2. Utilizing the OnlineTransformer interface can help achieve low-latency predictions in real-time applications.
   By extending the Spark Transformer interface, developers can ensure that their models are optimized for online serving, which is critical for applications requiring quick response times.

3. Regular performance tuning of model loading processes can lead to substantial improvements in serving efficiency.
   Identifying and addressing bottlenecks in model loading can significantly enhance the responsiveness of machine learning applications, especially in multi-tenant environments.
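
For the load-time tuning point, a first step is usually to measure load time in a repeatable way. The sketch below is a generic timing harness, not Michelangelo's benchmark: the `median_load_seconds` helper is hypothetical, and JSON and pickle are stand-ins for two serialized model formats, chosen only because both ship with Python. Actual ratios depend on the machine and the payload, so no expected numbers are shown.

```python
import json
import pickle
import statistics
import time
from typing import Any, Callable


def median_load_seconds(load: Callable[[], Any], repeats: int = 20) -> float:
    """Time a zero-argument loader several times and return the median."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Toy "model": a dictionary of weights serialized in two formats.
weights = {"weights": [i * 0.001 for i in range(50_000)]}
as_json = json.dumps(weights)
as_pickle = pickle.dumps(weights)

json_s = median_load_seconds(lambda: json.loads(as_json))
pickle_s = median_load_seconds(lambda: pickle.loads(as_pickle))

# Express one format's load time as a multiple of the other's.
ratio = json_s / pickle_s
print(f"json load takes {ratio:.1f}x the pickle load time")
```

Using the median of repeated runs reduces the influence of one-off stalls, which matters when comparing formats whose load times differ by only a small factor.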

Common Pitfalls

1. Relying on custom model representations can lead to difficulties in supporting new Spark transformers.
   This often results in increased complexity and slower adaptation to new features in Spark, making it harder for teams to innovate and integrate new models.

Related Concepts

Machine Learning Model Serving

Model Interoperability

Performance Optimization In ML Workflows