Continuous Integration and Deployment for Machine Learning Online Serving and Models

Joseph Wang, Jia Li, Yi Zhang, Yunfeng Bai
9 min read | advanced

Overview

The article discusses Uber's approach to Continuous Integration and Deployment (CI/CD) for machine learning models and online serving. It highlights the challenges faced in deploying a large volume of models and the solutions implemented to enhance model management and service reliability.

What You'll Learn

1. How to implement dynamic model loading in a Real-time Prediction Service
2. Why model auto-retirement is crucial for resource management
3. When to apply different model rollout strategies such as shadowing and gradual rollout
4. How to ensure high confidence in automated CI/CD processes for machine learning models

Prerequisites & Requirements

  • Understanding of MLOps concepts and practices
  • Experience with machine learning model deployment

Key Questions Answered

What challenges does Uber face in deploying machine learning models?
Uber must manage a large volume of model deployments, keep the Real-time Prediction Service highly available, and control the memory footprint of serving many models at once. It also needs workable model rollout strategies and a robust CI/CD process to validate new releases.
How does Uber implement model auto-retirement?
Uber's model auto-retirement process allows model owners to set expiration periods for models. If a model is unused beyond this period, the system triggers a warning and retires the model, significantly reducing resource footprint and preventing unnecessary storage costs.
What is the purpose of dynamic model loading in Uber's architecture?
Dynamic model loading allows the Real-time Prediction Service to decouple model and server development cycles. It enables the service to periodically check a Model Artifact & Config store, load new models, and remove retired ones, facilitating faster iterations in production.
What strategies does Uber use for model rollout?
Uber employs various model rollout strategies, including shadowing and gradual rollout. Shadowing involves duplicating traffic to a secondary model for testing, while gradual rollout shifts traffic distribution among multiple models to ensure performance before full deployment.
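
The two rollout strategies described above can be sketched in a few lines. This is a minimal illustration, not Uber's implementation: the function names (`bucket`, `pick_model`, `predict_with_shadow`), the hash-based bucketing, and the in-memory `shadow_log` are all assumptions made for the example.

```python
import hashlib


def bucket(request_id: str, buckets: int = 100) -> int:
    """Map a request ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def pick_model(request_id: str, primary: str, candidate: str,
               candidate_percent: int) -> str:
    """Gradual rollout: route roughly candidate_percent% of requests
    to the candidate model, the rest to the primary."""
    if bucket(request_id) < candidate_percent:
        return candidate
    return primary


def predict_with_shadow(features, primary_fn, shadow_fn, shadow_log: list):
    """Shadowing: return the primary prediction to the caller and record
    the shadow prediction on the same input for later comparison."""
    result = primary_fn(features)
    try:
        shadow_log.append((features, result, shadow_fn(features)))
    except Exception as exc:
        # A failing shadow model must never affect the caller (assumed policy).
        shadow_log.append((features, result, exc))
    return result
```

Hashing the request ID keeps routing deterministic, so raising `candidate_percent` only moves additional requests to the candidate and never moves any back to the primary.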

Technologies & Tools

Real-time Prediction Service (Backend)
Used for serving machine learning models and managing deployments.

Model Artifact & Config Store (Backend)
Holds the target state of models to be served in production.
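
Because the store holds the target state, dynamic model loading amounts to reconciling what a service instance has in memory against that target. The sketch below shows one way such a step could look; the `ModelRegistry` class, its `loader` callback, and the use of plain model IDs are hypothetical and stand in for whatever the real service and store expose.

```python
from typing import Any, Callable, Dict, Set, Tuple


class ModelRegistry:
    """In-memory view of the models one service instance has loaded."""

    def __init__(self, loader: Callable[[str], Any]):
        # loader is a hypothetical callback that fetches and
        # deserializes a model artifact by ID.
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def loaded(self) -> Set[str]:
        return set(self._models)

    def reconcile(self, target: Set[str]) -> Tuple[Set[str], Set[str]]:
        """Load models present in the target but not locally, and unload
        local models that are no longer in the target (for example,
        retired ones). Returns (loaded_ids, unloaded_ids)."""
        current = set(self._models)
        to_load = target - current
        to_unload = current - target
        for model_id in sorted(to_load):
            self._models[model_id] = self._loader(model_id)
        for model_id in to_unload:
            del self._models[model_id]
        return to_load, to_unload
```

A serving process would call `reconcile` on a timer with the latest target set read from the store, which is what lets models ship without redeploying the service binary.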

Key Actionable Insights

1. Implementing dynamic model loading can significantly enhance your model deployment process.
   By decoupling model and service development, teams can iterate faster and reduce bottlenecks caused by tightly coupled deployments.
2. Utilizing model auto-retirement helps maintain an efficient resource footprint.
   Setting expiration periods for models ensures that unused models do not consume storage and memory, which can lead to performance issues.
3. Adopting a robust CI/CD pipeline is essential for maintaining high confidence in model deployments.
   This approach minimizes risks associated with code and dependency changes, ensuring that models behave consistently in production.
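
The auto-retirement insight can be made concrete with a small classification step that compares each model's idle time against its owner-set expiration period. This is a sketch under assumptions: the `ModelRecord` fields, the `classify_models` function, and the seven-day warning window before retirement are illustrative choices, not details taken from the article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class ModelRecord:
    model_id: str
    last_used: datetime    # last time the model served a prediction
    expiration: timedelta  # expiration period set by the model owner


def classify_models(
    records: List[ModelRecord],
    now: datetime,
    warning_window: timedelta = timedelta(days=7),  # assumed value
) -> Tuple[List[str], List[str]]:
    """Return (warn_ids, retire_ids).

    A model is retired once it has been idle for its full expiration
    period, and flagged for a warning while it is within
    warning_window of that point.
    """
    warn, retire = [], []
    for record in records:
        idle = now - record.last_used
        if idle >= record.expiration:
            retire.append(record.model_id)
        elif idle >= record.expiration - warning_window:
            warn.append(record.model_id)
    return warn, retire
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the policy easy to test with fixed dates.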

Common Pitfalls

1. Failing to retire unused models can lead to increased storage costs and memory issues.
   Many teams forget to integrate model cleanup into their workflows, resulting in unnecessary resource consumption.
2. Not validating models against existing deployments can cause unexpected behavior in production.
   Without proper validation, new models may not perform as expected due to changes in dependencies or service configurations.
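
One way to guard against the second pitfall is to replay sampled inputs through both the currently deployed build and the candidate build, then compare predictions before promoting the candidate. The sketch below assumes scalar numeric predictions; `find_mismatches`, its tolerances, and the two prediction callables are hypothetical names chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def find_mismatches(
    inputs: Sequence[float],
    baseline_fn: Callable[[float], float],
    candidate_fn: Callable[[float], float],
    rel_tol: float = 1e-6,   # assumed tolerance
    abs_tol: float = 1e-9,   # assumed tolerance
) -> List[Tuple[int, float, float]]:
    """Return (index, baseline, candidate) for every sampled input where
    the candidate's prediction differs from the baseline's beyond tolerance."""
    mismatches = []
    for index, x in enumerate(inputs):
        expected = baseline_fn(x)
        actual = candidate_fn(x)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((index, expected, actual))
    return mismatches
```

A CI step could fail the release whenever the returned list is non-empty, which surfaces behavior changes caused by dependency or configuration drift before they reach production.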

Related Concepts

MLOps
Model Deployment Strategies
Continuous Integration and Deployment