Under the Hood of Uber ATG’s Machine Learning Infrastructure and Versioning Control Platform for Self-Driving Vehicles

Yu Guo, Khalid Ashmawy, Eric Huang, Wei Zeng

Uber


28 min read • advanced


Apache, Apache Spark, Flask, Git, Jenkins, Kubernetes, Machine Learning, MySQL, PyTorch, REST API, SQLAlchemy, TensorBoard, TensorFlow

Overview

The article discusses Uber ATG's machine learning infrastructure and versioning control platform, VerCD, designed to manage the complexities of developing self-driving vehicles. It outlines the five-step life cycle of machine learning models, the various components involved, and how VerCD facilitates continuous integration and delivery (CI/CD) for efficient model management.

What You'll Learn

1. How to implement a five-step life cycle for machine learning models
2. Why continuous delivery is crucial for managing ML artifacts
3. How to automate data ingestion and validation processes

Prerequisites & Requirements

Understanding of machine learning concepts and workflows
Familiarity with CI/CD tools like Jenkins (optional)

Key Questions Answered

What is the purpose of VerCD in Uber ATG's ML workflow?

VerCD is designed to manage versioning and dependencies of machine learning artifacts in Uber ATG's self-driving vehicle development. It tracks all dependencies, including data and model artifacts, ensuring reproducibility and traceability throughout the ML workflow.
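One way to picture dependency-aware versioning is a version identifier derived from an artifact's own content plus the versions of everything upstream of it. The sketch below is illustrative only: the `Artifact` class, its fields, and the hashing scheme are assumptions made for this example, not VerCD's actual data model or API.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Artifact:
    """Hypothetical record for one ML artifact (dataset, model, metrics report)."""

    name: str
    content_digest: str  # stand-in for a hash of the artifact's own bytes or config
    dependencies: Tuple["Artifact", ...] = ()

    def version_id(self) -> str:
        """Derive a version from this artifact's content and all upstream versions."""
        payload = {
            "name": self.name,
            "content": self.content_digest,
            # Recursing into dependencies means any upstream change alters this id.
            "deps": sorted(dep.version_id() for dep in self.dependencies),
        }
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()[:12]


# Same model code, two different dataset versions (all digests are made up).
dataset_v1 = Artifact("dataset", "aaa")
dataset_v2 = Artifact("dataset", "bbb")
model_on_v1 = Artifact("model", "mmm", (dataset_v1,))
model_on_v2 = Artifact("model", "mmm", (dataset_v2,))
```

Because the identifier folds in upstream versions, `model_on_v1` and `model_on_v2` get different ids even though the model's own digest is unchanged, which is the property that makes a result traceable to the exact data it was built from.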

How does Uber ATG ensure the quality of its ML models?

Uber ATG employs a five-step life cycle for its ML models: data ingestion, data validation, model training, model evaluation, and model serving. Because each step gates the next, a model has to meet its quality metrics before it is deployed to self-driving vehicles.
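The five steps can be sketched as an ordered list of stage functions, where any stage may stop the run by raising. This is a toy under stated assumptions: the stage bodies, the `max_mae` threshold, and the dictionary-based context are invented for illustration and are not taken from Uber ATG's pipeline.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], Dict]]


def ingest(ctx: Dict) -> Dict:
    # Stand-in for extracting training rows from logs.
    return {**ctx, "rows": [1.0, 2.0, 3.0, 4.0]}


def validate(ctx: Dict) -> Dict:
    if not ctx["rows"]:
        raise ValueError("validation failed: no rows ingested")
    return ctx


def train(ctx: Dict) -> Dict:
    rows = ctx["rows"]
    # Toy "model": always predict the mean of the training rows.
    return {**ctx, "model": sum(rows) / len(rows)}


def evaluate(ctx: Dict) -> Dict:
    rows, model = ctx["rows"], ctx["model"]
    mae = sum(abs(r - model) for r in rows) / len(rows)
    return {**ctx, "mae": mae}


def serve(ctx: Dict) -> Dict:
    # Quality gate: refuse to serve a model that misses the metric threshold.
    if ctx["mae"] > ctx["max_mae"]:
        raise RuntimeError("model not served: mae above threshold")
    return {**ctx, "served": True}


STAGES: List[Stage] = [
    ("ingest", ingest),
    ("validate", validate),
    ("train", train),
    ("evaluate", evaluate),
    ("serve", serve),
]


def run_life_cycle(stages: List[Stage], context: Dict) -> Dict:
    """Run stages in order; an exception in any stage aborts the run."""
    for name, step in stages:
        context = step(context)
        context.setdefault("completed", []).append(name)
    return context


result = run_life_cycle(STAGES, {"max_mae": 1.5})
```

Calling `run_life_cycle(STAGES, {"max_mae": 0.5})` instead raises `RuntimeError` in the final stage, so a model that misses the threshold never reaches serving.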

What challenges does Uber ATG face with ML model dependencies?

The complexity of deep dependency graphs in the self-driving domain poses significant challenges for continuous delivery. Each model's dependencies can affect others, making it crucial to manage these interactions effectively to avoid inconsistencies.
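To make the dependency problem concrete, the sketch below takes a small graph, finds everything downstream of a changed artifact, and returns a safe rebuild order. The artifact names and graph shape are hypothetical, and the approach (breadth-first search plus the standard library's `graphlib.TopologicalSorter`, Python 3.9+) is one plausible technique rather than a description of how VerCD schedules builds.

```python
from collections import deque
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical graph: artifact -> set of artifacts it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "dataset": set(),
    "detection_model": {"dataset"},
    "prediction_model": {"dataset", "detection_model"},
    "planner_metrics": {"prediction_model"},
    "map_model": set(),
}


def rebuild_order(depends_on: Dict[str, Set[str]], changed: str) -> List[str]:
    """Return the changed artifact and all its dependents, upstream first."""
    # Invert the edges so we can walk from an artifact to what depends on it.
    dependents: Dict[str, Set[str]] = {node: set() for node in depends_on}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(node)

    # Breadth-first search collects everything affected by the change.
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # Topologically sort only the affected part of the graph.
    subgraph = {node: depends_on[node] & affected for node in affected}
    return list(TopologicalSorter(subgraph).static_order())


order = rebuild_order(DEPENDS_ON, "dataset")
# order == ["dataset", "detection_model", "prediction_model", "planner_metrics"]
```

An unrelated artifact such as `map_model` is left out of the rebuild, while everything that transitively consumes `dataset` is rebuilt after its own inputs.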

Key Statistics & Figures

Daily trips supported by Uber

14 million

This statistic highlights the scale at which Uber operates and the importance of efficient ML model management.

Technologies & Tools


Platform

VerCD

A set of tools and microservices for managing ML workflows and dependencies.

Data Processing

Apache Spark

Used for extracting data from logs in parallel to optimize training pipeline performance.

Machine Learning Framework

TensorFlow

Utilized for developing and running ML models.

Machine Learning Framework

PyTorch

Another framework used for developing ML models.

CI/CD Tool

Jenkins

Used for orchestrating builds and managing workflows in the ML pipeline.

Key Actionable Insights

1. Implementing a structured life cycle for ML models can significantly enhance the quality and reliability of deployments.
   By following a defined process, teams can ensure that each model is thoroughly validated before deployment, reducing the risk of failures in production.

2. Automating data ingestion and validation processes can streamline model training and improve iteration speed.
   This allows engineers to focus on refining models rather than managing data manually, leading to faster development cycles.

3. Utilizing a version control system tailored for ML artifacts can help manage complex dependencies effectively.
   This is particularly important in self-driving vehicle development, where changes in one model can impact others, necessitating careful tracking of all components.

Common Pitfalls

1. Failing to track dependencies can lead to inconsistent model performance.

Without proper versioning and dependency management, changes in one model can negatively impact others, leading to unexpected behavior in production.

Related Concepts

Continuous Integration and Continuous Delivery (CI/CD)

Machine Learning Operations (MLOps)

Dependency Management In Software Development