Accessible Machine Learning through Data Workflow Management

Jianyong Zhang, Eric Chen, Sally Lee

Uber

Apache, Apache Spark, Machine Learning, TensorFlow

Overview

This article describes how Uber uses Piper, its data workflow management system, to make machine learning (ML) more accessible and efficient. It covers the challenges of integrating ML into business operations and the workflows that support model training, deployment, and monitoring.

What You'll Learn

1. How to automate model training and deployment using Piper

2. Why data accessibility is crucial for effective machine learning

3. How to manage workflows for large-scale feature engineering

Key Questions Answered

What challenges does Uber face in integrating machine learning into its processes?

Uber encounters challenges such as selecting the appropriate model for specific problems, automating model training and deployment, and scaling models to multiple cities. These challenges primarily revolve around making machine learning more accessible and usable for various teams within the organization.

How does Piper facilitate machine learning workflows at Uber?

Piper supports approximately 3,000 active workflows that manage model training and feature generation. It automates tasks such as data ingestion, ETL processes, and model deployment, ensuring that workflows run smoothly and efficiently across Uber's distributed resources.
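The dependency ordering behind such a workflow can be sketched in a few lines. This is a minimal illustration using Python's standard library, not Piper's API: the task names and the linear dependency chain are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (not taken from Piper). Each task maps to the
# set of tasks that must finish before it can start.
dependencies = {
    "ingest_data": set(),
    "run_etl": {"ingest_data"},
    "generate_features": {"run_etl"},
    "train_model": {"generate_features"},
    "deploy_model": {"train_model"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A workflow manager would typically run each task only after all of its upstream tasks have succeeded; the topological order above is the simplest form of that guarantee.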

What are the stages involved in Piper's model training workflow?

Piper's model training workflow consists of four main stages: model training, performance validation, model deployment, and performance monitoring. Each stage is designed to ensure that models are accurately trained and effectively deployed, with continuous monitoring for performance metrics.
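The four stages can be sketched as a single function that runs them in order. This is an illustrative sketch only: the function and class names are hypothetical, and gating deployment on a minimum validation score is an assumption, not a documented Piper behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorkflowRun:
    """Records which stages ran and whether the model was deployed."""
    stages: List[str] = field(default_factory=list)
    deployed: bool = False


def run_training_workflow(
    train: Callable[[], object],
    validate: Callable[[object], float],
    min_score: float,
) -> WorkflowRun:
    run = WorkflowRun()

    # Stage 1: model training.
    model = train()
    run.stages.append("model training")

    # Stage 2: performance validation.
    score = validate(model)
    run.stages.append("performance validation")

    # Assumption: a model that scores below the threshold is not deployed.
    if score < min_score:
        return run

    # Stage 3: model deployment (stand-in for a real deployment step).
    run.deployed = True
    run.stages.append("model deployment")

    # Stage 4: performance monitoring (stand-in for registering the model
    # with a monitoring system).
    run.stages.append("performance monitoring")
    return run


# Toy callables standing in for real training and validation jobs.
good = run_training_workflow(lambda: "model", lambda m: 0.9, min_score=0.8)
bad = run_training_workflow(lambda: "model", lambda m: 0.5, min_score=0.8)
```

Here `good` passes validation and reaches all four stages, while `bad` stops after validation and is never deployed.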

Key Statistics & Figures

Active workflows supported by Piper: approximately 3,000

These workflows are directly involved in model training and feature generation across Uber.

Technologies & Tools

Workflow Management

Piper

Piper is used to automate and manage machine learning workflows at Uber.

Data Storage

Apache Hadoop

Hadoop serves as the data lake for storing and processing data ingested through Piper.

Data Processing

Apache Spark

Spark is used to process and transform data in deep learning workflows.

Machine Learning Platform

Michelangelo

Michelangelo is integrated with Piper to facilitate model training and deployment.

Key Actionable Insights

1. Implementing automated workflows can significantly reduce the time spent on manual model training processes.
By using Piper, teams can streamline their machine learning workflows, allowing them to focus on model optimization rather than repetitive tasks.

2. Ensuring data accessibility is vital for successful machine learning initiatives.
Piper's design emphasizes making data readily available for model training, which is crucial for developing accurate and reliable machine learning models.

3. Utilizing a structured workflow for feature engineering can enhance model performance.
Piper's ability to manage large-scale feature engineering workflows allows data scientists to efficiently prepare data, leading to better model outcomes.
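As a small illustration of a feature engineering step, the sketch below aggregates raw records into per-city features. The records, field names, and feature definitions are all hypothetical and chosen only for the example; a production workflow would run a step like this on a distributed engine such as Spark rather than in plain Python.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (city, trip_duration_minutes).
trips = [
    ("san_francisco", 12.0),
    ("san_francisco", 18.0),
    ("new_york", 25.0),
    ("new_york", 15.0),
    ("new_york", 20.0),
]


def build_city_features(records):
    """Group records by city and compute simple aggregate features."""
    durations = defaultdict(list)
    for city, minutes in records:
        durations[city].append(minutes)
    return {
        city: {"trip_count": len(values), "mean_duration": mean(values)}
        for city, values in durations.items()
    }


features = build_city_features(trips)
```

Keeping each feature step a pure function of its input makes it straightforward to schedule as one task in a larger workflow and to rerun it for a single city.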

Common Pitfalls

1. Failing to automate model training can lead to inefficiencies and increased time to deployment.

Without automation, teams may struggle with repetitive tasks, resulting in delays and potential errors in the model training process.

Related Concepts

Machine Learning Workflows

Data Accessibility In ML

Feature Engineering