Overview
The article describes how Uber uses Piper, a data workflow management system, to make machine learning (ML) processes more accessible and efficient. It covers the challenges of integrating ML into business operations and details the workflows that handle model training, deployment, and monitoring.
What You'll Learn
1. How to automate model training and deployment using Piper
2. Why data accessibility is crucial for effective machine learning
3. How to manage workflows for large-scale feature engineering
Key Questions Answered
What challenges does Uber face in integrating machine learning into its processes?
Uber encounters challenges such as selecting the appropriate model for specific problems, automating model training and deployment, and scaling models to multiple cities. These challenges primarily revolve around making machine learning more accessible and usable for various teams within the organization.
How does Piper facilitate machine learning workflows at Uber?
Piper supports approximately 3,000 active workflows that manage model training and feature generation. It automates tasks such as data ingestion, ETL processes, and model deployment, ensuring that workflows run smoothly and efficiently across Uber's distributed resources.
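As a rough illustration of what a workflow manager does with the tasks named above, the sketch below orders a small set of dependent tasks so that each one runs only after its prerequisites. This is not Piper's API: the task names, the dependency edges, and the helper functions are all assumptions made for illustration, and the ordering uses Python's standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
# The task names echo the summary above; the edges are assumed.
TASKS = {
    "ingest_data": set(),
    "run_etl": {"ingest_data"},
    "generate_features": {"run_etl"},
    "train_model": {"generate_features"},
    "deploy_model": {"train_model"},
}


def execution_order(tasks):
    """Return the task names in an order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())


def run_all(tasks, actions):
    """Run each task's callable in dependency order and return the run log."""
    log = []
    for name in execution_order(tasks):
        actions[name]()  # a real scheduler would dispatch this to a worker
        log.append(name)
    return log


if __name__ == "__main__":
    # No-op callables stand in for the real work of each task.
    noop_actions = {name: (lambda: None) for name in TASKS}
    print(run_all(TASKS, noop_actions))
```

A production scheduler would also handle retries, scheduling intervals, and distribution across machines, none of which this sketch attempts.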
What are the stages involved in Piper's model training workflow?
Piper's model training workflow consists of four main stages: model training, performance validation, model deployment, and performance monitoring. Each stage is designed to ensure that models are accurately trained and effectively deployed, with continuous monitoring for performance metrics.
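The four stages can be sketched as a small pipeline in which validation acts as a gate before deployment. This is a minimal sketch under stated assumptions, not Uber's implementation: the "model" is a constant predictor (the mean of the training targets), a plain dict stands in for a serving system, and the function names and the max_error threshold are hypothetical.

```python
from statistics import mean


def train_model(train_targets):
    """Stage 1: 'train' a trivial model that always predicts the mean target."""
    return mean(train_targets)


def validate_model(model, targets):
    """Stage 2: measure mean absolute error of the model on held-out targets."""
    return mean(abs(y - model) for y in targets)


def deploy_model(model, registry):
    """Stage 3: publish the model (a dict stands in for a serving system)."""
    registry["live"] = model


def monitor_model(registry, live_targets, max_error):
    """Stage 4: re-score the live model on fresh data; True means healthy."""
    return validate_model(registry["live"], live_targets) <= max_error


def run_workflow(train_targets, holdout_targets, live_targets, max_error):
    """Run the stages in order, stopping before deployment if validation fails."""
    registry = {}
    stages = ["train"]
    model = train_model(train_targets)

    stages.append("validate")
    error = validate_model(model, holdout_targets)
    if error > max_error:
        # Validation gate: a model that misses the threshold is never deployed.
        return {"stages": stages, "deployed": False, "healthy": None}

    deploy_model(model, registry)
    stages.append("deploy")

    healthy = monitor_model(registry, live_targets, max_error)
    stages.append("monitor")
    return {"stages": stages, "deployed": True, "healthy": healthy}


if __name__ == "__main__":
    print(run_workflow([10, 12, 14], [11, 13], [12, 12], max_error=2.0))
```

Treating validation as a hard gate is one possible design; the summary does not say how Piper handles a model that fails validation.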
Key Statistics & Figures
Active workflows supported by Piper
Approximately 3,000
These workflows are directly involved in model training and feature generation across Uber.
Technologies & Tools
Workflow Management
Piper
Piper is used to automate and manage machine learning workflows at Uber.
Data Storage
Apache Hadoop
Hadoop serves as the data lake for storing and processing data ingested through Piper.
Data Processing
Apache Spark
Spark is used to process and transform data in deep learning workflows.
Machine Learning Platform
Michelangelo
Michelangelo is integrated with Piper to facilitate model training and deployment.
Key Actionable Insights
1. Implementing automated workflows can significantly reduce the time spent on manual model training processes. By using Piper, teams can streamline their machine learning workflows, allowing them to focus on model optimization rather than repetitive tasks.
2. Ensuring data accessibility is vital for successful machine learning initiatives. Piper's design emphasizes making data readily available for model training, which is crucial for developing accurate and reliable machine learning models.
3. Utilizing a structured workflow for feature engineering can enhance model performance. Piper's ability to manage large-scale feature engineering workflows allows data scientists to efficiently prepare data, leading to better model outcomes.
Common Pitfalls
1. Failing to automate model training can lead to inefficiencies and increased time to deployment.
Without automation, teams may struggle with repetitive tasks, resulting in delays and potential errors in the model training process.
Related Concepts
Machine Learning Workflows
Data Accessibility In ML
Feature Engineering