DeepETA: How Uber Predicts Arrival Times Using Deep Learning

Xinyu Hu, Olcay Cirit, Tanmay Binaykiya, Ramit Hora

Uber

Apache, Apache Spark, Computer Vision, Deep Learning, Machine Learning, Self-Attention, TensorFlow, Transformer, Transformers, XGBoost

Overview

The article describes DeepETA, Uber's deep learning model for predicting arrival times. It covers the move from traditional methods to a deep learning approach, the model architecture, the challenges Uber faced along the way, and the accuracy and latency improvements the new model achieved.

What You'll Learn

1. How to implement a low-latency deep neural network architecture for ETA prediction
2. Why hybrid models combining physical and machine learning approaches improve ETA accuracy
3. How to utilize feature hashing for efficient embedding in deep learning models
4. When to apply asymmetric Huber loss for robust ETA predictions

Prerequisites & Requirements

Understanding of deep learning concepts and architectures
Familiarity with Apache Spark and machine learning frameworks (optional)

Key Questions Answered

How does DeepETA improve ETA predictions compared to traditional methods?

DeepETA improves ETA predictions by combining a routing engine's estimate with a machine learning model that predicts the residual (the gap between the routing engine's ETA and the actual arrival time) from real-time data. This hybrid approach produces more accurate predictions that adapt to changing conditions, rather than relying on the routing engine's road-graph estimate alone.
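The residual idea can be sketched in a few lines. This is a minimal illustration, not DeepETA itself: a one-feature least-squares line stands in for the neural network, and the feature `traffic_index` and the sample numbers are hypothetical. The key point is that the model is trained on `actual - routing_eta`, and its output is added back to the routing engine's estimate.

```python
"""Hybrid ETA sketch: physical (routing-engine) estimate + learned residual.

Assumption: a one-feature least-squares line stands in for DeepETA's neural
network; the feature name ``traffic_index`` is hypothetical.
"""
from dataclasses import dataclass


@dataclass
class ResidualModel:
    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, xs: list[float], residuals: list[float]) -> "ResidualModel":
        # Ordinary least squares for: residual ~ slope * x + intercept.
        n = len(xs)
        mean_x = sum(xs) / n
        mean_r = sum(residuals) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
        self.slope = cov / var_x if var_x else 0.0
        self.intercept = mean_r - self.slope * mean_x
        return self

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def residual_targets(routing_etas: list[float], actual_etas: list[float]) -> list[float]:
    # The ML model learns what the routing engine gets wrong, not the full ETA.
    return [a - r for r, a in zip(routing_etas, actual_etas)]


def hybrid_eta(routing_eta: float, traffic_index: float, model: ResidualModel) -> float:
    # Final ETA = physical estimate + learned correction.
    return routing_eta + model.predict(traffic_index)


if __name__ == "__main__":
    routing = [600.0, 900.0, 300.0, 1200.0]   # seconds, from a routing engine
    actual = [660.0, 1020.0, 300.0, 1380.0]   # observed arrival times
    traffic = [1.0, 2.0, 0.0, 3.0]            # hypothetical congestion feature
    model = ResidualModel().fit(traffic, residual_targets(routing, actual))
    print(hybrid_eta(750.0, 1.5, model))      # prints 840.0
```

Because the routing engine already captures most of the travel time, the learned part only has to model the correction, which is typically a smaller and easier target.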

What challenges did Uber face when transitioning to deep learning for ETA predictions?

Uber faced challenges such as ensuring low latency for real-time predictions, improving accuracy over existing models, and maintaining generality across different business lines. These challenges were addressed through architectural innovations and rigorous testing of various neural network designs.

What techniques were used to ensure the DeepETA model is fast?

To minimize latency, DeepETA uses a linear transformer, which reduces the cost of self-attention from quadratic to linear in the number of inputs. It also discretizes continuous inputs and represents them through embedding lookups, which are cheap at serving time, so the model meets stringent real-time serving requirements.
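The sketch below shows why kernelized ("linear") attention is cheaper. It assumes the `elu(x) + 1` feature map from Katharopoulos et al.'s linear transformer; the article does not say which kernel DeepETA uses, and in a real model the queries, keys, and values would come from learned projections (here they are passed in directly). Keys and values are summarized once into a small `d x d_v` matrix, which every query then reuses, instead of comparing every query with every key. A quadratic reference using the same kernel is included so the two can be checked against each other.

```python
"""Linear (kernelized) self-attention sketch, pure Python.

Assumption: the elu(x) + 1 feature map from Katharopoulos et al. (2020);
the article does not state which kernel DeepETA uses.
"""
import math

Vector = list[float]


def phi(x: Vector) -> Vector:
    # elu(x) + 1: x + 1 for x > 0, exp(x) otherwise; always positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(q: list[Vector], k: list[Vector], v: list[Vector]) -> list[Vector]:
    """Cost O(N * d * d_v): summarize keys/values once, reuse for every query."""
    d, d_v = len(q[0]), len(v[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum_j phi(k_j) v_j^T, shape d x d_v
    k_sum = [0.0] * d                     # sum_j phi(k_j), shape d
    for k_j, v_j in zip(k, v):
        fk = phi(k_j)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v_j[b]
    out = []
    for q_i in q:
        fq = phi(q_i)
        denom = sum(fq[a] * k_sum[a] for a in range(d))
        out.append([sum(fq[a] * kv[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out


def quadratic_attention(q: list[Vector], k: list[Vector], v: list[Vector]) -> list[Vector]:
    """Reference O(N^2) version with the same kernel, for comparison only."""
    out = []
    for q_i in q:
        fq = phi(q_i)
        weights = [sum(x * y for x, y in zip(fq, phi(k_j))) for k_j in k]
        total = sum(weights)
        out.append([sum(w * v_j[b] for w, v_j in zip(weights, v)) / total
                    for b in range(len(v[0]))])
    return out
```

Both functions return the same result; the difference is that `linear_attention` never builds the N x N matrix of query-key scores, so its cost grows linearly with the number of inputs.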

How does DeepETA handle different types of ETA predictions for various use cases?

DeepETA incorporates a bias adjustment decoder that tailors predictions based on the specific characteristics of different trips, such as delivery versus rideshare. This allows the model to account for variations in error distribution across different segments, improving overall accuracy.
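A minimal sketch of the bias-adjustment idea, under stated assumptions: the segment names are hypothetical, the shared decoder's prediction is taken as given, and the per-segment offsets are fit by plain gradient descent on squared error with the shared output held fixed. In DeepETA the adjustment is part of the network and trained with the rest of the model; this only shows how a learned, segment-specific offset corrects a shared prediction.

```python
"""Per-segment bias adjustment on top of a shared ETA prediction.

Assumptions: segment names are hypothetical, the shared prediction is given,
and offsets are fit by gradient descent on squared error with the shared
output held fixed (a real model would train them jointly).
"""
from collections import defaultdict


class BiasAdjustment:
    def __init__(self) -> None:
        self.bias: defaultdict[str, float] = defaultdict(float)  # one offset per segment

    def __call__(self, shared_pred: float, segment: str) -> float:
        # Adjusted ETA = shared prediction + segment-specific offset.
        return shared_pred + self.bias[segment]

    def fit(self, rows, lr: float = 0.1, epochs: int = 200) -> "BiasAdjustment":
        # rows: iterable of (shared_pred, segment, actual) triples.
        rows = list(rows)
        for _ in range(epochs):
            grads: defaultdict[str, float] = defaultdict(float)
            counts: defaultdict[str, int] = defaultdict(int)
            for pred, seg, actual in rows:
                # Derivative of (pred + b - actual)^2 with respect to b.
                grads[seg] += 2.0 * (self(pred, seg) - actual)
                counts[seg] += 1
            for seg, g in grads.items():
                self.bias[seg] -= lr * g / counts[seg]
        return self
```

With the shared output fixed, each offset converges to the mean residual of its segment, so segments whose errors are skewed in different directions (for example, delivery versus rideshare trips) each get their own correction.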

Key Statistics & Figures

Mean Absolute Error (MAE)

Significantly improved over the incumbent XGBoost model

This improvement is essential for enhancing the accuracy of ETA predictions across Uber's services.

Model Complexity

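The incumbent model grew into one of the largest and deepest XGBoost ensembles in the world

It grew to this size to keep pace with a growing dataset and improve prediction capabilities, which eventually made further scaling with XGBoost impractical and motivated the move to deep learning.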

Technologies & Tools

Backend

Apache Spark

Used for distributed training and improvements to the XGBoost model.

Machine Learning

Deep Learning

Core technology for developing the DeepETA model.

Key Actionable Insights

1. Leverage hybrid models that combine physical routing engines with machine learning to enhance prediction accuracy.
This approach allows for more adaptive and responsive predictions, particularly in dynamic environments where traditional models may fail to account for real-time changes.

2. Use feature hashing to bound the size of embedding tables in deep learning models.
By hashing high-cardinality categorical features (for example, fine-grained location cells) into a fixed number of buckets, you keep memory and lookup cost bounded while maintaining model performance, at the cost of occasional collisions. This matters most in large-scale applications.

3. Implement an asymmetric Huber loss in regression models to handle outliers robustly and tailor predictions.
This loss lets you balance the costs of underprediction and overprediction, which is crucial in applications where timing is critical.
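The feature-hashing insight can be illustrated with a small sketch that also shows the discretize-then-look-up pattern. Every parameter here is a hypothetical choice, not a DeepETA setting: the grid resolution `CELL_DEG`, the table size `NUM_BUCKETS`, the embedding width `EMBED_DIM`, the use of `zlib.crc32` as the hash, and the randomly initialized table. A latitude/longitude pair is snapped to a grid cell, the cell id is hashed into a fixed number of buckets, and the bucket indexes a row of the embedding table.

```python
"""Discretize a location into a grid cell, hash the cell into a fixed number
of buckets, and look up an embedding row.

Assumptions (all hypothetical): CELL_DEG, NUM_BUCKETS, EMBED_DIM, the crc32
hash, and the randomly initialized embedding table.
"""
import math
import random
import zlib

NUM_BUCKETS = 1024  # table rows; fixed no matter how many distinct cells exist
EMBED_DIM = 8
CELL_DEG = 0.01     # grid resolution in degrees

_rng = random.Random(0)
EMBEDDINGS = [[_rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(NUM_BUCKETS)]


def cell_id(lat: float, lng: float, cell_deg: float = CELL_DEG) -> str:
    # Discretize continuous coordinates; floor handles negative values correctly.
    return f"{math.floor(lat / cell_deg)}:{math.floor(lng / cell_deg)}"


def bucket(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Deterministic hash: Python's built-in hash() of str is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def embed_location(lat: float, lng: float) -> list[float]:
    return EMBEDDINGS[bucket(cell_id(lat, lng))]
```

The table size stays at `NUM_BUCKETS` rows however many cells appear in the data; the trade-off is that unrelated cells can collide in the same bucket and share an embedding.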
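For the asymmetric Huber loss insight, here is one common way to write the loss, offered as an assumption since the article does not give the exact formula DeepETA uses. `delta` sets where the penalty switches from quadratic to linear (robustness to outliers), and `omega` weights underprediction against overprediction (asymmetry).

```python
"""Asymmetric Huber loss sketch.

Assumption: one common parameterization; the article does not give the
exact formula DeepETA uses.
"""


def asymmetric_huber(actual: float, predicted: float,
                     delta: float = 1.0, omega: float = 0.5) -> float:
    r = actual - predicted  # r > 0: predicted ETA was too short (underprediction)
    a = abs(r)
    # Huber: quadratic near zero, linear in the tails so outliers don't dominate.
    huber = 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)
    # Asymmetry: weight underprediction by omega, overprediction by 1 - omega.
    weight = omega if r >= 0 else 1.0 - omega
    return weight * huber


if __name__ == "__main__":
    # With omega = 0.75, being 3 units too optimistic costs 3x more than too pessimistic.
    print(asymmetric_huber(13.0, 10.0, delta=1.0, omega=0.75))  # prints 1.875
    print(asymmetric_huber(10.0, 13.0, delta=1.0, omega=0.75))  # prints 0.625
```

With `omega = 0.5` this reduces to a standard Huber loss scaled by one half; pushing `omega` above 0.5 makes late arrivals (ETAs that were too short) more expensive than early ones.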

Common Pitfalls

1. Over-relying on traditional routing algorithms without integrating machine learning insights.
This can lead to outdated predictions that fail to capture real-time traffic conditions, resulting in poor user experiences.

Related Concepts

Hybrid Models In Machine Learning

Deep Learning Architectures

Real-time Data Processing

Loss Function Optimization