Bo Ling, Melissa Barr, Dhruva Dixith Kurra, Chun Zhu, Nicholas Marcott • 18 min read • advanced
Overview
The article discusses the implementation of Two-Tower Embeddings (TTE) at Uber, highlighting its role in enhancing the efficiency and scalability of recommendation systems. It details the challenges faced, the architecture of the TTE model, and the significant improvements achieved in performance and infrastructure.
What You'll Learn
1. How to implement Two-Tower Embeddings in a recommendation system
2. Why Two-Tower Embeddings improve scalability and efficiency
3. When to use embeddings for enhancing recommendation accuracy
Prerequisites & Requirements
- Understanding of machine learning concepts and recommendation systems
- Familiarity with Uber's Michelangelo ML platform (optional)
Key Questions Answered
What are Two-Tower Embeddings and how do they work?
Two-Tower Embeddings are generated by a deep learning architecture consisting of a query tower and an item tower. The query tower encodes user profiles and search queries into embeddings, while the item tower encodes store and item information. This architecture allows for efficient matching and retrieval in recommendation systems.
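The two-tower idea described above can be sketched in a few lines. This is a minimal illustration, not Uber's model: the helper `make_tower`, the single linear layer with random weights, the L2 normalization, the feature vectors, and the dimensions are all assumptions chosen to keep the example small. A real tower would be a trained deep network over user, query, store, and item features.

```python
import random


def make_tower(input_dim, embed_dim, seed):
    """Build a toy 'tower': one linear layer followed by L2 normalization.

    Hypothetical stand-in for a trained deep network; weights are random.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(input_dim)]
               for _ in range(embed_dim)]

    def tower(features):
        raw = [sum(w * x for w, x in zip(row, features)) for row in weights]
        norm = sum(v * v for v in raw) ** 0.5 or 1.0  # avoid division by zero
        return [v / norm for v in raw]

    return tower


def score(query_embedding, item_embedding):
    """Dot product of the two embeddings; higher means a better match."""
    return sum(a * b for a, b in zip(query_embedding, item_embedding))


# The towers take different inputs but emit vectors in the same space.
query_tower = make_tower(input_dim=4, embed_dim=3, seed=0)
item_tower = make_tower(input_dim=5, embed_dim=3, seed=1)

query_embedding = query_tower([1.0, 0.0, 0.5, 0.2])      # made-up user/query features
item_embedding = item_tower([0.3, 1.0, 0.0, 0.0, 0.7])   # made-up store/item features
similarity = score(query_embedding, item_embedding)
```

Because the two towers only meet at the final dot product, each side can be computed independently, which is what makes pre-computing the item side possible.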
How does Uber's recommendation system utilize Two-Tower Embeddings?
Uber's recommendation system uses Two-Tower Embeddings to optimize the retrieval of relevant stores for users. By pre-computing store embeddings and using real-time query embeddings, the system can quickly identify the best matches, significantly reducing computational costs.
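The retrieval step can be illustrated with a brute-force sketch. Everything here is hypothetical: the store names, the three-dimensional vectors, and the helper `retrieve_top_k`. It scans every store, whereas a production system would more likely query an approximate nearest-neighbor index; the sketch only shows the division of work between offline store embeddings and an online query embedding.

```python
import heapq

# Pre-computed offline by the item tower (values invented for illustration).
store_embeddings = {
    "store_a": [0.9, 0.1, 0.0],
    "store_b": [0.1, 0.9, 0.0],
    "store_c": [0.6, 0.6, 0.1],
    "store_d": [0.0, 0.2, 0.9],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_embedding, embeddings, k):
    """Return the k store ids whose embeddings score highest against the query."""
    return heapq.nlargest(
        k, embeddings, key=lambda store_id: dot(query_embedding, embeddings[store_id])
    )


# Computed at request time by the query tower (values invented for illustration).
query_embedding = [1.0, 0.2, 0.0]
top_stores = retrieve_top_k(query_embedding, store_embeddings, k=2)
```

Only the query embedding is computed per request; the store side is a lookup, which is where the cost saving comes from.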
What challenges did Uber face in implementing Two-Tower Embeddings?
Uber faced challenges related to scalability and high maintenance costs with existing models. The previous city-wise Deep Matrix Factorization model was not scalable, requiring extensive resources to maintain thousands of models across different cities.
What are the advantages of using Two-Tower Embeddings over previous models?
Two-Tower Embeddings offer scalability, efficiency, and the ability to utilize user engagement data as training labels, which reduces costs and improves performance. This model replaces thousands of city-specific models with a single global model.
Key Statistics & Figures
Model training time reduction
From hundreds of thousands of core-hours to thousands of core-hours per week
This improvement was achieved by replacing the Deep Matrix Factorization model with the Two-Tower Embeddings model.
Recall@500 improvement
From 89% to 93%
This increase was noted after applying logQ correction in the training process.
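The summary does not spell out how logQ correction is applied. A common formulation, assumed here, is used with in-batch negatives: each logit is reduced by the log of the item's estimated sampling probability, so that popular items, which appear as negatives more often, are not over-penalized. The function name, the toy logits, and the probabilities below are illustrative only, and whether Uber's training matches this exact form is an assumption.

```python
import math


def in_batch_softmax_loss(logits, sampling_probs, apply_logq=True):
    """Mean softmax cross-entropy over a batch with in-batch negatives.

    logits[i][j] is the score of query i against item j in the batch;
    item i is the positive for query i. sampling_probs[j] is the estimated
    probability that item j appears in a batch (assumed to be given).
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        row = [
            logits[i][j] - (math.log(sampling_probs[j]) if apply_logq else 0.0)
            for j in range(n)
        ]
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_z - row[i]
    return total / n


toy_logits = [[2.0, 1.0], [0.5, 1.5]]
skewed_probs = [0.8, 0.2]  # item 0 is sampled far more often than item 1

loss_plain = in_batch_softmax_loss(toy_logits, skewed_probs, apply_logq=False)
loss_corrected = in_batch_softmax_loss(toy_logits, skewed_probs, apply_logq=True)
```

When every item has the same sampling probability, the correction shifts all logits in a row by the same constant and the loss is unchanged; it only matters when item frequencies are skewed.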
Technologies & Tools
ML Platform
Michelangelo
Used for data preparation, training, evaluation, deployment, and serving of the Two-Tower Embeddings model.
Data Processing
Apache Spark™
Previously used for creating city-wise models, now integrated into the new TTE model architecture.
Key Actionable Insights
1. Implementing Two-Tower Embeddings can significantly enhance the performance of recommendation systems by improving retrieval speed and accuracy. This approach allows for real-time processing of user queries, making it essential for applications that require quick responses, such as food delivery services.
2. Utilizing embeddings can reduce the computational costs associated with traditional recommendation systems. By pre-computing embeddings and using efficient indexing methods, companies can save on resources while maintaining high-quality recommendations.
3. Adopting a scalable model like TTE can facilitate the expansion of recommendation systems across multiple regions without the need for extensive infrastructure changes. This is particularly beneficial for businesses looking to grow their services globally while managing costs effectively.
Common Pitfalls
1. Underestimating the computational resources required for training large models can lead to delays and inefficiencies.
It's crucial to plan for the infrastructure needed to support high-cardinality features and large datasets to avoid bottlenecks.
2. Failing to incorporate context-specific factors in evaluation metrics may result in misleading performance assessments.
Using traditional metrics without considering user context can lead to a misunderstanding of how well the model performs in real-world scenarios.
Related Concepts
Machine Learning
Recommendation Systems
Deep Learning
Embeddings