Evolution and Scale of Uber’s Delivery Search Platform

Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen
11 min read · advanced

Overview

The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experience across various markets. It details the architecture, model training, deployment challenges, and the strategies used to maintain high performance and reliability.

What You'll Learn

1. How to implement a semantic search model using deep learning techniques
2. Why Approximate Nearest Neighbor (ANN) indexing is crucial for scaling search systems
3. How to optimize embedding dimensions for performance and cost efficiency

Prerequisites & Requirements

  • Understanding of semantic search and machine learning concepts
  • Familiarity with PyTorch and Hugging Face Transformers (optional)

Key Questions Answered

How does Uber's Delivery Search Platform enhance user experience?
Uber's Delivery Search Platform enhances user experience by implementing a semantic search model that understands user intent, allowing for quicker and more accurate retrieval of stores, dishes, and grocery items. This leads to higher conversion rates and improved basket quality, especially for long-tail queries and multilingual markets.
What challenges did Uber face in deploying their semantic search system?
Uber faced challenges in balancing retrieval accuracy with infrastructure costs, ensuring safe automated index deployments, and maintaining real-time consistency checks. They implemented a blue/green deployment strategy to allow for seamless updates without disrupting live traffic.
What role does Matryoshka Representation Learning (MRL) play in Uber's search system?
Matryoshka Representation Learning (MRL) trains embeddings that can be truncated to different lengths, so the same model can serve a shorter vector when speed matters and a longer one when accuracy matters. This enables Uber to serve smaller embeddings quickly while maintaining high retrieval quality.
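A minimal sketch of how a Matryoshka-style embedding might be cut at serving time, assuming the model was trained so that each prefix of the vector is itself a usable embedding. The function names, the toy 8-dimensional vectors, and the cut length of 4 are illustrative assumptions, not values from the article.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in prefix]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy full-size embeddings (hypothetical values).
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.03]
doc_a = [0.8, 0.2, 0.4, 0.1, 0.01, 0.04, 0.02, 0.01]
doc_b = [0.1, 0.9, 0.0, 0.3, 0.03, 0.01, 0.05, 0.02]

CUT = 4  # hypothetical serving dimension
q_small = truncate_embedding(query, CUT)
a_small = truncate_embedding(doc_a, CUT)
b_small = truncate_embedding(doc_b, CUT)

# Does the shorter cut rank doc_a above doc_b, as the full vector does?
full_ranking = cosine(query, doc_a) > cosine(query, doc_b)
small_ranking = cosine(q_small, a_small) > cosine(q_small, b_small)
```

In this toy case the half-length cut preserves the ranking while storing half as many values per vector. Whether that holds on real traffic depends on how the model was trained and has to be measured.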

Key Statistics & Figures

  • Latency reduction: 34%. Achieved by lowering shard-level k from 1,200 to around 200.
  • CPU savings: 17%. Achieved alongside the latency reduction by the same shard-level k adjustment.
  • Storage cost reduction: nearly 50%. Achieved by using MRL to serve smaller embedding cuts with minimal loss in retrieval quality.
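To illustrate why shard-level k affects cost, the sketch below models a sharded search in which every shard returns its local top-k candidates and a merge step keeps the global best. The values 1,200 and 200 come from the figures above. The shard count, documents per shard, global result size, and random scores are assumptions made only for illustration.

```python
import heapq
import random


def shard_top_k(scores, k):
    """Return the k highest (score, doc_id) pairs from one shard."""
    return heapq.nlargest(k, ((s, doc_id) for doc_id, s in scores.items()))


def merge_shards(shards, shard_k, global_k):
    """Collect shard_k candidates per shard, then keep the global top global_k."""
    candidates = []
    for scores in shards:
        candidates.extend(shard_top_k(scores, shard_k))
    top = heapq.nlargest(global_k, candidates)
    return [doc_id for _, doc_id in top], len(candidates)


rng = random.Random(0)
NUM_SHARDS = 4         # hypothetical
DOCS_PER_SHARD = 5000  # hypothetical
GLOBAL_K = 100         # hypothetical

shards = [
    {f"s{i}-d{j}": rng.random() for j in range(DOCS_PER_SHARD)}
    for i in range(NUM_SHARDS)
]

wide_ids, wide_merged = merge_shards(shards, shard_k=1200, global_k=GLOBAL_K)
narrow_ids, narrow_merged = merge_shards(shards, shard_k=200, global_k=GLOBAL_K)
```

With exact scores and a shard k that is at least the global k, both settings return the same final list while the narrow setting merges far fewer candidates. In a real ANN setup the scores are approximate, so a lower shard k can cost some recall and should be checked against retrieval quality.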

Key Actionable Insights

1. Implementing a semantic search model can significantly improve user engagement and conversion rates.
   By understanding user intent through semantic search, businesses can provide more relevant results, reducing bounce rates and increasing satisfaction.
2. Utilizing Approximate Nearest Neighbor (ANN) indexing can drastically reduce search latency.
   ANN allows for efficient retrieval from large datasets, making it essential for applications that require real-time responses, such as e-commerce and delivery services.
3. Regularly updating your search index ensures that users receive the most relevant and timely results.
   By scheduling biweekly updates, Uber maintains the freshness of their search results, which is crucial in fast-paced environments like food delivery.
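For the ANN insight above, a toy example may help show where the latency saving comes from: the index scores only a small candidate set instead of every vector. This summary does not say which ANN algorithm Uber uses, so the sketch substitutes random-hyperplane locality-sensitive hashing as one simple stand-in. The class name, dimensions, and plane count are all assumptions.

```python
import math
import random
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class HyperplaneLSH:
    """Toy ANN index: vectors sharing a sign pattern land in one bucket."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)
        ]
        self.buckets = defaultdict(list)
        self.vectors = {}

    def _signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in self.planes
        )

    def add(self, item_id, vec):
        unit = normalize(vec)
        self.vectors[item_id] = unit
        self.buckets[self._signature(unit)].append(item_id)

    def query(self, vec, k=5):
        """Score only the candidates in the query's bucket, not the whole index."""
        unit = normalize(vec)
        candidates = self.buckets.get(self._signature(unit), [])
        scored = [
            (sum(a * b for a, b in zip(unit, self.vectors[i])), i)
            for i in candidates
        ]
        scored.sort(reverse=True)
        return [i for _, i in scored[:k]], len(candidates)


rng = random.Random(1)
DIM = 16  # hypothetical embedding size
index = HyperplaneLSH(DIM)
for n in range(2000):
    index.add(f"item-{n}", [rng.gauss(0.0, 1.0) for _ in range(DIM)])

probe = index.vectors["item-42"]
results, scanned = index.query(probe, k=3)
```

Here `scanned` is the number of vectors actually scored, which is a small fraction of the 2,000 indexed. The trade-off is that true neighbors falling in other buckets are missed, which is why production systems typically use several hash tables or a graph-based index.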

Common Pitfalls

1. Failing to validate input data during model refreshes can lead to corrupted search results.
   This can happen if the new index is built without proper checks, potentially affecting user experience. Implementing automated validation checks before deployment can mitigate this risk.
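One way to picture automated validation combined with a blue/green switch is sketched below: a new index is loaded into the idle slot, checked, and promoted only if the checks pass. The specific checks (dimension, non-finite values, document-count drop), the 20% drop threshold, and all names are assumptions for illustration, not Uber's actual checks.

```python
import math


def validate_index(vectors, expected_dim, previous_count, max_drop=0.2):
    """Return a list of problems found in a candidate index (empty means OK)."""
    problems = []
    if previous_count and len(vectors) < previous_count * (1 - max_drop):
        problems.append("document count dropped more than allowed")
    for doc_id, vec in vectors.items():
        if len(vec) != expected_dim:
            problems.append(f"{doc_id}: wrong dimension {len(vec)}")
        elif any(math.isnan(x) or math.isinf(x) for x in vec):
            problems.append(f"{doc_id}: non-finite value")
    return problems


class BlueGreenIndex:
    """Two index slots; live traffic reads only from the active one."""

    def __init__(self, expected_dim):
        self.expected_dim = expected_dim
        self.slots = {"blue": {}, "green": {}}
        self.active = "blue"

    def inactive(self):
        return "green" if self.active == "blue" else "blue"

    def deploy(self, new_vectors):
        """Load into the idle slot, validate, and switch only if clean."""
        target = self.inactive()
        self.slots[target] = dict(new_vectors)
        problems = validate_index(
            self.slots[target], self.expected_dim, len(self.slots[self.active])
        )
        if problems:
            return False, problems  # the active slot keeps serving
        self.active = target
        return True, []

    def live(self):
        return self.slots[self.active]


index = BlueGreenIndex(expected_dim=3)
ok_first, _ = index.deploy({"a": [0.1, 0.2, 0.3], "b": [0.4, 0.5, 0.6]})
ok_bad, bad_problems = index.deploy(
    {"a": [0.1, float("nan"), 0.3], "b": [0.4, 0.5]}
)
```

The second deploy is rejected, so the previously promoted slot continues to serve unchanged. Keeping the old slot intact also gives a quick rollback path if a problem is found after a switch.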

Related Concepts

Semantic Search
Approximate Nearest Neighbor (ANN) Indexing
Deep Learning for Search Optimization