Powering Billion-Scale Vector Search with OpenSearch

Hao Sun, Jiasen Xu, Smit Patel, Anand Kotriwal, Xu Zhang

Uber

Overview

The article discusses Uber's move from search built directly on Apache Lucene to semantic vector search with OpenSearch. It covers the challenges Uber faced, the advantages OpenSearch offered, and the significant performance improvements achieved in indexing and querying large datasets.

What You'll Learn

1. How to implement vector search using OpenSearch
2. Why GPU acceleration is important for vector search performance
3. How to optimize indexing processes for large datasets

Prerequisites & Requirements

Understanding of vector search concepts
Familiarity with Apache Spark and OpenSearch (optional)

Key Questions Answered

What challenges did Uber face when using Apache Lucene for vector search?

Uber encountered several challenges with Apache Lucene, including limited algorithm options, lack of GPU support, and slow response times. These issues hindered their ability to provide accurate results and efficiently deploy machine learning models, prompting the need for a more scalable solution like OpenSearch.

How did Uber optimize their indexing process with OpenSearch?

Uber cut ingestion time from 12 hours to 2.5 hours by tuning its bulk indexing, CPU, memory, and Spark configurations. That is a reduction of roughly 79%, which significantly improves its ability to handle large datasets.
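Bulk indexing in OpenSearch goes through the `_bulk` API, whose request body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below only builds those bodies in batches; the index name `items`, the `id` and `embedding` fields, and the batch size of 500 are illustrative assumptions, not Uber's settings, and actually sending the bodies is left out.

```python
import json
from typing import Iterable, Iterator


def bulk_payloads(index: str, docs: Iterable[dict], max_docs: int = 500) -> Iterator[str]:
    """Yield OpenSearch _bulk request bodies (NDJSON), at most max_docs documents each.

    Assumption: every document carries its ID under a hypothetical "id" key.
    """
    lines: list[str] = []
    count = 0
    for doc in docs:
        source = dict(doc)  # copy so the caller's dict is not mutated
        doc_id = source.pop("id")
        # Action line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
        count += 1
        if count == max_docs:
            yield "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
            lines, count = [], 0
    if lines:  # flush the final, partially filled batch
        yield "\n".join(lines) + "\n"


docs = [{"id": str(i), "embedding": [0.1 * i, 0.2]} for i in range(5)]
for body in bulk_payloads("items", docs, max_docs=2):
    print(body)
```

Batching by byte size rather than document count is an equally common choice; which limit works best depends on vector dimension and cluster capacity.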

What performance improvements were achieved after implementing OpenSearch?

After implementing OpenSearch, Uber decreased P99 latency from 250 ms to under 120 ms, a reduction of at least 52%. This improvement is crucial for maintaining a smooth user experience during search operations.
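P99 latency is the value that 99% of request latencies fall at or below. A minimal sketch using the nearest-rank definition of a percentile; monitoring systems may interpolate between samples instead, so their reported values can differ slightly. The sample data is hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest sample covering p percent of all samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


latencies_ms = [float(x) for x in range(1, 101)]  # hypothetical latencies
print(percentile_nearest_rank(latencies_ms, 99))
```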

Why is GPU acceleration important for Uber's vector search?

GPU acceleration matters because vector search is compute-intensive: the cost of building indexes and comparing vectors grows with the dataset, and GPUs can run those distance computations in parallel. It therefore promises faster search results and better responsiveness as Uber's dataset continues to grow.
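To see why compute matters, consider exact (brute-force) nearest-neighbor search, sketched below under simple assumptions: plain Python lists and L2 distance. It compares the query with every stored vector, so each query costs on the order of N × d operations; with an illustrative N = 10^9 vectors of dimension d = 256, that is roughly 2.6 × 10^11 multiply-adds per query. Approximate indexes such as HNSW avoid most of that work, and GPUs can run the remaining distance computations in parallel. This is an illustration of the cost, not Uber's implementation.

```python
import heapq
import math
from typing import Sequence


def exact_knn(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int,
) -> list[tuple[float, int]]:
    """Return the k nearest (distance, index) pairs by exhaustive L2 search.

    Every stored vector is compared with the query: O(N * d) work per query.
    """
    def l2(a: Sequence[float], b: Sequence[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # nsmallest keeps only k candidates while scanning all N distances.
    return heapq.nsmallest(k, ((l2(query, v), i) for i, v in enumerate(vectors)))


stored = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
print(exact_knn([0.9, 0.0], stored, k=2))
```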

Key Statistics & Figures

Ingestion time reduction

From 12 hours to 2.5 hours

This improvement was achieved through optimized bulk indexing and configuration tuning.

P99 latency reduction

From 250 ms to under 120 ms

This reduction is critical for meeting strict latency requirements in user search experiences.
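The percentages quoted for these figures follow directly from the before and after values:

```latex
\text{ingestion: } \frac{12\,\text{h} - 2.5\,\text{h}}{12\,\text{h}} \approx 0.792 = 79.2\%,
\qquad
\text{P99 latency: } \frac{250\,\text{ms} - 120\,\text{ms}}{250\,\text{ms}} = 0.52 = 52\%
```

Because the new latency is under 120 ms, 52% is a lower bound. The ingestion change also corresponds to a 12 / 2.5 = 4.8× speedup.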

Technologies & Tools


Backend

OpenSearch

Used as the vector search engine to improve search capabilities.

Backend

Apache Spark

Utilized for batch ingestion and indexing of large datasets.

Backend

Meta Faiss

Integrated to enable future GPU acceleration.
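For orientation, a vector field in OpenSearch is declared with the k-NN plugin's `knn_vector` type, and Faiss is one of the engines the plugin can use for it. The sketch below builds an index-creation body and a `knn` query body as Python dicts; the field name `embedding`, the dimension, the HNSW parameters (`m`, `ef_construction`), and the `l2` space type are illustrative assumptions, since the source does not say how Uber configured its indexes. The bodies would be sent with `PUT /<index>` and `POST /<index>/_search`.

```python
import json


def knn_index_body(field: str, dimension: int) -> dict:
    """Index-creation body for a k-NN index backed by the Faiss engine (values illustrative)."""
    return {
        "settings": {"index": {"knn": True}},  # enable the k-NN plugin for this index
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {"m": 16, "ef_construction": 128},
                    },
                }
            }
        },
    }


def knn_query_body(field: str, vector: list[float], k: int) -> dict:
    """Search body returning the k approximate nearest neighbors of `vector`."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


print(json.dumps(knn_index_body("embedding", 4), indent=2))
print(json.dumps(knn_query_body("embedding", [0.1, 0.2, 0.3, 0.4], k=5)))
```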

Key Actionable Insights

1. Implementing OpenSearch can significantly enhance your vector search capabilities, especially for large datasets.
By leveraging OpenSearch's flexibility and performance, organizations can improve search accuracy and speed, which is critical for user satisfaction.

2. Optimizing indexing processes is crucial for handling large-scale data efficiently.
Uber's experience shows that tuning configurations can drastically reduce ingestion times, which is vital for businesses that rely on timely data availability.

3. Consider GPU acceleration for future-proofing your vector search applications.
As datasets grow, traditional CPU processing may become a bottleneck. Integrating GPU capabilities can enhance performance and scalability.
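Concurrency is one configuration knob relevant to the second insight above: a single sequential writer tends to leave allocated CPUs idle. A minimal sketch, assuming a hypothetical `send` callable that posts one `_bulk` body and returns the number of documents accepted; the worker count is illustrative and should be tuned against cluster capacity, and the source does not say which values Uber used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def send_all(
    payloads: Iterable[str],
    send: Callable[[str], int],
    workers: int = 8,
) -> int:
    """Send bulk payloads concurrently and return the total reported by `send`.

    `send` is a hypothetical placeholder for whatever posts one _bulk body.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order; sum consumes every result.
        return sum(pool.map(send, payloads))


def fake_send(body: str) -> int:
    """Stand-in sender: counts documents as (action + source) line pairs."""
    return body.count("\n") // 2


print(send_all(["a\nb\n", "c\nd\ne\nf\n"], fake_send, workers=2))
```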

Common Pitfalls

1. Underutilizing CPU resources during indexing leads to inefficient performance.

Uber's initial setup often used less than half of the allocated CPU capacity, which slowed down indexing. To avoid this, tune parallelism and resource settings so that the allocated CPUs are actually kept busy.

2. Excessive disk I/O during indexing can significantly delay the process.

Uber observed that its baseline indexing process drove high read/write I/O, contributing to delays. Reducing unnecessary I/O through tuned settings can mitigate this issue.
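One common way to cut indexing I/O during a one-off bulk load, shown here as an assumption rather than Uber's documented setup, is to disable periodic refreshes and replicas while loading, then restore them. Disabling refresh avoids producing many small segments that later have to be merged, and zero replicas means each document is written once during the load, at the cost of no redundancy until replicas are restored. The dicts below would be sent with `PUT /<index>/_settings`; the restored values shown are OpenSearch's defaults.

```python
import json


def bulk_load_settings() -> dict:
    """Settings applied before a large bulk load (assumed practice, not Uber's config)."""
    # "-1" disables periodic refresh; 0 replicas avoids writing each document twice.
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def serving_settings(replicas: int = 1, refresh: str = "1s") -> dict:
    """Settings restored once loading finishes (defaults match OpenSearch's)."""
    return {"index": {"refresh_interval": refresh, "number_of_replicas": replicas}}


print(json.dumps(bulk_load_settings()))
print(json.dumps(serving_settings()))
```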

Related Concepts

Vector Search Optimization Techniques

GPU Acceleration In Machine Learning

Batch Processing With Apache Spark