The Evolution of Uber’s Search Platform

Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen

Uber

Apache, Apache Kafka, Apache Spark, AWS, Elasticsearch, Google Cloud, Google Cloud Storage, gRPC, JSON, SQL

Overview

The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch. It emphasizes the importance of search in enhancing user experience across Uber's services and outlines the challenges and innovations encountered during this evolution.

What You'll Learn

1. How to implement a custom search engine architecture using OpenSearch

2. Why transitioning from Elasticsearch to OpenSearch can enhance scalability

3. When to adopt open-source solutions for large-scale search problems

Prerequisites & Requirements

Understanding of search engine architectures and their components
Familiarity with OpenSearch and Apache Kafka (optional)

Key Questions Answered

What were the limitations of Elasticsearch that led Uber to develop Sia?

Elasticsearch struggled with real-time responsiveness because of its near-real-time (NRT) semantics: an update is not visible to queries until buffered documents are flushed into a searchable segment (a refresh, in Elasticsearch terms). This was a problem for Uber use cases such as matching riders with drivers, which need newly written data to be queryable immediately while sustaining high write throughput.

How does Uber's Sia architecture improve search performance?

Sia introduces a Live Index that buffers new data for real-time querying and periodically flushes to create a Snapshot Index. This architecture allows for high-throughput ingestion and supports concurrent reads and writes, addressing the limitations of Elasticsearch.
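The Live Index and Snapshot Index split can be illustrated with a toy inverted index. This is a minimal sketch, not Sia's implementation: the class name `LiveSnapshotIndex`, the `flush_threshold` parameter, and the whitespace tokenizer are all assumptions made for illustration. It shows only the core idea that writes land in a mutable buffer that is searchable right away, and that a flush folds the buffer into an immutable snapshot.

```python
import threading
from collections import defaultdict


class LiveSnapshotIndex:
    """Toy inverted index: a mutable live buffer plus an immutable snapshot.

    Hypothetical illustration only; names and thresholds are assumptions.
    """

    def __init__(self, flush_threshold=1000):
        self._lock = threading.Lock()
        self._live = defaultdict(set)   # term -> doc ids, searchable immediately
        self._snapshot = {}             # term -> frozenset of doc ids
        self._live_docs = 0
        self._flush_threshold = flush_threshold

    def add(self, doc_id, text):
        """Index a document into the live buffer; flush when the buffer is full."""
        with self._lock:
            for term in text.lower().split():
                self._live[term].add(doc_id)
            self._live_docs += 1
            if self._live_docs >= self._flush_threshold:
                self._flush_locked()

    def flush(self):
        """Fold the live buffer into a new snapshot."""
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        # Build a new snapshot instead of mutating the old one in place.
        merged = dict(self._snapshot)
        for term, ids in self._live.items():
            merged[term] = merged.get(term, frozenset()) | ids
        self._snapshot = merged
        self._live = defaultdict(set)
        self._live_docs = 0

    def search(self, term):
        """Return doc ids matching the term in either the live buffer or the snapshot."""
        term = term.lower()
        with self._lock:
            live_hits = set(self._live.get(term, ()))
            snap_hits = self._snapshot.get(term, frozenset())
        return live_hits | snap_hits
```

A query unions both structures, so a document is visible as soon as `add` returns, whether or not a flush has happened yet. A single lock keeps the sketch short; a production design would need a finer-grained concurrency scheme to serve reads and writes in parallel.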

Why did Uber choose OpenSearch over other search technologies?

OpenSearch was selected for its robust, scalable, and extensible platform that aligns with Uber's architectural goals. Its open-source nature ensures long-term flexibility and avoids vendor lock-in, while benefiting from a growing community that actively contributes to its development.

What innovations did Uber contribute to OpenSearch?

Uber contributed features such as gRPC/Protobuf-based RPC communication and pull-based ingestion methods to OpenSearch. These innovations enhance performance and scalability, making OpenSearch more suitable for high-throughput environments like Uber's.

Technologies & Tools

Search Engine

OpenSearch

Used as the primary search platform for Uber's services.

Data Streaming

Apache Kafka

Utilized for data ingestion in Sia and OpenSearch.

Communication Protocol

gRPC

Adopted for efficient RPC communication in Sia.

Key Actionable Insights

1. Consider transitioning to an open-source search solution like OpenSearch to avoid vendor lock-in and leverage community support.
As businesses scale, adopting open-source technologies can provide flexibility and adaptability to evolving needs, ensuring that the search platform remains relevant and efficient.

2. Implement a read/write separation architecture to optimize search performance during data ingestion.
By decoupling the ingestion process from query serving, organizations can maintain low latency and high availability, which is crucial for real-time applications.

3. Utilize pull-based ingestion methods to improve data resilience and operational simplicity.
This approach allows systems to handle high write traffic without overwhelming upstream components, enhancing reliability and simplifying backpressure management.
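The pull-based ingestion idea in the third insight can be sketched without any messaging library. The example below is an assumption-laden illustration, not the OpenSearch or Sia ingestion code: `InMemoryLog` stands in for a durable, replayable log such as a Kafka partition, and `PullIngester`, `index_fn`, and `batch_size` are hypothetical names. The point it shows is that the consumer decides how much to read and when, so backpressure falls out of the design instead of being signalled upstream.

```python
class InMemoryLog:
    """Stand-in for a durable, replayable log (for example, one Kafka partition)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Producers append freely; they never wait on the consumer."""
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def read(self, offset, max_records):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class PullIngester:
    """Pulls bounded batches from the log at its own pace (hypothetical sketch)."""

    def __init__(self, log, index_fn, batch_size=100):
        self.log = log
        self.index_fn = index_fn      # callable that indexes one record
        self.batch_size = batch_size  # upper bound on work taken per poll
        self.committed_offset = 0

    def poll_once(self):
        """Index one batch, then advance the offset.

        If index_fn raises, the offset is not advanced, so the whole batch
        is read again on the next poll. index_fn should therefore be idempotent.
        """
        batch = self.log.read(self.committed_offset, self.batch_size)
        for record in batch:
            self.index_fn(record)
        self.committed_offset += len(batch)
        return len(batch)
```

Because the log retains records until they are consumed, a slow or restarted indexer simply resumes from its last committed offset. A spike in writes grows the backlog in the log instead of overloading the search nodes.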

Common Pitfalls

1. Relying too heavily on a single search technology can lead to scalability and performance issues as business demands grow.

Organizations should regularly evaluate their search infrastructure and be open to transitioning to more robust solutions that can handle increased load and complexity.

2. Failing to optimize data ingestion processes can result in high latency and degraded user experience.

Implementing a separation of concerns between data ingestion and query serving can mitigate these issues, ensuring that search performance remains high even during peak loads.
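One way to picture the separation between ingestion and query serving is a writer that publishes immutable index versions to shared storage and readers that load them on their own schedule. The sketch below rests on several assumptions: `SegmentStore` is an in-memory stand-in for shared storage such as a blob store, and `Indexer`, `Searcher`, and the version-number scheme are hypothetical names, not part of any Uber or OpenSearch API.

```python
class SegmentStore:
    """In-memory stand-in for shared storage holding published index versions."""

    def __init__(self):
        self._versions = []

    def publish(self, segment):
        """Store an immutable copy of the segment and return its version number."""
        self._versions.append(dict(segment))
        return len(self._versions)

    def latest(self):
        """Return (version, segment); version 0 means nothing is published yet."""
        if not self._versions:
            return 0, {}
        return len(self._versions), self._versions[-1]


class Indexer:
    """Write path: absorbs ingestion load and publishes finished versions."""

    def __init__(self, store):
        self._store = store
        self._docs = {}

    def ingest(self, doc_id, text):
        self._docs[doc_id] = text

    def publish(self):
        return self._store.publish(self._docs)


class Searcher:
    """Read path: serves queries from the last version it loaded."""

    def __init__(self, store):
        self._store = store
        self.version = 0
        self._docs = {}

    def refresh(self):
        """Swap in the newest published version, if there is one."""
        version, docs = self._store.latest()
        if version != self.version:
            self.version, self._docs = version, docs
        return self.version

    def search(self, term):
        term = term.lower()
        return sorted(
            doc_id for doc_id, text in self._docs.items()
            if term in text.lower().split()
        )
```

Heavy ingestion only touches the `Indexer`, so query latency on a `Searcher` is unaffected until it chooses to refresh. The trade-off is freshness: results lag behind writes by the publish and refresh interval.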

Related Concepts

Search Engine Architecture

Open-source Software

Real-time Data Processing

Data Ingestion Strategies