The Many Facets of 'Faceted Search'

Dmytro Ivchenko
12 min read, advanced

Overview

The article discusses faceted search, a crucial feature in LinkedIn's search experience, focusing on its implementation using an inverted index. It highlights the challenges of conventional faceted search approaches and presents a new algorithm that improves performance and correctness in large data sets.

What You'll Learn

1. How to implement an inverted index for faceted search
2. Why early termination is crucial for performance in search queries
3. When to use sampling for counting high-cardinality facet values

Prerequisites & Requirements

  • Basic familiarity with inverted index-based approaches to search

Key Questions Answered

What is faceted search and how does it work?
Faceted search is a feature that allows users to filter search results based on multiple dimensions, such as location or company. It improves navigation and discovery by structuring search results, enabling users to refine their queries effectively.
What are the challenges of conventional faceted search?
Conventional faceted search approaches tend to trade correctness for performance, or the reverse. They produce inaccurate facet counts when early termination is used, and they cannot deliver both exact counts for low-cardinality facets and good performance for high-cardinality ones.
How does the new algorithm improve faceted search performance?
The new algorithm counts facet values from inverted index posting lists. This gives exact counts for low-cardinality facets and estimates for high-cardinality ones, which improves performance while keeping early termination available.
What results were achieved with the new faceting approach?
The new faceting algorithm cut total runtime by a factor of 7. Latency improved at every reported percentile, from 1.2 times at p50 to 11.5 times at p99, so the largest gains were in the tail.
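
To make the posting-list idea above concrete, here is a minimal Python sketch. It is not LinkedIn's or Galene's implementation. It assumes a toy in-memory index where each (field, value) term maps to a sorted list of document IDs, and it counts a facet value by intersecting that value's posting list with the posting list of the query term. The function names (build_inverted_index, intersect_count, facet_counts) and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each (field, value) term to a sorted posting list of doc IDs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):  # ascending IDs keep every posting list sorted
        for field, value in docs[doc_id].items():
            index[(field, value)].append(doc_id)
    return index


def intersect_count(a, b):
    """Count doc IDs present in both sorted posting lists (two-pointer merge)."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def facet_counts(index, query_term, facet_field):
    """Exact count per facet value among documents matching query_term."""
    matches = index.get(query_term, [])
    counts = {}
    for (field, value), postings in index.items():
        if field != facet_field:
            continue
        n = intersect_count(matches, postings)
        if n > 0:
            counts[value] = n
    return counts


# Hypothetical sample data, for illustration only.
DOCS = {
    1: {"title": "engineer", "company": "A"},
    2: {"title": "engineer", "company": "B"},
    3: {"title": "designer", "company": "A"},
    4: {"title": "engineer", "company": "A"},
}
INDEX = build_inverted_index(DOCS)
COUNTS = facet_counts(INDEX, ("title", "engineer"), "company")
```

Because each count comes from a full posting-list intersection rather than from the documents that happened to be scored, it does not depend on how many results the ranking pass retrieved. The cost grows with the number of facet values, which is why this exact form suits low-cardinality facets.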

Key Statistics & Figures

Total runtime decrease
7 times
Achieved by applying the new approximation algorithm for faceted search.
Latency improvements
p50 decreased by a factor of 1.2, p90 by 1.4, p95 by 2.6, and p99 by 11.5
These improvements were observed after implementing the new faceting approach.

Technologies & Tools

Backend
Galene
LinkedIn's search stack that supports early termination and special retrieval queries.

Key Actionable Insights

1. Implementing an inverted index for faceted search can greatly enhance performance and accuracy.
This approach allows for precise counting of low-cardinality facets while providing estimates for high-cardinality ones, making it suitable for large datasets.
2. Utilizing early termination in search queries can significantly reduce response times.
By limiting the number of documents processed, you can achieve faster query results, especially in environments with high data volumes.
3. Sampling can be an effective strategy for counting in high-cardinality scenarios.
This method allows for performance improvements without sacrificing too much accuracy, which is crucial in large-scale applications.
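
The sampling insight can be illustrated with a short Python sketch. It shows generic uniform sampling with scale-up, which is an assumption on our part and may differ from the estimator the article describes. The function estimate_facet_counts and its parameters (matching_doc_ids, facet_of, sample_size, seed) are hypothetical names.

```python
import random


def estimate_facet_counts(matching_doc_ids, facet_of, sample_size, seed=0):
    """Estimate facet counts from a uniform sample of the matching documents.

    matching_doc_ids: list of doc IDs that match the query
    facet_of: callable returning the facet value of a doc ID
    sample_size: maximum number of documents to inspect
    """
    total = len(matching_doc_ids)
    if total <= sample_size:
        # Small result set: inspect everything, so the counts are exact.
        sample, scale = matching_doc_ids, 1.0
    else:
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        sample = rng.sample(matching_doc_ids, sample_size)
        scale = total / sample_size

    counts = {}
    for doc_id in sample:
        value = facet_of(doc_id)
        counts[value] = counts.get(value, 0) + 1

    # Scale sampled counts up to the size of the full result set.
    return {value: round(n * scale) for value, n in counts.items()}
```

The work is bounded by sample_size instead of by the size of the result set. The trade-off is that rare facet values may be missed or misestimated, so a sketch like this fits the high-cardinality case, where approximate counts are acceptable.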

Common Pitfalls

1. Relying solely on forward indexing for facet value discovery can lead to inaccurate counts.
This occurs because early termination may prevent the retrieval of all documents, resulting in discrepancies in facet counts, particularly for low cardinality values.
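
The pitfall above can be reproduced with a small Python sketch. It assumes a toy forward index (doc ID to field values) and models early termination as a simple cap on the number of documents scanned. The function forward_index_counts and its limit parameter are hypothetical, and real early-termination logic is more involved.

```python
def forward_index_counts(forward_index, matching_doc_ids, facet_field, limit=None):
    """Count facet values by reading each retrieved document's fields.

    limit models early termination: scanning stops after `limit` documents,
    so any document past the cutoff never contributes to a count.
    """
    counts = {}
    for scanned, doc_id in enumerate(matching_doc_ids):
        if limit is not None and scanned >= limit:
            break  # early termination: remaining matches are never seen
        value = forward_index[doc_id].get(facet_field)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical data: ten matching documents split evenly between two companies.
FORWARD_INDEX = {i: {"company": "A" if i % 2 == 0 else "B"} for i in range(10)}
MATCHES = list(range(10))

FULL = forward_index_counts(FORWARD_INDEX, MATCHES, "company")
TRUNCATED = forward_index_counts(FORWARD_INDEX, MATCHES, "company", limit=4)
```

Here FULL reports 5 documents for each company, while TRUNCATED reports only 2 each, because six matching documents were never visited. The undercount depends on where scanning stopped, not on the data.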

Related Concepts

Inverted Indexing
Search Algorithms
Performance Optimization Techniques