Manas HNSW Realtime: Powering Realtime Embedding-Based Retrieval

Pinterest Engineering
10 min readadvanced
--
View Original

Overview

The article discusses the implementation of HNSW (Hierarchical Navigable Small World graphs) in Pinterest's Manas search engine for real-time embedding-based retrieval. It covers the challenges faced, optimizations made, and the architecture of the Manas Realtime system, highlighting the transition from batch indexing to real-time processing.

What You'll Learn

1

How to implement HNSW for real-time embedding-based retrieval

2

Why lock-free techniques improve performance in concurrent systems

3

How to optimize graph compaction for HNSW

4

How to monitor recall for embedding-based retrieval systems

Prerequisites & Requirements

  • Understanding of embedding-based retrieval concepts
  • Familiarity with Kafka as a write-ahead log(optional)
  • Experience with concurrent programming techniques(optional)

Key Questions Answered

What challenges did Pinterest face when implementing HNSW in real-time?
Pinterest faced challenges in serving HNSW in real-time due to the lack of open-source implementations and the need to optimize for low latency and high throughput. The transition from batch indexing to real-time processing required significant architectural changes and optimizations to ensure efficient handling of concurrent reads and writes.
How does the lock-free implementation of HNSW improve performance?
The lock-free implementation of HNSW reduces lock contention, which can lead to high CPU usage in lock-based systems. By using atomic operations and a pre-allocated graph, the system achieves better concurrent read and write performance, allowing for more efficient updates and queries in real-time environments.
What is the purpose of the online recall monitoring tool?
The online recall monitoring tool is designed to compute both approximate nearest neighbors (ANN) and exact nearest neighbors (KNN) results in real-time. This allows Pinterest to track the quality of its embedding-based retrieval system and ensure that optimizations do not degrade the quality of search results.
What optimizations were made for HNSW graph compaction?
Optimizations for HNSW graph compaction included the 'Add on Merger' strategy, which reuses existing segments to build a new graph while maintaining connectivity. This approach reduces compaction time and CPU usage compared to the 'Clean Slate Merger' method, which was too slow for practical use.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Algorithm
Hnsw
Used for real-time embedding-based retrieval in Pinterest's Manas search engine.
Tool
Kafka
Serves as a write-ahead log for ingesting writes into the Manas Realtime system.

Key Actionable Insights

1
Implementing a lock-free approach in concurrent systems can significantly enhance performance by reducing contention.
This is particularly important in high-throughput environments where multiple reads and writes occur simultaneously, as seen in the HNSW implementation.
2
Utilizing online recall monitoring tools can help maintain the quality of search results in real-time systems.
By continuously tracking recall metrics, teams can identify and address quality degradations promptly, ensuring a better user experience.
3
Optimizing graph compaction strategies can lead to substantial improvements in system efficiency.
Choosing the right compaction method, such as the Add on Merger, can minimize downtime and CPU usage, which is critical for maintaining system performance.

Common Pitfalls

1
Over-reliance on lock-based mechanisms can lead to performance bottlenecks in concurrent systems.
This often results in high CPU usage and reduced throughput, making it essential to explore lock-free alternatives for better scalability.
2
Neglecting the importance of recall monitoring can lead to degraded search quality over time.
Without proper monitoring, optimizations may inadvertently lower the quality of results, which can harm user satisfaction and engagement.

Related Concepts

Embedding-based Retrieval Systems
Approximate Nearest Neighbor (ann) Search
Real-time Data Processing
Concurrent Programming Techniques