Introducing Voyager: Spotify’s New Nearest-Neighbor Search Library

Peter Sobot
4 min read · advanced

Overview

Spotify has introduced Voyager, a new nearest-neighbor search library that significantly improves upon its predecessor, Annoy, by offering increased speed and accuracy. Voyager is designed for production use, providing robust support for both Java and Python, and aims to meet the evolving demands of the nearest-neighbor search ecosystem.

What You'll Learn

  1. How to implement nearest-neighbor search in production applications using Voyager
  2. Why Voyager offers more than 10 times the speed of Annoy at the same recall
  3. When to choose Voyager over other nearest-neighbor search libraries

Prerequisites & Requirements

  • Understanding of nearest-neighbor search algorithms
  • Familiarity with Python or Java programming languages

Key Questions Answered

What improvements does Voyager offer compared to Annoy?
Voyager provides more than 10 times the speed of Annoy at the same recall and up to 50% more accuracy at the same speed. Additionally, it uses up to 4 times less memory than Annoy, making it a more efficient choice for nearest-neighbor search.
How does Voyager handle memory usage during index creation?
Voyager achieves 16 times less memory usage compared to hnswlib at index creation time, which is beneficial for applications with limited memory resources. This efficiency allows developers to create indices without incurring high memory costs.
What are the key features of Voyager?
Voyager's features include fully multithreaded index creation and querying, production-ready fault-tolerant index files, and stream-based I/O that is compatible with Google Cloud Platform. It also supports string-based identifiers for querying.

Key Statistics & Figures

  • Speed improvement over Annoy: more than 10 times, at the same recall level
  • Accuracy improvement over Annoy: up to 50% more, at the same speed
  • Memory usage reduction compared to Annoy: up to 4 times less, thanks to E4M3 8-bit floating point
  • Memory usage reduction during index creation compared to hnswlib: 16 times less
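
The "up to 4 times less" memory figure is consistent with simple storage arithmetic: an 8-bit float such as E4M3 takes 1 byte per value, while a 32-bit float takes 4. The sketch below works through that arithmetic for raw vector data only. The dataset size and dimensionality are hypothetical, and the 32-bit baseline is an assumption; a real index also stores graph or tree structure that does not shrink with the value width, which is one reason the figure is an upper bound.

```python
def vector_bytes(num_vectors: int, num_dimensions: int, bytes_per_value: int) -> int:
    """Bytes needed to store the raw vector values, ignoring index overhead."""
    return num_vectors * num_dimensions * bytes_per_value


# Hypothetical dataset: one million 128-dimensional vectors.
NUM_VECTORS = 1_000_000
NUM_DIMENSIONS = 128

float32_bytes = vector_bytes(NUM_VECTORS, NUM_DIMENSIONS, bytes_per_value=4)
e4m3_bytes = vector_bytes(NUM_VECTORS, NUM_DIMENSIONS, bytes_per_value=1)
ratio = float32_bytes / e4m3_bytes

print(f"float32: {float32_bytes / 1e6:.0f} MB")
print(f"E4M3:    {e4m3_bytes / 1e6:.0f} MB")
print(f"ratio:   {ratio:.0f}x")
```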

Key Actionable Insights

  1. Leverage Voyager's multithreading capabilities to improve the performance of your nearest-neighbor search applications.
     By utilizing multithreading, you can significantly reduce query times, especially in high-traffic environments where speed is crucial for user experience.
  2. Consider Voyager for applications that require low memory usage without sacrificing accuracy.
     With its reduced memory footprint compared to Annoy and hnswlib, Voyager is ideal for resource-constrained environments, allowing for efficient scaling of applications.
  3. Utilize Voyager's support for both Python and Java to integrate nearest-neighbor search into diverse tech stacks.
     This flexibility enables teams to adopt Voyager regardless of their existing programming language preferences, facilitating easier integration into current projects.

Common Pitfalls

  1. Assuming that all nearest-neighbor search libraries provide similar performance and accuracy.
     Different libraries have varying strengths and weaknesses, and it’s crucial to evaluate them based on specific use cases and requirements to avoid suboptimal performance.
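
One way to evaluate a library on your own data is to measure recall against an exact brute-force baseline. The sketch below is library-agnostic and uses only the Python standard library; `exact_neighbors` and `recall_at_k` are hypothetical helper names, and `approx_ids` stands in for whatever IDs the library under test returns.

```python
import math


def exact_neighbors(vectors, query, k):
    """Brute force: indices of the k vectors closest to query (Euclidean)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate result found."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


# Tiny illustrative dataset and query.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.0, 0.0]]
query = [0.9, 0.9]

truth = exact_neighbors(vectors, query, k=2)

# Placeholder for the IDs returned by the library being evaluated.
approx_ids = [1, 2]

print("exact:", truth)
print("recall@2:", recall_at_k(approx_ids, truth))
```

Averaging this recall over a representative set of queries, alongside measured query latency, gives a like-for-like basis for comparing libraries at the same recall or the same speed.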

Related Concepts

Nearest-neighbor Search Algorithms
Approximate Nearest-neighbor Search
Performance Optimization In Search Systems