Our journey in applying embedding-based retrieval techniques to build an accurate and scalable candidate retrieval system for Airbnb Homes…
Overview
The article discusses the development of Airbnb's first Embedding-Based Retrieval (EBR) search system, which aims to improve the relevance of search results for users by narrowing down the pool of listings based on their queries. It outlines the challenges faced during the implementation, including training data construction, model architecture, and online serving strategies.
What You'll Learn
How to construct training data for machine learning models using contrastive learning
Why a two-tower architecture is beneficial for embedding-based retrieval systems
How to select the appropriate approximate nearest neighbor solution for online serving
Prerequisites & Requirements
- Understanding of machine learning concepts and embedding techniques
- Experience with building and deploying machine learning models (optional)
Key Questions Answered
What are the key challenges in building an embedding-based retrieval system?
How does Airbnb's EBR system improve search result relevance?
What approximate nearest neighbor solutions were considered for online serving?
Key Actionable Insights
1. Implement a two-tower architecture for your embedding-based retrieval systems to separate the processing of listing features and query features. This architecture allows for offline computation of listing embeddings, reducing online latency and improving the overall efficiency of the retrieval process.
2. Utilize contrastive learning techniques to construct training data that captures user behavior effectively. By pairing positive and negative examples based on user interactions, you can train models that better understand the context of user queries, leading to improved retrieval accuracy.
3. Choose the right ANN solution based on your specific use case, balancing retrieval quality and speed. For systems with frequent updates, like Airbnb's, an inverted file index (IVF) may offer a better trade-off than more complex solutions, keeping retrieval efficient.
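The three insights above can be sketched together in a few lines of plain Python. This is a toy illustration under stated assumptions, not Airbnb's implementation: the towers are single untrained linear layers, similarity is a dot product on normalized vectors, the IVF centroids are picked by hand rather than learned by clustering, and every name (`tower`, `contrastive_loss`, `IVFIndex`, `upsert`, `nprobe`, `temperature`) is hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]

def tower(features, weights):
    # Toy stand-in for a tower: one linear layer, then L2 normalization.
    return normalize([dot(row, features) for row in weights])

def contrastive_loss(query_emb, pos_emb, neg_embs, temperature=0.1):
    # Softmax cross-entropy where the positive listing is the target class.
    logits = [dot(query_emb, e) / temperature for e in [pos_emb] + list(neg_embs)]
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum_exp - logits[0]

class IVFIndex:
    """Minimal inverted file index: each listing lives in its nearest centroid's bucket."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = {i: {} for i in range(len(centroids))}
        self.assignment = {}  # listing_id -> bucket id

    def _nearest_centroids(self, emb, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: -dot(emb, self.centroids[i]))
        return order[:n]

    def upsert(self, listing_id, emb):
        # An update only touches the old and new bucket of one listing.
        old = self.assignment.pop(listing_id, None)
        if old is not None:
            del self.buckets[old][listing_id]
        bucket_id = self._nearest_centroids(emb, 1)[0]
        self.buckets[bucket_id][listing_id] = emb
        self.assignment[listing_id] = bucket_id

    def search(self, query_emb, k, nprobe=1):
        # Score only the listings in the nprobe closest buckets.
        candidates = []
        for bucket_id in self._nearest_centroids(query_emb, nprobe):
            for listing_id, emb in self.buckets[bucket_id].items():
                candidates.append((dot(query_emb, emb), listing_id))
        candidates.sort(reverse=True)
        return [listing_id for _, listing_id in candidates[:k]]

rng = random.Random(0)  # fixed seed so the toy run is repeatable

def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

listing_weights = [rand_vec(4) for _ in range(3)]  # untrained, illustrative only
query_weights = [rand_vec(4) for _ in range(3)]

# Offline: embed every listing once and load the index.
listing_embs = {f"listing_{i}": tower(rand_vec(4), listing_weights) for i in range(20)}
index = IVFIndex(centroids=[listing_embs["listing_0"], listing_embs["listing_1"]])
for listing_id, emb in listing_embs.items():
    index.upsert(listing_id, emb)

# Online: only the query tower runs per request.
query_emb = tower(rand_vec(4), query_weights)
top_listings = index.search(query_emb, k=5, nprobe=2)
```

In this sketch, a smaller `nprobe` scans fewer buckets and so trades recall for speed, while `upsert` shows why a bucketed index can absorb frequent listing updates cheaply. In a real system the loss would drive gradient updates to both towers, which the sketch leaves out.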