Overview
The article describes a Large Language Model (LLM)-based relevance system for Pinterest Search, covering its technical design, model architecture, and the results of both offline and online experiments. It reports the resulting gains in search relevance and fulfillment rate.
What You'll Learn
1. How to implement a cross-encoder language model for search relevance prediction
2. Why knowledge distillation is essential for scaling LLMs in production
3. How to enrich text features for improved relevance modeling
Prerequisites & Requirements
- Understanding of machine learning concepts and model training
- Familiarity with LLMs and their applications in search systems (optional)
Key Questions Answered
How does Pinterest improve search relevance using LLMs?
Pinterest enhances search relevance by implementing a cross-encoder language model that predicts the relevance of Pins to user queries. This model is fine-tuned with human-annotated data, allowing for a more accurate alignment of search results with user intent.
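The cross-encoder idea can be illustrated with a minimal sketch: the query and the Pin's text are joined into a single input sequence, and the model emits one logit per relevance level. Everything concrete here is an assumption for illustration: the field names (`title`, `description`, `synthetic_caption`), the `[SEP]` separator, the 1 to 5 level numbering, and the stand-in logits, which a real system would obtain from the fine-tuned language model.

```python
import math

# Hypothetical field names; the summary only confirms that synthetic image
# captions (generated with BLIP) are among the text features.
PIN_TEXT_FIELDS = ["title", "description", "synthetic_caption"]

def build_pin_text(pin: dict) -> str:
    """Join whichever text fields the Pin actually has into one string."""
    parts = [f"{name}: {pin[name]}" for name in PIN_TEXT_FIELDS if pin.get(name)]
    return " | ".join(parts)

def build_cross_encoder_input(query: str, pin: dict) -> str:
    """A cross-encoder reads the query and the Pin text as ONE sequence."""
    return f"query: {query} [SEP] {build_pin_text(pin)}"

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_relevance(logits):
    """Map 5 class logits to (predicted level in 1..5, class probabilities)."""
    probs = softmax(logits)
    level = max(range(len(probs)), key=probs.__getitem__) + 1
    return level, probs

pin = {
    "title": "Rustic oak dining table",
    "synthetic_caption": "a wooden table in a kitchen",
}
text = build_cross_encoder_input("farmhouse dining table", pin)

# Stand-in logits: in practice these come from the fine-tuned model's
# classification head applied to `text`.
level, probs = decode_relevance([-1.0, -0.5, 0.2, 1.5, 0.9])
```

Because the query and Pin text share one input, the model can attend across both, which is what distinguishes a cross-encoder from scoring two separately computed embeddings.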
What metrics were used to evaluate the effectiveness of the new relevance model?
The new relevance model was evaluated using nDCG@K and 5-scale accuracy. It showed a +2.18% improvement in search feed relevance (nDCG@20) and increased fulfillment rates across various countries.
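For readers unfamiliar with nDCG@K, the following is a minimal sketch of how it is commonly computed from graded relevance labels. It assumes a linear gain (the label itself) and a log2 rank discount; the summary does not say which gain variant Pinterest uses, and the example labels are made up.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded labels of the results in the order the system ranked them
# (illustrative values on a 1..5 scale).
ranked_labels = [3, 5, 2, 4, 1]
score = ndcg_at_k(ranked_labels, k=3)
```

A perfectly ordered list scores 1.0, and any ranking that places lower-graded results above higher-graded ones scores less.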
What are the key features used in the student relevance model?
The student relevance model uses query-level features, Pin-level features, and query-Pin interaction features. These include embeddings from SearchSAGE and PinSAGE, as well as historical engagement rates, to enhance relevance predictions.
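A minimal sketch of how such a feature vector might be assembled is shown below. The toy 3-dimensional embeddings stand in for real SearchSAGE and PinSAGE vectors, and the use of cosine similarity as an interaction feature, the function names, and the feature ordering are all assumptions for illustration rather than details from the article.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_student_features(query_emb, pin_emb, engagement_rate):
    """Concatenate query-level, Pin-level, and query-Pin interaction features."""
    interaction = [
        cosine(query_emb, pin_emb),  # assumed interaction feature
        engagement_rate,             # historical engagement for this query-Pin pair
    ]
    return list(query_emb) + list(pin_emb) + interaction

# Toy embeddings; real ones would have many more dimensions.
features = build_student_features(
    query_emb=[1.0, 0.0, 0.0],
    pin_emb=[0.0, 1.0, 0.0],
    engagement_rate=0.12,
)
```

The resulting flat vector is the kind of input a lightweight student model can score at serving time without running a language model.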
Key Statistics & Figures
Improvement in search feed relevance
+2.18%
Measured by nDCG@20 after implementing the new relevance model.
Performance increase of Llama-3-8B over multilingual BERT-base
12.5%
Measured as 5-scale accuracy in model comparisons.
Performance increase of Llama-3-8B over the baseline model
19.7%
The baseline model relied solely on SearchSAGE embeddings.
Technologies & Tools
ML Model
BERT
Used as a baseline for the cross-encoder architecture.
ML Model
T5
Evaluated as one of the pre-trained language models for relevance prediction.
ML Model
Llama-3
Demonstrated superior performance in relevance predictions.
ML Model
BLIP
Used for generating synthetic image captions.
ML Model
PinSAGE
Provides embeddings for Pins to enhance relevance modeling.
ML Model
SearchSAGE
Utilized for query and Pin embeddings.
Key Actionable Insights
1. Implementing a cross-encoder model can significantly enhance the accuracy of search relevance predictions. This approach allows for a more nuanced understanding of user queries and Pin content, which is essential in improving user satisfaction and engagement.
2. Utilizing knowledge distillation can help in scaling LLMs effectively for real-time applications. By distilling a larger model into a smaller, more efficient one, organizations can maintain high performance while reducing latency and operational costs.
3. Enriching text features with metadata and user engagement data can lead to better relevance modeling. Incorporating diverse data sources ensures that the model captures a comprehensive view of user intent, improving the overall search experience.
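The distillation idea in the second insight can be sketched as a soft-label loss: the large teacher model produces a probability distribution over the five relevance levels, and the small student is trained to match it. This sketch assumes a soft-label cross-entropy objective and made-up teacher probabilities; the summary does not specify the loss actually used.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_cross_entropy(teacher_probs, student_logits):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    student_probs = softmax(student_logits)
    return -sum(
        t * math.log(s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0  # classes with zero teacher mass contribute nothing
    )

# Illustrative teacher output for one query-Pin pair (levels 1..5).
teacher_probs = [0.02, 0.03, 0.10, 0.60, 0.25]

# A student that roughly agrees with the teacher gets a lower loss
# than one that ranks the levels in the opposite order.
close = soft_label_cross_entropy(teacher_probs, [-2.0, -1.5, 0.0, 2.0, 1.0])
far = soft_label_cross_entropy(teacher_probs, [2.0, 1.0, 0.0, -1.5, -2.0])
```

Training on the teacher's full distribution, rather than only its top label, passes along how confident the teacher was, which is one common motivation for soft labels in distillation.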
Common Pitfalls
1. Relying solely on historical engagement data can lead to biased relevance predictions.
This occurs because engagement metrics may not accurately reflect the current relevance of content, especially as user interests evolve. It's crucial to incorporate diverse data sources to ensure the model remains aligned with user intent.
Related Concepts
Machine Learning
Large Language Models
Search Relevance Systems
Knowledge Distillation
Text Feature Engineering