SearchSage: Learning Search Query Representations at Pinterest

Pinterest Engineering

Overview

The article discusses SearchSage, a search query representation model developed by Pinterest to enhance the relevance of search results and user engagement. It details the model's architecture, training data, evaluation methods, and the significant performance improvements achieved through its implementation.

What You'll Learn

1. How to implement a two-tower model for search query representation
2. Why using engagement metrics improves search relevance
3. How to evaluate search models using Recall@k metrics

Prerequisites & Requirements

Understanding of neural network architectures and search algorithms
Familiarity with TensorFlow Serving and PyTorch (optional)

Key Questions Answered

How does SearchSage improve search query relevance on Pinterest?

SearchSage is a two-tower model that learns representations of search queries and Pins from user engagement signals, so that a query is matched with the Pins users are likely to engage with. After it was implemented, Pinterest saw an 11% increase in clicks on product Pins and a 42% increase in related searches.
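As a rough illustration of the two-tower idea, the sketch below maps a query and a few Pins into the same vector space and ranks the Pins by dot product. It is not Pinterest's model. The towers are hypothetical stand-ins built on a toy bag-of-words vocabulary, and the sample Pins and every name (`VOCAB`, `embed_text`, `query_tower`, `pin_tower`, `rank_pins`) are assumptions made for the example. In a real system each tower would be a trained neural network with its own parameters.

```python
import math

# Toy fixed vocabulary; a real tower would be a trained neural network.
VOCAB = ["kitchen", "ideas", "design", "modern", "storage",
         "small", "chocolate", "cake", "recipe"]
INDEX = {token: i for i, token in enumerate(VOCAB)}


def embed_text(text: str) -> list:
    """Hypothetical stand-in for a tower: L2-normalized bag-of-words vector."""
    vec = [0.0] * len(VOCAB)
    for token in text.lower().split():
        if token in INDEX:  # out-of-vocabulary tokens are ignored
            vec[INDEX[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def query_tower(query: str) -> list:
    # Assumption: shares embed_text with the Pin tower only to keep the sketch short.
    return embed_text(query)


def pin_tower(pin_description: str) -> list:
    return embed_text(pin_description)


def score(query_vec: list, pin_vec: list) -> float:
    """Dot product of the two embeddings; higher means a closer match."""
    return sum(q * p for q, p in zip(query_vec, pin_vec))


def rank_pins(query: str, pins: dict) -> list:
    """Return Pin ids ordered from best to worst match for the query."""
    q = query_tower(query)
    scored = {pin_id: score(q, pin_tower(desc)) for pin_id, desc in pins.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Made-up Pins for illustration only.
pins = {
    "pin_a": "modern kitchen design ideas",
    "pin_b": "chocolate cake recipe",
    "pin_c": "small kitchen storage ideas",
}
ranking = rank_pins("kitchen ideas", pins)
```

Because the Pin tower does not depend on the query, Pin embeddings can be computed ahead of time and only the query tower needs to run when a search arrives, which is the usual reason for choosing this architecture.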

What metrics are used to evaluate the performance of SearchSage?

SearchSage is evaluated with Recall@k, specifically Recall@10: the proportion of relevant Pins that appear among the top 10 results retrieved for a given search query. A higher value means more of the relevant Pins are ranked near the top.
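A minimal sketch of the standard Recall@k computation follows. The summary does not spell out how Pinterest builds the candidate pool or labels relevance, so the function names (`recall_at_k`, `mean_recall_at_k`) and the sample data are assumptions for illustration.

```python
def recall_at_k(retrieved: list, relevant: set, k: int = 10) -> float:
    """Fraction of the relevant Pins that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(results: dict, labels: dict, k: int = 10) -> float:
    """Average Recall@k over all queries that have at least one relevant Pin."""
    scores = [
        recall_at_k(results.get(query, []), relevant, k)
        for query, relevant in labels.items()
        if relevant
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up ranked results and relevance labels.
results = {
    "kitchen ideas": ["p1", "p2", "p3"],
    "cake": ["p9", "p4"],
}
labels = {
    "kitchen ideas": {"p1", "p7"},
    "cake": {"p4"},
}
mean_recall = mean_recall_at_k(results, labels, k=2)
```

With `k=2`, the first query retrieves one of its two relevant Pins and the second retrieves its only relevant Pin, so the mean is the average of 0.5 and 1.0.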

What training data is used for the SearchSage model?

SearchSage is trained on pairs of a search query and a Pin that received significant user engagement for that query: Pins that were saved and Pins that received a long click (over 35 seconds). Restricting the data to these signals means the model learns from high-quality engagement.
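The selection rule above could be sketched as a filter over an engagement log. The log schema (`EngagementEvent`), the action names, and the de-duplication step are assumptions for illustration; only the two positive signals (saves, and clicks longer than 35 seconds) come from the text.

```python
from dataclasses import dataclass

LONG_CLICK_SECONDS = 35.0  # threshold stated in the text above


@dataclass(frozen=True)
class EngagementEvent:
    """Hypothetical log record; the real schema is not described in the source."""
    query: str
    pin_id: str
    action: str  # assumed values: "save", "click", "impression"
    dwell_seconds: float = 0.0


def is_positive(event: EngagementEvent) -> bool:
    """Keep saves and long clicks; everything else is not a training positive."""
    if event.action == "save":
        return True
    return event.action == "click" and event.dwell_seconds > LONG_CLICK_SECONDS


def build_training_pairs(events: list) -> list:
    """Return unique (query, pin_id) pairs, in first-seen order."""
    seen = set()
    pairs = []
    for event in events:
        pair = (event.query, event.pin_id)
        # De-duplication is an assumption of this sketch, not a stated step.
        if is_positive(event) and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Made-up log entries.
events = [
    EngagementEvent("kitchen ideas", "p1", "save"),
    EngagementEvent("kitchen ideas", "p2", "click", dwell_seconds=4.0),
    EngagementEvent("kitchen ideas", "p3", "click", dwell_seconds=61.0),
    EngagementEvent("kitchen ideas", "p1", "click", dwell_seconds=90.0),
    EngagementEvent("cake", "p4", "impression"),
]
training_pairs = build_training_pairs(events)
```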

What challenges were faced in implementing SearchSage?

One challenge was integrating the model into existing infrastructure while keeping the preprocessing of input data efficient. The team developed a custom PyTorch C++ operator for the preprocessing, which made the logic easier to maintain and kept it aligned between the training and serving environments.

Key Statistics & Figures

Increase in clicks on product Pins: 11%
This increase was observed after implementing the SearchSage model, indicating improved user engagement.

Increase in related searches: 42%
The model's implementation led to a significant rise in the number of related searches conducted by users.

Technologies & Tools

Backend

TensorFlow Serving

Used for serving the SearchSage model in a production environment.

Backend

PyTorch

Utilized for developing custom operators for preprocessing input data.

Key Actionable Insights

1. Implement a two-tower model architecture to enhance search query representations.
Query and item representations are learned by separate towers, which improves retrieval performance and the relevance of search results.

2. Use user engagement metrics as training signals for search models.
Engagement signals such as long clicks and saved Pins reflect what users actually found useful, which makes them effective training labels for search models.

3. Evaluate search models using Recall@k metrics to ensure relevance.
Recall@k gives a clear measure of how well the model retrieves relevant items, which makes it a direct check on search quality.
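To make the first two insights more concrete, here is a sketch of how engaged (query, Pin) pairs can drive training of a two-tower model. The summary does not say which loss SearchSage uses. An in-batch softmax loss is a common choice for two-tower models and is used here purely as an assumption, with hand-written toy embeddings in place of real tower outputs.

```python
import math


def dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(query_vecs: list, pin_vecs: list) -> float:
    """Mean cross-entropy where query i's positive is Pin i.

    The other Pins in the batch act as negatives for query i.
    """
    if len(query_vecs) != len(pin_vecs) or not query_vecs:
        raise ValueError("need one engaged Pin per query and a non-empty batch")
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, p) for p in pin_vecs]
        # Numerically stable log-sum-exp over all Pins in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(query_vecs)


# Toy embeddings: each query points in the same direction as its engaged Pin.
queries = [[1.0, 0.0], [0.0, 1.0]]
engaged_pins = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = in_batch_softmax_loss(queries, engaged_pins)

# Swapping the Pins pairs each query with the wrong Pin, so the loss goes up.
mismatched_loss = in_batch_softmax_loss(queries, engaged_pins[::-1])
```

Minimizing a loss of this kind pulls each query embedding toward the Pin users engaged with and pushes it away from the other Pins in the batch.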

Common Pitfalls

1. Failing to align training and serving environments can lead to inconsistencies in model performance.

This often occurs when preprocessing differs between training and production, so preprocessing should be standardized across both environments.
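One way to guard against this pitfall is to route both pipelines through a single preprocessing function and to test for divergence. The sketch below is an assumption-laden illustration: the preprocessing rules and all names (`preprocess_query`, `find_skew`, `drifted_serving_features`) are hypothetical and are not Pinterest's operator.

```python
import unicodedata


def preprocess_query(raw: str) -> list:
    """Single source of truth for query preprocessing (hypothetical rules)."""
    text = unicodedata.normalize("NFKC", raw).lower().strip()
    return text.split()


def training_features(raw_query: str) -> list:
    return preprocess_query(raw_query)


def serving_features(raw_query: str) -> list:
    return preprocess_query(raw_query)


def drifted_serving_features(raw_query: str) -> list:
    # A separate reimplementation that forgot lowercasing: the skew to catch.
    return raw_query.strip().split()


def find_skew(queries: list, train_fn, serve_fn) -> list:
    """Return the queries whose features differ between the two pipelines."""
    return [q for q in queries if train_fn(q) != serve_fn(q)]


sample_queries = ["  Kitchen Ideas ", "cake recipe", "DIY  Shelves"]
no_skew = find_skew(sample_queries, training_features, serving_features)
skewed = find_skew(sample_queries, training_features, drifted_serving_features)
```

A parity check like `find_skew` can run in continuous integration against a sample of real queries, so a divergent reimplementation is caught before it reaches production.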

Related Concepts

Neural Network Architectures

Search Algorithms

User Engagement Metrics