Introducing Complete the Look: a scene-based complementary recommendation system

Pinterest Engineering

Computer Vision

Overview

The article introduces Complete the Look (CTL), a scene-based complementary recommendation system developed by Pinterest's Visual Search team. The system uses the rich context of a scene image to recommend compatible fashion and home decor products, and in early testing it outperformed previous recommendation systems.

What You'll Learn

1

How to leverage scene context for product recommendations

2

Why modeling style compatibility is essential in visual search

3

How to implement a triplet loss function for training models

Prerequisites & Requirements

Understanding of deep learning concepts and neural networks
Familiarity with Python and machine learning libraries like TensorFlow or PyTorch (optional)

Key Questions Answered

How does Complete the Look improve product recommendations?

Complete the Look improves product recommendations by utilizing scene images that provide context, such as body type and season, to suggest visually compatible items. This approach allows for personalized recommendations that go beyond mere visual similarity, enhancing the user experience in fashion and home decor.

What is the architecture of the CTL model?

The CTL model is a deep convolutional feed-forward neural network that includes an image featurizer based on ResNet50 and a CTL head that combines global feature similarity with a local spatial attention mechanism. This architecture allows the model to focus on specific regions of the image to enhance recommendation accuracy.
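As a rough illustration of how a global similarity term and a local attention term might be combined, here is a minimal sketch in plain Python. It assumes the featurizer has already produced one global scene embedding, a list of per-region scene embeddings, and a product embedding, all in the same space. The function names, the softmax attention over region scores, and the equal weighting of the two terms are illustrative assumptions, not details confirmed by the article.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ctl_distance(scene_global, scene_regions, product, attention_logits):
    """Hypothetical compatibility distance: lower means more compatible.

    scene_global     -- embedding of the whole scene image
    scene_regions    -- embeddings of spatial regions of the scene
    product          -- embedding of the candidate product image
    attention_logits -- one unnormalized relevance score per region
                        (assumed to come from a learned attention module)
    """
    # Global term: compare the whole scene with the product.
    d_global = squared_distance(scene_global, product)

    # Local term: attention-weighted average of region-to-product distances,
    # so regions judged relevant dominate the comparison.
    weights = softmax(attention_logits)
    d_local = sum(
        w * squared_distance(region, product)
        for w, region in zip(weights, scene_regions)
    )

    # Equal weighting of the two terms is an assumption made for this sketch.
    return 0.5 * (d_global + d_local)
```

With uniform attention every region contributes equally. As one region's score grows, the local term approaches the distance between that single region and the product, which is the sense in which attention lets the model focus on specific parts of the scene.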

What datasets were used for training the CTL model?

The CTL model was trained on a labeled dataset consisting of positive examples of scene and product image pairs, along with product category and bounding box annotations. Negative examples were also included to help the model learn compatibility without memorizing specific products.
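One way to turn labeled positive pairs into training triplets is sketched below. The article does not say how negatives were chosen, so drawing a random product from the same category as the positive is an assumption made here for illustration. The `build_triplets` helper and its data layout are likewise hypothetical.

```python
import random


def build_triplets(positive_pairs, catalog, rng):
    """Pair each (scene, positive product) with a sampled negative product.

    positive_pairs -- list of (scene_id, product_id, category) tuples
    catalog        -- dict mapping category -> list of product ids
    rng            -- a random.Random instance, passed in for reproducibility
    """
    triplets = []
    for scene_id, positive_id, category in positive_pairs:
        # Assumption: negatives come from the same category as the positive,
        # so the model has to learn compatibility rather than category.
        candidates = [p for p in catalog.get(category, []) if p != positive_id]
        if not candidates:
            continue  # no usable negative for this pair; skip it
        negative_id = rng.choice(candidates)
        triplets.append((scene_id, positive_id, negative_id))
    return triplets
```

A pair whose category contains no other product is skipped rather than matched with a negative from an unrelated category.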

What loss function is used in the CTL model?

The CTL model uses a triplet loss formulation, which encourages the distance between the scene and positive product image to be less than the distance between the scene and negative product image. This approach helps the model learn effective visual complementarity.
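The triplet objective described above can be written as a hinge on the two distances. The following is a minimal sketch. The margin value of 0.2 and the function names are placeholders chosen for illustration, and the distances are assumed to have been computed already by whatever scene-to-product distance the model uses.

```python
def triplet_hinge_loss(d_positive, d_negative, margin=0.2):
    """Hinge-style triplet loss on precomputed distances.

    The loss is zero once the positive product is closer to the scene
    than the negative product by at least `margin`.
    """
    return max(0.0, d_positive - d_negative + margin)


def mean_triplet_loss(distance_pairs, margin=0.2):
    """Average the triplet loss over a batch of (d_positive, d_negative) pairs."""
    losses = [triplet_hinge_loss(dp, dn, margin) for dp, dn in distance_pairs]
    return sum(losses) / len(losses)
```

A triplet that already satisfies the margin contributes nothing, so training effort concentrates on triplets where the negative product is still too close to the scene.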

Key Statistics & Figures

Performance improvement over previous systems

Significantly better

The CTL model outperformed previous recommendation systems during early testing, demonstrating its effectiveness in providing personalized recommendations.

Technologies & Tools

Backend

ResNet50

Used as the image featurizer in the CTL model to extract features from scene and product images.

Key Actionable Insights

1
Utilize scene context in your recommendation systems to enhance user engagement.
By incorporating contextual information from scene images, such as body type and season, you can make product recommendations more relevant than visual similarity alone allows.

2
Implement a triplet loss function to train models for better compatibility learning.
Using triplet loss allows your model to learn nuanced relationships between items, which is crucial for applications like fashion and home decor where visual compatibility is subjective and complex.

3
Focus on local attention mechanisms to refine model predictions.
Incorporating local attention can help your model prioritize relevant features in images, which is especially important in domains where specific details can significantly impact user preferences.

Common Pitfalls

1

Assuming visual similarity equates to visual compatibility can lead to poor recommendations.

Models built on this assumption fail to capture users' nuanced preferences. Compatibility should be learned from data rather than inferred from visual features alone.

Related Concepts

Visual Search Technology

Deep Learning In Recommendation Systems

Scene-based Image Analysis