Adapting the Facebook Reels RecSys AI Model Based on User Feedback

We’ve improved personalized video recommendations on Facebook Reels by moving beyond metrics such as likes and watch time and directly leveraging user feedback. Our new User True Interest Survey (…

Senthil Rajagopalan
7 min read · Intermediate

Overview

Meta describes its User True Interest Survey (UTIS) model for Facebook Reels, which moves beyond traditional engagement metrics like likes and watch time and directly leverages user feedback collected through in-app surveys. The approach trains a lightweight alignment model on survey responses to predict user satisfaction. Meta reports significant improvements in recommendation accuracy, engagement, and retention, validated in large-scale A/B tests with over 10 million users.

What You'll Learn

1. Why traditional engagement signals like likes and watch time fail to capture true user interests in recommendation systems
2. How to design and deploy large-scale in-product user surveys to collect direct interest feedback
3. How to build a lightweight alignment model layer on top of an existing ranking system using survey responses
4. How to integrate a user interest model at both early-stage retrieval and late-stage ranking in a recommendation funnel
5. How to evaluate recommendation quality improvements using both offline metrics and large-scale A/B testing

Prerequisites & Requirements

  • Understanding of recommendation systems and ranking models (multi-task, multi-label architectures)
  • Familiarity with machine learning concepts including knowledge distillation, binary classification, and A/B testing
  • Understanding of information retrieval metrics such as precision, recall, and accuracy
  • Experience with large-scale ML systems and ranking pipelines (optional)

Key Questions Answered

Why are engagement signals like likes and watch time insufficient for recommendation systems?
Engagement signals such as likes, shares, and watch time are noisy and don't fully capture what users actually care about. Models trained only on these signals tend to recommend content with high short-term value but miss true interests important for long-term utility. Effective interest matching encompasses factors beyond topic alignment, including audio, production style, mood, and motivation.
How does Meta's UTIS model work to improve Facebook Reels recommendations?
The UTIS model is a lightweight alignment layer trained on binarized responses to an in-app survey that asks 'To what extent does this video match your interests?' on a 1-5 scale. It takes the existing predictions of the main multi-task ranking model as input features, along with engineered features capturing user behavior, content attributes, and interest signals, and outputs the probability that a user is satisfied with a video.
How accurate were previous heuristic methods at identifying user interests compared to UTIS?
Previous interest heuristics only achieved 48.3% precision in identifying true interests. The UTIS model improved accuracy from 59.5% to 71.5%, precision from 48.3% to 63.2%, and recall from 45.4% to 66.1%, demonstrating substantially better ability to identify users' actual interest preferences compared to the heuristic baseline.
How is the UTIS model integrated into Facebook Reels' ranking pipeline?
UTIS is deployed at two stages: In Late Stage Ranking (LSR), it runs in parallel to the LSR model providing an additional input feature to the final value formula. In Early Stage Ranking (Retrieval), it reconstructs users' true interest profiles by aggregating survey data and uses knowledge distillation to align user-to-item retrieval models using UTIS predictions as labels.
What were the real-world results of deploying the UTIS model in Facebook Reels?
Large-scale A/B testing with over 10 million users showed a 5.4% increase in high survey ratings, a 6.84% reduction in low survey ratings, a 5.2% boost in total user engagement, and a 0.34% decrease in integrity violations. The model also increased delivery of high-quality niche content and reduced low-quality, generic popularity-based recommendations.
How does Meta collect user perception data for training the UTIS model?
Meta deploys large-scale randomized surveys within the video feed, where a proportion of users are randomly chosen to see a single-question survey asking 'To what extent does this video match your interests?' on a 1-5 scale. The surveys collect thousands of in-context responses daily across Facebook Reels and other video surfaces, and the responses are weighted to correct for sampling and nonresponse bias.
What challenges remain for survey-driven recommendation models?
Key remaining challenges include better serving users with sparse engagement histories, reducing bias in survey sampling and delivery, further personalizing recommendations for diverse user cohorts, and improving recommendation diversity. Meta is exploring advanced modeling techniques including large language models and more granular user representations to address these issues.
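
The alignment layer described in the answers above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Meta's implementation: the binarization cut-off (ratings of 4 and 5 count as "interested"), the choice of plain logistic regression, the feature names (`p_like`, `p_watch_complete`, `p_share`), and the toy rows are all hypothetical. The source only states that survey responses are binarized and that the main model's predictions serve as input features.

```python
import math

# Assumption: ratings >= 4 on the 1-5 scale count as "interested".
# The source says responses are binarized but does not give the cut-off.
POSITIVE_THRESHOLD = 4


def binarize(rating: int) -> int:
    """Map a 1-5 survey rating to a binary interest label."""
    return 1 if rating >= POSITIVE_THRESHOLD else 0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability that the user is satisfied with the video."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical toy data. Each row holds predictions from the main ranking
# model, reused as input features: [p_like, p_watch_complete, p_share].
features = [
    [0.9, 0.8, 0.3],
    [0.7, 0.9, 0.2],
    [0.2, 0.3, 0.1],
    [0.1, 0.6, 0.0],
    [0.8, 0.7, 0.4],
    [0.3, 0.2, 0.1],
]
survey_ratings = [5, 4, 2, 1, 5, 3]
labels = [binarize(r) for r in survey_ratings]

weights, bias = train(features, labels)
p_high = predict(weights, bias, [0.85, 0.8, 0.3])   # strong engagement signals
p_low = predict(weights, bias, [0.15, 0.3, 0.05])   # weak engagement signals
```

Because the layer only consumes a handful of existing predictions, it has few parameters to fit, which is one plausible reason such a design can be trained on sparse survey data.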

Key Statistics & Figures

  • Previous heuristic precision for identifying true interests: 48.3% (baseline performance before the UTIS model)
  • UTIS model accuracy improvement: 59.5% to 71.5% (offline, over the heuristic baseline)
  • UTIS model precision improvement: 48.3% to 63.2% (offline, over the heuristic baseline)
  • UTIS model recall improvement: 45.4% to 66.1% (offline, over the heuristic baseline)
  • A/B test user population: 10 million+ users (large-scale online A/B testing)
  • Increase in high survey ratings: 5.4% (online A/B test result)
  • Reduction in low survey ratings: 6.84% (online A/B test result)
  • Boost in total user engagement: 5.2% (online A/B test result)
  • Decrease in integrity violations: 0.34% (online A/B test result)
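
The offline figures above are standard binary classification metrics. As a reminder of how they relate, the sketch below computes accuracy, precision, and recall from predicted and true interest labels. The eight labels are hypothetical toy values chosen for easy arithmetic; they are not Meta's data and do not reproduce the reported percentages.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: of the videos predicted as matching interests, how many did.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the videos that matched interests, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical survey-derived labels (1 = matches the user's interests).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
```

Read this way, a precision of 48.3% means that fewer than half of the interests flagged by the heuristic were confirmed by the survey, while the recall figure measures how many survey-confirmed interests the method recovered.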

Technologies & Tools

  • Machine Learning (AI/ML): Multi-task, multi-label ranking model and lightweight UTIS alignment layer
  • Knowledge Distillation (AI/ML): Aligning user-to-item retrieval models using UTIS predictions as labels
  • A/B Testing (Experimentation): Large-scale online evaluation with 10 million+ users
  • Facebook Reels (Platform): Primary video recommendation surface for UTIS deployment

Key Actionable Insights

1. Supplement implicit engagement signals with direct user feedback surveys to capture true interest. Engagement metrics like watch time and likes are noisy proxies: Meta's previous interest heuristics reached only 48.3% precision at identifying true user interests, meaning over half of the inferred interests were wrong.
   This is particularly impactful for recommendation systems where short-term engagement optimization diverges from long-term user satisfaction and retention.
2. Design a lightweight alignment model that sits on top of your existing ranking system rather than rebuilding from scratch. The UTIS model uses the main model's existing predictions as input features, which makes it efficient to train on sparse survey data while still drawing on the existing ranking infrastructure.
   This 'perception layer' architecture enables rapid iteration on user feedback signals without disrupting the core ranking model that handles engagement prediction.
3. Binarize survey responses to simplify modeling and reduce noise. Rather than predicting the exact 1-5 rating, converting to binary interest/no-interest labels makes the model more robust and easier to train on sparse survey data.
   Survey data is inherently sparse since only a small proportion of viewing sessions include surveys, so denoising strategies are critical for model quality.
4. Apply user interest models at multiple stages of the recommendation funnel, both retrieval and final ranking, for compounding improvements. Using UTIS in early-stage retrieval to source better candidates and in late-stage ranking to fine-tune final scores provides benefits at each stage.
   In retrieval, UTIS reconstructs true interest profiles and uses knowledge distillation to align retrieval models. In ranking, it provides an additional input feature to the value formula.
5. Weight survey responses to correct for sampling and nonresponse bias before using them as training data. Raw survey responses can be systematically biased by who is sampled and who chooses to respond, so statistical correction is essential for building a dataset that accurately reflects real user preferences.
   This is especially important when surveys are deployed at scale across diverse user populations with varying response propensities.
6. Consider interest matching dimensions beyond simple topic alignment, including audio, production style, mood, and motivation. True user interest is multidimensional, and narrow topic-matching heuristics miss important factors that drive genuine satisfaction.
   This broader view of interest matching helps surface high-quality niche content rather than defaulting to generic popularity-based recommendations.
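
The two-stage integration in insight 4 can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration: the source does not publish the value formula, its weights, or the distillation loss. The code assumes a weighted-sum value formula with one extra UTIS term for late-stage ranking, and a binary cross-entropy loss that treats the UTIS prediction as a soft label for a dot-product retrieval model. The names `final_value` and `distillation_loss` are hypothetical.

```python
import math


def final_value(engagement_preds, engagement_weights, utis_score, utis_weight):
    """Late-stage ranking: weighted engagement terms plus one UTIS term.

    Assumption: the value formula is a weighted sum. The source only says
    UTIS supplies an additional input feature to the final value formula.
    """
    base = sum(engagement_weights[name] * p for name, p in engagement_preds.items())
    return base + utis_weight * utis_score


def distillation_loss(user_emb, item_emb, utis_score):
    """Retrieval: cross-entropy between the student score and the UTIS label.

    Assumption: the student is a dot-product user-to-item model and the
    UTIS prediction (the teacher) is used as a soft label in [0, 1].
    """
    logit = sum(u * v for u, v in zip(user_emb, item_emb))
    student = 1.0 / (1.0 + math.exp(-logit))
    eps = 1e-12  # guards against log(0)
    return -(utis_score * math.log(student + eps)
             + (1.0 - utis_score) * math.log(1.0 - student + eps))


# Two candidates with identical engagement predictions (hypothetical values).
preds = {"like": 0.2, "watch": 0.6}
weights = {"like": 1.0, "watch": 2.0}
value_matched = final_value(preds, weights, utis_score=0.9, utis_weight=1.5)
value_unmatched = final_value(preds, weights, utis_score=0.3, utis_weight=1.5)

# The loss is lower when the retrieval model already agrees with UTIS.
user = [1.0, 0.5]
loss_aligned = distillation_loss(user, [2.0, 1.0], utis_score=0.9)
loss_misaligned = distillation_loss(user, [-2.0, -1.0], utis_score=0.9)
```

In this toy setup the UTIS term is the only thing separating the two candidates at ranking time, which shows how a perception signal can reorder items that engagement predictions alone would treat as equal.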

Common Pitfalls

1. Relying solely on engagement signals like watch time, likes, and shares to train recommendation models. These implicit signals are noisy: Meta's previous interest heuristics reached only 48.3% precision at identifying true user interests, and models trained exclusively on engagement tend to over-optimize for short-term engagement at the expense of genuine user satisfaction.
   Supplementing with direct user feedback through surveys provides a more accurate signal of true interest, leading to better long-term retention and engagement.
2. Using raw survey responses without correcting for sampling and nonresponse bias. Not all users are equally likely to respond to surveys, so unweighted responses can create a systematically skewed training dataset that doesn't represent the full user population.
   Apply statistical weighting to correct for these biases before using survey data for model training, so that the dataset accurately reflects real user preferences.
3. Equating interest matching with topic alignment alone. True user interest is multidimensional, encompassing factors like audio, production style, mood, and motivation, not just whether the topic matches a user's stated preferences.
   Building features that capture these broader dimensions of content perception leads to recommendations that feel more genuinely relevant and personalized.
4. Applying the interest model only at a single stage of the ranking pipeline. Limiting UTIS to either retrieval or final ranking captures only partial benefits, as improvements in candidate sourcing and final scoring are complementary.
   Deploy interest models at both early-stage retrieval (to source better candidates) and late-stage ranking (to fine-tune final ordering) for compounding improvements.
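
The weighting step in pitfall 2 is not specified in the source, so the sketch below swaps in a common technique, inverse-probability weighting, purely as an illustration. It assumes each response carries a known probability of having been sampled and a known probability of having been answered. The `Response` fields, the propensity values, and the rating cut-off of 4 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    rating: int          # 1-5 survey answer
    p_sampled: float     # assumed probability the user was shown the survey
    p_responded: float   # assumed probability the user answered once shown


def weight(r: Response) -> float:
    """Inverse-probability weight: rarer respondents count for more."""
    return 1.0 / (r.p_sampled * r.p_responded)


def weighted_positive_rate(responses, threshold=4):
    """Share of weighted responses at or above the (assumed) cut-off."""
    total = sum(weight(r) for r in responses)
    positive = sum(weight(r) for r in responses if r.rating >= threshold)
    return positive / total


# Hypothetical cohort: satisfied users answer often, dissatisfied ones rarely.
responses = [
    Response(rating=5, p_sampled=0.01, p_responded=0.5),
    Response(rating=4, p_sampled=0.01, p_responded=0.5),
    Response(rating=2, p_sampled=0.01, p_responded=0.1),
]

unweighted_rate = sum(1 for r in responses if r.rating >= 4) / len(responses)
weighted_rate = weighted_positive_rate(responses)
```

In this toy cohort the raw positive rate is two in three, but the single low rating stands in for many users who rarely answer, so the weighted rate drops to roughly 29%. The same weights could be passed as per-example weights when training on the survey labels.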

Related Concepts

Recommendation Systems
Multi-task Learning
Knowledge Distillation
User Satisfaction Modeling
Survey-based Feedback Systems
Ranking Model Architecture
Content Personalization
A/B Testing at Scale
Bias Correction in ML
Large Language Models for Recommendations
User Interest Profiling
Retrieval and Re-ranking Pipelines