Adapting the Facebook Reels RecSys AI Model Based on User Feedback

Senthil Rajagopalan

We’ve improved personalized video recommendations on Facebook Reels by moving beyond metrics such as likes and watch time and directly leveraging user feedback. Our new User True Interest Survey (…

Overview

Meta describes its User True Interest Survey (UTIS) model for Facebook Reels, which moves beyond traditional engagement metrics like likes and watch time and directly leverages user feedback collected through in-app surveys. The approach trains a lightweight alignment model on survey responses to predict user satisfaction. In large-scale A/B tests with over 10 million users, it improved recommendation accuracy, engagement, and retention.

What You'll Learn

1

Why traditional engagement signals like likes and watch time fail to capture true user interests in recommendation systems

2

How to design and deploy large-scale in-product user surveys to collect direct interest feedback

3

How to build a lightweight alignment model layer on top of an existing ranking system using survey responses

4

How to integrate a user interest model at both early-stage retrieval and late-stage ranking in a recommendation funnel

5

How to evaluate recommendation quality improvements using both offline metrics and large-scale A/B testing

Prerequisites & Requirements

Understanding of recommendation systems and ranking models (multi-task, multi-label architectures)
Familiarity with machine learning concepts including knowledge distillation, binary classification, and A/B testing
Understanding of information retrieval metrics such as precision, recall, and accuracy
Experience with large-scale ML systems and ranking pipelines (optional)

Key Questions Answered

Why are engagement signals like likes and watch time insufficient for recommendation systems?

Engagement signals such as likes, shares, and watch time are noisy and don't fully capture what users actually care about. Models trained only on these signals tend to recommend content with high short-term value but miss true interests important for long-term utility. Effective interest matching encompasses factors beyond topic alignment, including audio, production style, mood, and motivation.

How does Meta's UTIS model work to improve Facebook Reels recommendations?

The UTIS model is a lightweight alignment layer trained on binarized responses to a single-question survey that asks 'To what extent does this video match your interests?' on a 1-5 scale. It takes existing predictions from the main multi-task ranking model as input features, along with engineered features capturing user behavior, content attributes, and interest signals, and outputs the probability that a user is satisfied with a video.
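
As a rough illustration of what such a layer could look like, the sketch below fits a small logistic regression on top of the main model's predictions. The source does not say which model family UTIS uses, so logistic regression, the feature names, the toy data, and the helper names are all assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_satisfaction(weights, bias, features):
    """Probability that the user is satisfied with the video."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_alignment_layer(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression with batch gradient descent on log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            err = predict_satisfaction(weights, bias, features) - label
            grad_b += err
            for j, x in enumerate(features):
                grad_w[j] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical inputs per (user, video) pair:
# [main-model p(like), main-model p(watch-through), engineered topic affinity]
rows = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3],
]
labels = [1, 1, 0, 0]  # survey responses, already binarized (toy data)

weights, bias = train_alignment_layer(rows, labels)
p_high = predict_satisfaction(weights, bias, [0.85, 0.85, 0.65])
p_low = predict_satisfaction(weights, bias, [0.15, 0.25, 0.20])
```

Because the main model's predictions arrive as ready-made features, a layer like this has very few parameters to learn, which is one plausible reason a small model can be trained on sparse survey data.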

How accurate were previous heuristic methods at identifying user interests compared to UTIS?

Previous interest heuristics only achieved 48.3% precision in identifying true interests. The UTIS model improved accuracy from 59.5% to 71.5%, precision from 48.3% to 63.2%, and recall from 45.4% to 66.1%, demonstrating substantially better ability to identify users' actual interest preferences compared to the heuristic baseline.
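
For reference, the three offline metrics quoted above follow directly from the counts of true and false positives and negatives. The sketch below uses made-up labels purely to show the arithmetic; it is not the evaluation code behind the reported numbers.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = true interest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Of the items predicted as interests, how many really were.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real interests, how many the model found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data: survey-derived labels vs. model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Read this way, a precision of 48.3% means that fewer than half of the interests flagged by the heuristic matched what users reported in the survey.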

How is the UTIS model integrated into Facebook Reels' ranking pipeline?

UTIS is deployed at two stages. In late-stage ranking (LSR), it runs in parallel with the LSR model and provides an additional input feature to the final value formula. In early-stage ranking (retrieval), it reconstructs users' true interest profiles by aggregating survey data, and knowledge distillation aligns the user-to-item retrieval models by using UTIS predictions as labels.
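
To make the late-stage integration concrete, the sketch below treats the final value formula as a weighted sum of per-candidate predictions and adds a UTIS score as one more term. The source only says that UTIS provides an additional input to the value formula; the weighted-sum form, the weights, and the prediction names here are assumptions.

```python
def final_value(predictions, weights):
    """Weighted sum of predictions; terms with no weight are ignored."""
    return sum(weights[name] * predictions[name] for name in weights)


def rank(candidates, weights):
    """Return candidate ids ordered from highest to lowest value."""
    return sorted(
        candidates,
        key=lambda cid: final_value(candidates[cid], weights),
        reverse=True,
    )


# Hypothetical predictions for two candidate videos.
candidates = {
    "A": {"p_like": 0.6, "p_watch": 0.7, "p_utis": 0.2},
    "B": {"p_like": 0.5, "p_watch": 0.6, "p_utis": 0.9},
}

engagement_only = {"p_like": 1.0, "p_watch": 1.0}
with_utis = {"p_like": 1.0, "p_watch": 1.0, "p_utis": 1.0}

ranked_without = rank(candidates, engagement_only)
ranked_with = rank(candidates, with_utis)
```

In this toy case video A wins on engagement predictions alone, while video B moves to the top once the predicted survey satisfaction is included.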

What were the real-world results of deploying the UTIS model in Facebook Reels?

Large-scale A/B testing with over 10 million users showed a 5.4% increase in high survey ratings, a 6.84% reduction in low survey ratings, a 5.2% boost in total user engagement, and a 0.34% decrease in integrity violations. The model also increased delivery of high-quality niche content and reduced low-quality, generic, popularity-based recommendations.
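
Percentages like these are typically relative changes of a treatment group against a control group. The sketch below shows that calculation with invented rates; the source does not publish the underlying counts or its exact methodology, so this is only an assumed reading of how such lifts are computed.

```python
def relative_lift(control_rate, treatment_rate):
    """Relative change of the treatment metric versus control."""
    if control_rate == 0:
        raise ValueError("control rate must be non-zero")
    return (treatment_rate - control_rate) / control_rate


# Invented example: share of surveys answered with a high rating.
control_high_rating_rate = 0.40
treatment_high_rating_rate = 0.42
lift = relative_lift(control_high_rating_rate, treatment_high_rating_rate)
lift_percent = round(lift * 100, 2)
```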

How does Meta collect user perception data for training the UTIS model?

Meta deploys large-scale randomized surveys within the video feed, where a proportion of users are randomly chosen to see a single-question survey asking 'To what extent does this video match your interests?' on a 1-5 scale. The surveys collect thousands of in-context responses daily across Facebook Reels and other video surfaces, and the responses are weighted to correct for sampling and nonresponse bias.
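
One common way to apply such a correction is inverse-probability weighting, where each response is weighted by the inverse of its chance of being sampled and answered. The source does not name the weighting scheme Meta uses, so the method, the cohorts, and every probability below are assumptions for illustration.

```python
def response_weight(p_sampled, p_respond):
    """Inverse of the probability that this response was observed at all."""
    return 1.0 / (p_sampled * p_respond)


def weighted_mean_rating(responses):
    """responses: list of (rating, weight) pairs."""
    total_weight = sum(weight for _, weight in responses)
    return sum(rating * weight for rating, weight in responses) / total_weight


# Hypothetical cohorts: heavy users answer surveys far more often.
P_SAMPLED = 0.01
heavy_weight = response_weight(P_SAMPLED, p_respond=0.5)
light_weight = response_weight(P_SAMPLED, p_respond=0.1)

heavy_ratings = [5, 5, 4, 4, 5]
light_ratings = [2]

all_ratings = heavy_ratings + light_ratings
unweighted = sum(all_ratings) / len(all_ratings)

responses = [(r, heavy_weight) for r in heavy_ratings]
responses += [(r, light_weight) for r in light_ratings]
weighted = weighted_mean_rating(responses)
```

In this toy data the raw average overstates satisfaction because the cohort that rarely responds is underrepresented; the weighted average pulls the estimate back toward the full population.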

What challenges remain for survey-driven recommendation models?

Key remaining challenges include better serving users with sparse engagement histories, reducing bias in survey sampling and delivery, further personalizing recommendations for diverse user cohorts, and improving recommendation diversity. Meta is exploring advanced modeling techniques including large language models and more granular user representations to address these issues.

Key Statistics & Figures

Previous heuristic precision for identifying true interests

48.3%

Baseline performance before UTIS model

UTIS model accuracy improvement

59.5% to 71.5%

Offline accuracy improvement over heuristic baseline

UTIS model precision improvement

48.3% to 63.2%

Offline precision improvement over heuristic baseline

UTIS model recall improvement

45.4% to 66.1%

Offline recall improvement over heuristic baseline

A/B test user population

10 million+ users

Large-scale online A/B testing

Increase in high survey ratings

+5.4%

Online A/B test result

Reduction in low survey ratings

-6.84%

Online A/B test result

Boost in total user engagement

+5.2%

Online A/B test result

Decrease in integrity violations

-0.34%

Online A/B test result

Technologies & Tools

AI/ML

Machine Learning

Multi-task, multi-label ranking model and lightweight UTIS alignment layer

AI/ML

Knowledge Distillation

Aligning user-to-item retrieval models using UTIS predictions as labels

Experimentation

A/B Testing

Large-scale online evaluation with 10 million+ users

Platform

Facebook Reels

Primary video recommendation surface for UTIS deployment

Key Actionable Insights

1
Supplement implicit engagement signals with direct user feedback surveys to capture true interest. Watch time and likes are noisy proxies: the previous interest heuristics achieved only 48.3% precision at identifying true user interests, meaning more than half of the inferred interests were wrong.
This is particularly impactful for recommendation systems where short-term engagement optimization diverges from long-term user satisfaction and retention.

2
Design a lightweight alignment model that sits on top of your existing ranking system rather than rebuilding from scratch. The UTIS model uses existing main model predictions as input features, making it efficient to train on sparse survey data while leveraging the full power of the existing ranking infrastructure.
This 'perception layer' architecture enables rapid iteration on user feedback signals without disrupting the core ranking model that handles engagement prediction.

3
Binarize survey responses to simplify modeling and reduce noise and variance. Rather than predicting the exact rating on the 1-5 scale, converting responses to binary interest/no-interest labels makes the model more robust and easier to train on sparse survey data.
Survey data is inherently sparse since only a small proportion of viewing sessions include surveys, so denoising strategies are critical for model quality.
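
A minimal sketch of the binarization step is shown below. The source does not state where the cut-off on the 1-5 scale sits, so the threshold of 4 is an assumption and is exposed as a parameter.

```python
def binarize_ratings(ratings, positive_threshold=4):
    """Map 1-5 survey ratings to 1 (interested) or 0 (not interested).

    The default threshold is an assumption, not a value from the source.
    """
    labels = []
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        labels.append(1 if rating >= positive_threshold else 0)
    return labels


binary_labels = binarize_ratings([1, 2, 3, 4, 5])
```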

4
Apply user interest models at multiple stages of the recommendation funnel—both retrieval and final ranking—for compounding improvements. Using UTIS in early-stage retrieval to source better candidates and in late-stage ranking to fine-tune final scores provides benefits at each stage.
In retrieval, UTIS reconstructs true interest profiles and uses knowledge distillation to align retrieval models. In ranking, it provides an additional input feature to the value formula.
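
The retrieval-side distillation can be pictured as training a small student model to reproduce the teacher's soft labels. In the sketch below the student is a dot product of user and item embeddings trained with cross-entropy against UTIS-style probabilities. The student architecture, the hyperparameters, and the teacher scores are all assumptions; the source only states that UTIS predictions serve as labels for the user-to-item retrieval models.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def student_score(user_emb, item_emb, user, item):
    """Retrieval score: dot product of user and item embeddings."""
    return sum(a * b for a, b in zip(user_emb[user], item_emb[item]))


def distill(pairs, teacher_probs, n_users, n_items,
            dim=4, epochs=300, lr=0.5, seed=0):
    """Train embeddings so sigmoid(score) matches the teacher's soft labels."""
    rng = random.Random(seed)
    user_emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                for _ in range(n_users)]
    item_emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                for _ in range(n_items)]
    for _ in range(epochs):
        for (user, item), target in zip(pairs, teacher_probs):
            score = student_score(user_emb, item_emb, user, item)
            # Gradient of soft-label cross-entropy with respect to the score.
            err = sigmoid(score) - target
            u_vec = user_emb[user][:]  # snapshots so both updates use old values
            i_vec = item_emb[item][:]
            for k in range(dim):
                user_emb[user][k] -= lr * err * i_vec[k]
                item_emb[item][k] -= lr * err * u_vec[k]
    return user_emb, item_emb


# Hypothetical teacher outputs: p(satisfied) for each (user, item) pair.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
teacher_probs = [0.9, 0.1, 0.2, 0.8]

user_emb, item_emb = distill(pairs, teacher_probs, n_users=2, n_items=2)
user0_prefers_item0 = (student_score(user_emb, item_emb, 0, 0)
                       > student_score(user_emb, item_emb, 0, 1))
user1_prefers_item1 = (student_score(user_emb, item_emb, 1, 1)
                       > student_score(user_emb, item_emb, 1, 0))
```

Once trained this way, the student can score candidates with a cheap dot product at retrieval time while still reflecting the preferences expressed by the survey-trained teacher.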

5
Weight survey responses to correct for sampling and nonresponse bias before using them as training data. Raw survey responses can be systematically biased by who chooses to respond, so statistical correction is essential for building a dataset that accurately reflects real user preferences.
This is especially important when surveys are deployed at scale across diverse user populations with varying response propensities.

6
Consider interest matching dimensions beyond simple topic alignment, including audio, production style, mood, and motivation. True user interest is multidimensional, and narrow topic-matching heuristics miss important factors that drive genuine satisfaction.
This broader view of interest matching helps surface high-quality niche content rather than defaulting to generic popularity-based recommendations.

Common Pitfalls

1

Relying solely on engagement signals like watch time, likes, and shares to train recommendation models. These implicit signals are noisy, and the previous interest heuristics achieved only 48.3% precision at identifying true user interests. Models trained exclusively on engagement tend to over-optimize for short-term engagement at the expense of genuine user satisfaction.

Supplementing with direct user feedback through surveys provides a more accurate signal of true interest, leading to better long-term retention and engagement.

2

Using raw survey responses without correcting for sampling and nonresponse bias. Not all users are equally likely to respond to surveys, so unweighted responses can create a systematically skewed training dataset that doesn't represent the full user population.

Apply statistical weighting to correct for these biases before using survey data for model training to ensure the dataset accurately reflects real user preferences.

3

Equating interest matching with simple topic alignment. True user interest is multidimensional, encompassing factors like audio, production style, mood, and motivation, not just whether the topic matches a user's stated preferences.

Building features that capture these broader dimensions of content perception leads to recommendations that feel more genuinely relevant and personalized.

4

Applying the interest model only at a single stage of the ranking pipeline. Limiting UTIS to either retrieval or final ranking captures only partial benefits, as improvements in candidate sourcing and final scoring are complementary.

Deploy interest models at both early-stage retrieval (to source better candidates) and late-stage ranking (to fine-tune final ordering) for compounding improvements.

Related Concepts

Recommendation Systems

Multi-task Learning

Knowledge Distillation

User Satisfaction Modeling

Survey-based Feedback Systems

Ranking Model Architecture

Content Personalization

A/B Testing At Scale

Bias Correction In ML

Large Language Models For Recommendations

User Interest Profiling

Retrieval And Re-ranking Pipelines