Building a Large-Scale Recommendation System: People You May Know

Parag Agrawal

7 min read • advanced

XGBoost

Overview

This article describes the development of LinkedIn's 'People You May Know' (PYMK) recommendation system: its architecture and the challenges of scaling its scoring mechanism to billions of items. It emphasizes a multi-stage ranking design that keeps recommendations highly relevant while serving them at low latency.

What You'll Learn

1. How to implement a multi-stage ranking system for recommendation algorithms
2. Why Recall@k is important for candidate selection in recommendation systems
3. How to use A/B testing to evaluate recommendation system performance

Prerequisites & Requirements

Understanding of recommendation systems and ranking algorithms
Experience with machine learning and data processing (optional)

Key Questions Answered

What is the purpose of the L0 Ranking stage in PYMK?

The L0 Ranking stage selects a few thousand candidates from an inventory of billions of items. Its goal is to make sure the most relevant candidates are included, not to put them in order, so it is evaluated with Recall@k.
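Recall@k, as used for candidate selection, can be illustrated with a minimal sketch. The function name, the member IDs, and the notion of "relevant" (members the viewer later connected with) are assumptions made for illustration, not details from the article.

```python
from typing import Sequence, Set


def recall_at_k(ranked_candidates: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear among the first k candidates."""
    if not relevant:
        return 0.0
    top_k = set(ranked_candidates[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical data: candidates returned by a selector, and the members
# the viewer actually connected with afterwards.
retrieved = ["m7", "m2", "m9", "m4", "m1"]
connected = {"m2", "m4", "m8"}

print(recall_at_k(retrieved, connected, k=3))  # only m2 is in the top 3
print(recall_at_k(retrieved, connected, k=5))  # m2 and m4 are in the top 5
```

The metric ignores the order inside the top k, which matches a stage whose job is inclusion rather than ranking.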

How does the L2 Ranking stage improve candidate recommendations?

The L2 Ranking stage uses multiple heavy models, typically deep neural networks, to predict the probability and value of engagement events such as invitations sent and accepted. It is evaluated with precision-oriented metrics such as AUC and Precision@k, so that the most relevant candidates end up at the top of the ranking.
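The two metrics named for this stage can be computed from model scores and binary labels. The following is a small pure-Python sketch; the function names and the toy scores and labels are hypothetical, and a production system would more likely use a library implementation.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scored candidates whose label is 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k]) / k


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: label 1 means the invitation was accepted.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 1]

print(precision_at_k(scores, labels, k=2))
print(auc(scores, labels))
```

The pairwise AUC above is quadratic in the number of examples, which is fine for a sketch but not for large evaluation sets.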

What challenges are associated with online evaluation of the PYMK system?

Online evaluation is difficult because offline training data can differ from online data, because the way recommendations are presented introduces bias, and because models can be deployed incorrectly. A/B testing is therefore relied upon for accurate performance assessment.
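As one illustration of how an A/B comparison might be read, the sketch below applies a standard two-proportion z-test to a conversion-style metric. The metric (an invitation rate), the counts, and the function name are assumptions for the example; the article does not say which statistical test is used.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in rates between B and A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control (A) vs. new ranking model (B).
z, p = two_proportion_z_test(success_a=1000, n_a=20000, success_b=1100, n_b=20000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A real experiment would also need to account for the presentation biases mentioned above and for multiple metrics being checked at once.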

What techniques are used in the Re-Ranker stage?

The Re-Ranker stage employs various models, including fairness and diversity re-rankers, to ensure balanced recommendations. It utilizes Bayesian optimization to estimate important parameters for optimizing multiple objectives in candidate selection.
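A diversity re-ranker can take many forms. The sketch below shows one simple, hypothetical variant: a greedy pass that subtracts a penalty from a candidate's score for every already-selected candidate from the same group. The `group` attribute, the `penalty` weight, and the function name are illustrative assumptions; a weight like `penalty` is the kind of parameter that an optimizer could tune, but this is not presented as the article's method.

```python
from collections import Counter
from typing import List, Tuple

Candidate = Tuple[str, float, str]  # (member_id, relevance_score, group)


def diversity_rerank(candidates: List[Candidate], top_n: int, penalty: float) -> List[str]:
    """Greedily pick top_n candidates, discounting groups that are already represented."""
    remaining = list(candidates)
    picked: List[str] = []
    group_counts: Counter = Counter()
    while remaining and len(picked) < top_n:
        # Adjusted score drops by `penalty` for each earlier pick from the same group.
        best = max(remaining, key=lambda c: c[1] - penalty * group_counts[c[2]])
        remaining.remove(best)
        picked.append(best[0])
        group_counts[best[2]] += 1
    return picked


# Hypothetical candidates, grouped by an attribute such as industry.
candidates = [
    ("a", 0.90, "eng"),
    ("b", 0.85, "eng"),
    ("c", 0.80, "eng"),
    ("d", 0.70, "sales"),
    ("e", 0.60, "design"),
]

print(diversity_rerank(candidates, top_n=3, penalty=0.0))  # pure relevance order
print(diversity_rerank(candidates, top_n=3, penalty=0.2))  # mixes in another group
```

With `penalty=0.0` the output is the plain relevance ranking, so the weight directly controls the trade-off between relevance and diversity.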

Key Statistics & Figures

Data processed daily: hundreds of terabytes
This volume of data is essential for generating recommendations for hundreds of billions of potential connections.

Candidate pool size in L0 Ranking: thousands from billions
The L0 Ranking stage is designed to filter down from billions of items to a manageable few thousand candidates.

k value range in L1 Ranking: 500-800
This range is used for evaluating the effectiveness of the candidates selected from the L0 Ranking stage.

Technologies & Tools

XGBoost (Machine Learning)
Used as a lightweight model for calibration in the L1 Ranking stage.

Deep Neural Networks (Machine Learning)
Employed in the L2 Ranking stage to predict engagement events.

Bayesian Optimization (Optimization)
Used in the Re-Ranker stage to estimate important parameters for candidate selection.

Key Actionable Insights

1. Implement a multi-stage ranking system to enhance recommendation accuracy.
Structuring the ranking process into stages lets you shrink the candidate pool step by step while keeping the most relevant options, which leads to better user engagement.

2. Use A/B testing to validate the effectiveness of your recommendation algorithms.
A/B testing compares different versions of your recommendation system on live traffic, showing how users actually interact with each version and helping you tune algorithms based on real performance.

3. Incorporate fairness and diversity considerations in your ranking algorithms.
Fair and diverse recommendations can improve user satisfaction and engagement, because they address potential biases and promote a wider range of connections.
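The first insight, a multi-stage ranking funnel, can be sketched as a chain of (scorer, k) stages in which each stage keeps only its top k candidates for the next, more expensive stage. The scorers and cut-offs below are toy placeholders, not the models or sizes PYMK uses.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[int], float]


def top_k(items: Iterable[int], scorer: Scorer, k: int) -> List[int]:
    """Keep the k items with the highest score, best first."""
    return heapq.nlargest(k, items, key=scorer)


def rank_funnel(inventory: Iterable[int], stages: List[Tuple[Scorer, int]]) -> List[int]:
    """Run candidates through successive stages, narrowing the pool each time."""
    candidates = inventory
    for scorer, k in stages:
        candidates = top_k(candidates, scorer, k)
    return list(candidates)


# Toy stand-ins: a cheap selector first, heavier scorers on fewer items later.
stages = [
    (lambda x: -abs(x - 500), 100),  # "L0": cheap candidate selection
    (lambda x: x % 10, 20),          # "L1": lightweight model
    (lambda x: float(x), 3),         # "L2": heavy model on a small pool
]

print(rank_funnel(range(1000), stages))
```

Because each stage only sees the survivors of the previous one, the expensive scorer runs on 20 items here instead of 1000, which is the latency argument for the staged design.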

Common Pitfalls

1. Failing to account for discrepancies between offline training data and online data can lead to inaccurate performance evaluations.
This often occurs because the distribution of data used for training may not reflect real-world usage, which can skew results and lead to suboptimal recommendations.

2. Neglecting the importance of fairness and diversity in recommendations can result in biased outcomes.
Without addressing these factors, the recommendation system may favor certain groups over others, diminishing user experience and engagement.
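For the first pitfall, one common way to quantify a gap between offline and online data is the population stability index (PSI) of a feature or score distribution. The article does not mention PSI; it is swapped in here as an illustrative check, and the function name, bin count, and sample data are assumptions.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def histogram(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical score samples: offline training data vs. online traffic.
offline = [i / 100 for i in range(100)]
online = [v * 0.5 for v in offline]  # online scores concentrated in the lower half

print(psi(offline, offline))  # identical distributions
print(psi(offline, online))   # shifted distribution gives a larger value
```

A value of zero means the binned distributions match; larger values indicate more drift and would be a signal to re-check the offline evaluation before trusting it.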

Related Concepts

Recommendation Systems

Machine Learning Algorithms

A/B Testing

Data Processing Techniques