Serving Top Comments in Professional Social Networks

Overview

This article describes how LinkedIn built a scalable comment ranking system that uses machine learning to surface the most relevant comments and increase user engagement. It covers the challenges encountered, the system architecture, and the performance improvements achieved since deployment.

What You'll Learn

  1. How to build a scalable comment ranking system using machine learning
  2. Why pre-materializing features improves comment retrieval performance
  3. How to leverage Apache Samza for real-time data processing

Prerequisites & Requirements

  • Understanding of machine learning concepts and algorithms
  • Familiarity with Apache Samza and data processing frameworks (optional)

Key Questions Answered

How does LinkedIn rank comments to improve user engagement?
LinkedIn ranks comments using a machine learning model that considers various features such as commenter reputation, comment content, and engagement metrics. This personalized ranking helps surface the most relevant comments for each user, enhancing their experience on the platform.
What challenges did LinkedIn face in implementing the comment ranking system?
LinkedIn faced scalability challenges related to latency and the need for real-time processing of comment features. Initial implementations resulted in high tail latency, prompting the development of a more efficient architecture that pre-computes features for faster retrieval.
What metrics indicate the success of the comment ranking system?
The system achieved a production latency of 15ms at the 50th percentile (the median) and 65ms at the 99th percentile (the tail). It also ranks twice as many comments in a quarter of the time taken by the original MVP, which amounts to about eight times the ranking throughput.
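
The ranking described in the first answer above can be sketched as a logistic model that scores each comment for a given viewer. This is a minimal illustration, not LinkedIn's actual model: the feature names, the weights, and the `score_comment` and `rank_comments` helpers are all hypothetical.

```python
import math

# Hypothetical feature weights; a real model would learn these from engagement data.
WEIGHTS = {
    "commenter_reputation": 1.2,
    "comment_length_norm": 0.4,
    "like_count_log": 0.8,
    "viewer_commenter_affinity": 1.5,  # viewer-specific (personalized) feature
}
BIAS = -2.0


def score_comment(features):
    """Return the predicted engagement probability for one (viewer, comment) pair."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank_comments(comments):
    """Sort comments by descending predicted engagement probability."""
    return sorted(comments, key=lambda c: score_comment(c["features"]), reverse=True)


# Invented example data: missing features default to 0.0.
comments = [
    {"id": "a", "features": {"commenter_reputation": 0.2, "like_count_log": 2.0}},
    {"id": "b", "features": {"commenter_reputation": 0.9, "viewer_commenter_affinity": 1.0}},
    {"id": "c", "features": {"comment_length_norm": 0.5}},
]
ranked_ids = [c["id"] for c in rank_comments(comments)]
```

Because one of the features depends on the viewer, the same set of comments can be ordered differently for different viewers, which is what makes the ranking personalized.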

Key Statistics & Figures

Production latency at the 50th percentile (median)
15ms
Half of all ranking requests complete within this time, so it reflects the typical responsiveness of the system.
Production tail latency at the 99th percentile
65ms
Only the slowest 1% of requests take longer than this, so it bounds the latency that nearly all requests experience.
Engagement increase on iOS
22%
This increase in comments viewed indicates the effectiveness of the new ranking system in enhancing user interaction.
Engagement increase on Android
14%
Similar to iOS, this metric shows the positive impact of the changes made to the comment ranking system on user engagement.
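
To make the latency percentiles above concrete, a percentile can be computed from raw request timings with the nearest-rank method. The sample values below are invented for illustration and are not LinkedIn measurements; `percentile` is a hypothetical helper.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Invented latencies in milliseconds: most requests are fast, one is slow.
latencies_ms = [12, 14, 15, 15, 16, 13, 15, 17, 14, 90]

p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # tail request
```

A single slow request barely moves the median but dominates the 99th percentile, which is why the two figures are reported separately.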

Technologies & Tools

Data Processing
Apache Samza
Used for real-time stream processing to compute comment features.
Machine Learning
Photon ML
Utilized for training the machine learning model that predicts viewer engagement.
Data Storage
Voldemort
Serves as an online store for querying viewer-specific features.

Key Actionable Insights

  1. Implement a feature pre-materialization strategy to enhance data retrieval speed.
     By pre-computing and storing features associated with comments, you can significantly reduce the latency involved in fetching relevant data, leading to a smoother user experience.
  2. Utilize machine learning models to personalize user interactions.
     Personalization can greatly improve engagement metrics. By training models on user behavior and preferences, you can tailor content delivery to individual users, making interactions more relevant.
  3. Adopt a real-time data processing framework like Apache Samza for handling large volumes of data.
     Real-time processing allows for immediate updates and responsiveness in applications, which is crucial for platforms with high user engagement like LinkedIn.
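
The first and third insights above combine naturally: a stream consumer keeps per-comment features up to date as events arrive, so a ranking request only needs a key lookup. The sketch below is plain Python with an in-memory dict standing in for both the stream processor and the feature store. It does not use the Apache Samza API, and the event shape and the `FeatureMaterializer` class are hypothetical.

```python
from collections import defaultdict


class FeatureMaterializer:
    """Consumes engagement events and maintains per-comment features for fast lookup."""

    def __init__(self):
        # In-memory stand-in for a key-value feature store (assumption).
        self._store = defaultdict(lambda: {"likes": 0, "replies": 0})

    def process(self, event):
        """Write path: update features incrementally, once per stream event."""
        features = self._store[event["comment_id"]]
        if event["type"] == "like":
            features["likes"] += 1
        elif event["type"] == "reply":
            features["replies"] += 1
        # Unknown event types are ignored.

    def get_features(self, comment_id):
        """Read path: a single lookup, with no aggregation at request time."""
        return dict(self._store.get(comment_id, {"likes": 0, "replies": 0}))


materializer = FeatureMaterializer()
events = [
    {"comment_id": "c1", "type": "like"},
    {"comment_id": "c1", "type": "reply"},
    {"comment_id": "c2", "type": "like"},
    {"comment_id": "c1", "type": "like"},
]
for event in events:
    materializer.process(event)

features_c1 = materializer.get_features("c1")
```

The design choice is to move the aggregation cost from the read path to the write path: each event does a small amount of work when it arrives, and the latency-sensitive ranking request reads a value that is already computed.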

Common Pitfalls

  1. Relying solely on aggregated likes for comment ranking can lead to bias.
     Comments that receive early engagement may be unfairly prioritized, while high-quality comments that are posted later may be overlooked. A more nuanced approach that considers various features is essential.
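
A small example of the bias described above: ordering by raw like count favors the comment that has been visible longest, while normalizing by exposure lets a newer comment compete. Likes per impression is only one illustrative signal chosen for this sketch, not a feature the source names, and the data and the `like_rate` helper are invented.

```python
# Invented data: the early comment has more likes only because far more people saw it.
comments = [
    {"id": "early", "likes": 40, "impressions": 2000},
    {"id": "late", "likes": 12, "impressions": 150},
]


def like_rate(comment, prior_likes=1.0, prior_impressions=50.0):
    """Smoothed likes per impression; the priors damp noise for rarely seen comments."""
    return (comment["likes"] + prior_likes) / (comment["impressions"] + prior_impressions)


by_likes = [c["id"] for c in sorted(comments, key=lambda c: c["likes"], reverse=True)]
by_rate = [c["id"] for c in sorted(comments, key=like_rate, reverse=True)]
```

Here `by_likes` puts the early comment first, while `by_rate` puts the late comment first. In a learned ranking model, a signal like this would be one input among many rather than the sole ordering key.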

Related Concepts

Machine Learning
Real-time Data Processing
User Engagement Metrics
Comment Ranking Systems