Scaling Contextual Conversation Suggestions Over 500 Million Members

Haiyang Liu

13 min read • intermediate

Apache, Cassandra, Iris, Machine Learning

Overview

The article discusses the engineering challenges and solutions involved in scaling contextual conversation suggestions for LinkedIn's messaging platform, which serves over 500 million members. It details the use of the Economic Graph to generate recommendations and the iterative approach taken to optimize performance, liquidity, and latency.

What You'll Learn

1. How to frame recommendations as graph search problems to enhance messaging systems
2. Why optimizing for latency is critical in user-facing applications
3. When to use hybrid solutions for online recommendation systems

Prerequisites & Requirements

Understanding of graph theory and recommendation systems
Experience with large-scale data processing frameworks like Hadoop (optional)

Key Questions Answered

How does LinkedIn generate contextual conversation suggestions?

LinkedIn generates contextual conversation suggestions using its Economic Graph, which represents members and companies as nodes and their relationships as weighted edges. The system recommends connections based on the strength of these relationships, allowing users to engage meaningfully with their network.
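
As a rough illustration of the graph model described above, the following Python sketch stores members and companies as nodes joined by weighted edges and ranks a member's direct connections at a given company. The `WeightedGraph` class, the `suggest_connections` function, the sample names, and the choice to score a candidate as the product of two edge weights are all assumptions made for this example; the article does not specify LinkedIn's actual scoring.

```python
from collections import defaultdict


class WeightedGraph:
    """Undirected graph whose nodes are members or companies (hypothetical model)."""

    def __init__(self):
        self._edges = defaultdict(dict)

    def add_edge(self, a, b, weight):
        # Store the edge in both directions so lookups work from either node.
        self._edges[a][b] = weight
        self._edges[b][a] = weight

    def neighbors(self, node):
        # .get avoids creating empty entries for unknown nodes.
        return self._edges.get(node, {})


def suggest_connections(graph, member, company, limit=3):
    """Rank the member's direct connections that have an edge to `company`."""
    scored = []
    for connection, member_weight in graph.neighbors(member).items():
        company_weight = graph.neighbors(connection).get(company)
        if company_weight is None:
            continue  # this connection has no relationship with the company
        # Assumed score: strength of both hops multiplied together.
        scored.append((member_weight * company_weight, connection))
    # Highest score first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [connection for _, connection in scored[:limit]]


graph = WeightedGraph()
graph.add_edge("alice", "bob", 0.9)
graph.add_edge("alice", "carol", 0.4)
graph.add_edge("alice", "dave", 0.8)
graph.add_edge("bob", "acme", 0.5)
graph.add_edge("carol", "acme", 1.0)
graph.add_edge("dave", "globex", 0.9)

print(suggest_connections(graph, "alice", "acme"))  # ['bob', 'carol']
```

Here `dave` is a strong connection but is skipped because he has no edge to `acme`, which is the filtering behavior the paragraph describes.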

What challenges did LinkedIn face while scaling its recommendation system?

LinkedIn faced challenges related to liquidity, cost to serve (C2S), and latency. Initially, they achieved only 20% liquidity with high storage costs and unacceptable latency. The article discusses how they iteratively improved these metrics through various engineering solutions.

What was the impact of using indirect connections on recommendation liquidity?

By including indirect connections in their recommendations, LinkedIn increased liquidity from 30% to 70%. This approach allowed for a broader set of recommendations, enhancing user engagement and connection opportunities.
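
The effect of widening the candidate pool can be sketched in Python as follows. The summary does not define liquidity, so this example assumes it means the share of requests that return at least one suggestion. The function names, the toy data, and the restriction to second-degree connections are hypothetical and only illustrate the idea; the resulting numbers are not LinkedIn's.

```python
def direct_candidates(connections, employer, member, company):
    """First-degree connections of `member` who work at `company`."""
    return {c for c in connections.get(member, ()) if employer.get(c) == company}


def indirect_candidates(connections, employer, member, company):
    """Second-degree connections (friends of friends) who work at `company`."""
    first_degree = set(connections.get(member, ()))
    found = set()
    for friend in first_degree:
        for candidate in connections.get(friend, ()):
            if candidate == member or candidate in first_degree:
                continue  # skip the member and anyone already first-degree
            if employer.get(candidate) == company:
                found.add(candidate)
    return found


def liquidity(requests, candidate_fn):
    """Fraction of (member, company) requests with at least one candidate."""
    if not requests:
        return 0.0
    served = sum(1 for member, company in requests if candidate_fn(member, company))
    return served / len(requests)


# Toy data, invented for this example.
connections = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"], "erin": []}
employer = {"bob": "acme", "carol": "globex"}
requests = [("alice", "acme"), ("alice", "globex"), ("erin", "acme"), ("erin", "globex")]


def direct_only(member, company):
    return direct_candidates(connections, employer, member, company)


def with_indirect(member, company):
    return direct_only(member, company) | indirect_candidates(
        connections, employer, member, company
    )


print(liquidity(requests, direct_only))    # 0.25
print(liquidity(requests, with_indirect))  # 0.5
```

In this toy data, `alice` has no direct connection at `globex`, but reaches `carol` through `bob`, so that request becomes servable once indirect connections are allowed.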

How did LinkedIn optimize the performance of its recommendation system?

LinkedIn optimized performance by pre-computing affinity scores for member-company pairs and using a hybrid approach that combined offline and online computations. This reduced latency and improved the overall user experience.
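
A minimal Python sketch of this offline/online split is shown below. It assumes the offline job aggregates raw interaction signals into one affinity score per member-company pair and that the online path only performs lookups and a sort. The function names, the in-memory dictionary standing in for a key-value store, and the scoring formula are hypothetical; in the article the offline step runs on Hadoop.

```python
def precompute_affinity(interactions):
    """Offline step: fold (member, company, weight) signals into a lookup table."""
    table = {}
    for member, company, weight in interactions:
        key = (member, company)
        table[key] = table.get(key, 0.0) + weight
    return table


def suggest_online(member, company, connections, affinity, limit=3):
    """Online step: rank the member's connections using precomputed scores only."""
    scored = []
    for connection, strength in connections.get(member, {}).items():
        score = affinity.get((connection, company))
        if score is None:
            continue  # no precomputed affinity between this connection and the company
        # Assumed ranking: connection strength times precomputed affinity.
        scored.append((strength * score, connection))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [connection for _, connection in scored[:limit]]


# Offline (batch): run periodically and publish the table to a serving store.
interactions = [
    ("bob", "acme", 0.25),
    ("bob", "acme", 0.25),
    ("carol", "acme", 1.0),
]
affinity = precompute_affinity(interactions)

# Online (per request): cheap lookups, no heavy aggregation on the request path.
connections = {"alice": {"bob": 0.9, "carol": 0.4, "dave": 0.8}}
print(suggest_online("alice", "acme", connections, affinity))  # ['bob', 'carol']
```

The design choice being illustrated is that the expensive aggregation happens before any request arrives, so request latency depends mainly on the number of connections to look up rather than on the volume of raw signals.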

Key Statistics & Figures

Initial liquidity achieved

20%

This was the liquidity rate before optimizing the recommendation system.

Final latency achieved

460 ms

This was the 99th percentile latency after implementing offline computations for affinity scores.

Total members served

500 million

This is the scale at which LinkedIn operates its messaging platform.

Technologies & Tools

Data Processing

Hadoop

Used for offline computation of recommendations and affinity scores.

Backend Service

Graph Service

Facilitates graph search queries over the Economic Graph to compute recommendations.

Key Actionable Insights

1. Implementing a hybrid solution can significantly enhance the performance of recommendation systems.
By pre-computing certain data offline, you can reduce the load on real-time systems, improving response times and user satisfaction.

2. Regularly assess the liquidity of your recommendations to ensure user engagement.
If liquidity is low, consider expanding the criteria for recommendations to include indirect connections, which can increase the number of relevant suggestions.

3. Focus on optimizing latency to improve user experience in real-time applications.
Users expect quick responses, so delivering recommendations within an acceptable time budget is crucial for maintaining engagement.

Common Pitfalls

1. Relying too heavily on massive joins in data processing can lead to performance bottlenecks.

Large joins are prone to data skew: a few hot keys concentrate most of the records on a handful of tasks, which then run far longer than the rest and stretch total processing time. To avoid this, consider breaking down joins into smaller, more manageable operations.
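
One possible reading of "breaking down joins" is sketched below in plain Python: the large side is processed in fixed-size chunks, and each chunk is joined only against the matching rows of the other side, so no single unit of work has to handle every record of a hot key. The helper names `hash_join` and `chunked_join` and the sample rows are assumptions for illustration and do not reproduce the article's Hadoop jobs.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner-join two lists of (key, value) pairs on key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, ())]


def chunked_join(left, right, chunk_size):
    """Same result as hash_join, computed one chunk of `left` at a time."""
    results = []
    for start in range(0, len(left), chunk_size):
        chunk = left[start:start + chunk_size]
        keys = {key for key, _ in chunk}
        # Only the rows of `right` that this chunk can match are loaded.
        small_right = [(key, value) for key, value in right if key in keys]
        results.extend(hash_join(chunk, small_right))
    return results


# "acme" is a hot key: most left-side rows share it.
left = [("acme", f"member{i}") for i in range(6)] + [("globex", "member6")]
right = [("acme", "job1"), ("acme", "job2"), ("globex", "job3")]

assert chunked_join(left, right, chunk_size=2) == hash_join(left, right)
print(len(chunked_join(left, right, chunk_size=2)))  # 13
```

Each chunk is an independent unit of work, so the rows for the hot key `acme` are spread across several small joins instead of landing in one oversized task.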

Related Concepts

Graph Theory

Recommendation Systems

Data Processing Frameworks