Reimagining LinkedIn’s search tech stack

Fedor Borisyuk

18 min read • advanced

Overview

The article discusses the transformation of LinkedIn's search technology stack through the integration of large language models (LLMs) to enhance semantic search capabilities. It highlights the challenges and innovations involved in deploying LLMs at scale, focusing on query understanding, embedding-based retrieval, and ranking processes.

What You'll Learn

1

How to implement embedding-based retrieval using LLMs

2

Why semantic search improves user experience on job platforms

3

How to measure relevance quality in search systems

Prerequisites & Requirements

Understanding of semantic search and LLMs
Familiarity with GPU-based search technologies (optional)

Key Questions Answered

How does LinkedIn's semantic search utilize large language models?

LinkedIn's semantic search leverages large language models to interpret natural language queries, enabling a more intuitive search experience. This approach allows the system to infer user intent and preferences, improving the accuracy of search results beyond simple keyword matching.
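To make "beyond simple keyword matching" concrete, here is a minimal sketch of embedding-based retrieval: the query and each item are represented as vectors, and items are ranked by cosine similarity to the query. The vectors, item ids, and the `top_k` helper are hypothetical, hand-written for illustration; in a real system the vectors would come from an embedding model, and this is not LinkedIn's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def top_k(query_vec, item_vecs, k=2):
    """Return the k (item_id, score) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), item_id) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Toy 3-dimensional embeddings (assumed values, not model output).
items = {
    "job_a": [1.0, 0.0, 0.0],
    "job_b": [0.9, 0.1, 0.0],
    "job_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, items, k=2)
```

An exhaustive scan like this only works for small collections; at scale, the same scoring is typically run on GPUs or behind an approximate nearest-neighbor index.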

What techniques are used to ensure the efficiency of LLM inference?

To improve LLM inference efficiency, LinkedIn employs model pruning and context pruning. Model pruning removes redundant components from the model, while context pruning summarizes lengthy item descriptions, reducing input size and enhancing processing speed without sacrificing relevance quality.
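The effect of context pruning on input size can be illustrated with a simple stand-in. The summary does not say how the summarization is done, so the sketch below swaps in an extractive heuristic: it keeps the sentences that share the most words with the query, up to a word budget. The function name `prune_context`, the `max_words` parameter, and the overlap scoring are all assumptions made for illustration.

```python
import re


def prune_context(description, query, max_words=40):
    """Keep the most query-relevant sentences within a word budget.

    An extractive stand-in for summarization, not the article's method.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", description) if s.strip()]
    query_terms = set(re.findall(r"\w+", query.lower()))

    def overlap(sentence):
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    # Rank sentence indices by overlap; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: overlap(sentences[i]), reverse=True)

    kept, used = [], 0
    for i in ranked:
        n_words = len(sentences[i].split())
        if used + n_words <= max_words:
            kept.append(i)
            used += n_words

    # Emit the kept sentences in their original order.
    return " ".join(sentences[i] for i in sorted(kept))


description = (
    "We are a fast growing company. "
    "The role requires Python and Spark experience. "
    "Free snacks are provided daily. "
    "You will build search ranking models."
)
pruned = prune_context(description, "python search ranking", max_words=16)
```

With a 16-word budget, the two sentences that mention the query terms are kept and the rest are dropped, which shortens the input the model has to process.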

What is the role of the LLM judge in measuring search relevance?

The LLM judge is crucial for assessing the relevance of query-document pairs on a large scale. It applies a five-point rating system aligned with product policies, generating labeled data that informs the retrieval and ranking systems to optimize search quality.

Key Statistics & Figures

Weighted Cohen’s Kappa Score

≥ 0.8

Indicates high agreement between product managers and LLM judges on relevance ratings.

Throughput of the small language model (SLM) with embedding compression

22,000 items/sec/GPU

Demonstrates significant efficiency improvements in the search system.
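For readers who want to reproduce the agreement statistic above, here is a minimal sketch of weighted Cohen's kappa for two raters on a five-point scale. It computes kappa = 1 - (observed weighted disagreement) / (disagreement expected by chance). The summary does not say whether linear or quadratic weights were used, so the `weighting` parameter is an assumption, as are the function name and the sample ratings.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, num_levels=5, weighting="quadratic"):
    """Weighted Cohen's kappa for integer ratings in 1..num_levels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    levels = range(1, num_levels + 1)
    if any(r not in levels for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must be integers in 1..num_levels")

    n = len(rater_a)
    power = 2 if weighting == "quadratic" else 1  # anything else means linear

    def weight(i, j):
        # Disagreement penalty: 0 on the diagonal, 1 at maximum distance.
        return (abs(i - j) / (num_levels - 1)) ** power

    observed = Counter(zip(rater_a, rater_b))  # joint rating counts
    marginal_a = Counter(rater_a)
    marginal_b = Counter(rater_b)

    observed_dis = sum(
        weight(i, j) * observed[(i, j)] / n for i in levels for j in levels
    )
    expected_dis = sum(
        weight(i, j) * marginal_a[i] * marginal_b[j] / (n * n)
        for i in levels
        for j in levels
    )
    if expected_dis == 0:
        return 1.0  # both raters used one identical level throughout
    return 1.0 - observed_dis / expected_dis


# Hypothetical ratings from a human reviewer and an LLM judge.
human = [5, 4, 4, 2, 1, 3, 5, 2]
judge = [5, 4, 3, 2, 1, 3, 4, 2]
kappa = weighted_kappa(human, judge)
```

Weighting penalizes a 1-versus-5 disagreement more than a 4-versus-5 one, which suits an ordinal relevance scale better than unweighted kappa.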

Technologies & Tools

Backend

CUDA

Used for embedding-based retrieval on GPUs.

Data Processing

Spark

Utilized in the offline workflow for generating embeddings.

Data Processing

Flink

Employed in the low-latency nearline system for real-time data processing.

Key Actionable Insights

1
Integrating LLMs into your search stack can significantly enhance user experience by providing more relevant results based on natural language queries.
This approach not only improves the accuracy of search results but also aligns with how users express their needs, making the search experience more intuitive.

2
Regularly measuring and refining the relevance of search results is essential for maintaining high-quality user engagement.
Using LLM judges to evaluate query-document pairs ensures that the search system adapts to changing user expectations and maintains a competitive edge.

Common Pitfalls

1

Failing to continuously measure the relevance of search results can lead to outdated and ineffective search experiences.

Without regular evaluations, the search system may not adapt to evolving user needs, resulting in decreased engagement and satisfaction.

Related Concepts

Semantic Search Techniques

Large Language Models In Search Applications

Relevance Measurement In Information Retrieval