FishDB: a generic retrieval engine for scaling LinkedIn’s feed

Kenneth Li

15 min read • advanced

Overview

FishDB is a generic retrieval engine that LinkedIn built in Rust to replace its legacy FollowFeed system and to scale its feed infrastructure. Compared with FollowFeed, it delivers a 2x increase in processing efficiency and a 50% reduction in hardware usage.

What You'll Learn

1. How to implement a generic retrieval engine using Rust
2. Why Rust can be a better fit than Java for performance-critical applications
3. How to design a scatter-gather architecture for data retrieval
4. When to use a lambda architecture for data ingestion

Prerequisites & Requirements

Understanding of retrieval systems and data architecture
Familiarity with the Rust programming language (optional)

Key Questions Answered

What are the main limitations of LinkedIn's previous FollowFeed system?

The FollowFeed system faced scalability bottlenecks due to memory inefficiency, content duplication, and tail latency issues. It also had usability constraints like a rigid data model and tightly coupled business logic, which hindered flexibility and speed of feature rollouts.

How does FishDB improve performance compared to FollowFeed?

FishDB is 2x as efficient as FollowFeed and uses 50% less hardware. It is written in Rust for better memory management, and it offers more flexible APIs while still meeting strict latency SLOs.

What architectural pattern does FishDB use for data retrieval?

FishDB employs a scatter-gather architecture, where requests are distributed across multiple partitioned shards, and the results are aggregated to provide the top results back to the caller. This design enhances scalability and performance.
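The scatter-gather pattern described above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not FishDB's actual code: the `Hit` and `Shard` types and the `scatter_gather` function are hypothetical names, each shard is a plain in-memory list, and the "query" is simply "return your best `k` hits".

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::thread;

/// A scored document returned by a shard (hypothetical type).
/// Derived ordering compares `score` first, then `doc_id` as a tie-break.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hit {
    pub score: u32,
    pub doc_id: u64,
}

/// One partition of the corpus; here just an in-memory list of hits.
pub struct Shard {
    pub docs: Vec<Hit>,
}

impl Shard {
    /// Return this shard's local top-k, highest score first.
    pub fn top_k(&self, k: usize) -> Vec<Hit> {
        let mut hits = self.docs.clone();
        hits.sort_unstable_by(|a, b| b.cmp(a));
        hits.truncate(k);
        hits
    }
}

/// Scatter the request to every shard in parallel, then gather and merge
/// the partial results into a global top-k.
pub fn scatter_gather(shards: &[Shard], k: usize) -> Vec<Hit> {
    // Scatter: one scoped thread per shard (assumption: a real system
    // would send RPCs to remote shard hosts instead).
    let partials: Vec<Vec<Hit>> = thread::scope(|s| {
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| s.spawn(move || shard.top_k(k)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("shard thread panicked"))
            .collect()
    });

    // Gather: a min-heap capped at k entries keeps only the best hits seen.
    let mut heap: BinaryHeap<Reverse<Hit>> = BinaryHeap::with_capacity(k + 1);
    for hit in partials.into_iter().flatten() {
        heap.push(Reverse(hit));
        if heap.len() > k {
            heap.pop(); // evict the current lowest-scoring hit
        }
    }
    let mut out: Vec<Hit> = heap.into_iter().map(|Reverse(h)| h).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    out
}
```

Because each shard returns only its own top `k`, the merge step handles at most `k` hits per shard regardless of how large each partition is.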

What is the role of the inverted index in FishDB?

The inverted index in FishDB is an in-memory hashmap that maps terms to lists of document IDs. This structure allows for efficient querying and retrieval of documents based on indexed terms, optimizing search capabilities.
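As a rough illustration of the structure described above, the following Rust sketch maps terms to lists of document IDs with a `HashMap`. The `InvertedIndex` type and its methods are hypothetical; keeping posting lists sorted and intersecting them with two pointers are assumptions made for the example, not details confirmed for FishDB.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

pub type DocId = u64;

/// In-memory inverted index: term -> sorted list of document IDs.
#[derive(Default)]
pub struct InvertedIndex {
    postings: HashMap<String, Vec<DocId>>,
}

impl InvertedIndex {
    pub fn new() -> Self {
        Self::default()
    }

    /// Index `doc` under each term, keeping every posting list sorted
    /// and free of duplicates.
    pub fn add(&mut self, doc: DocId, terms: &[&str]) {
        for term in terms {
            let list = self.postings.entry((*term).to_string()).or_default();
            if let Err(pos) = list.binary_search(&doc) {
                list.insert(pos, doc);
            }
        }
    }

    /// All documents indexed under `term` (empty if the term is unknown).
    pub fn lookup(&self, term: &str) -> &[DocId] {
        self.postings.get(term).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Documents indexed under both terms, found by walking the two
    /// sorted posting lists in lockstep.
    pub fn lookup_all(&self, a: &str, b: &str) -> Vec<DocId> {
        let (xs, ys) = (self.lookup(a), self.lookup(b));
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < xs.len() && j < ys.len() {
            match xs[i].cmp(&ys[j]) {
                Ordering::Less => i += 1,
                Ordering::Greater => j += 1,
                Ordering::Equal => {
                    out.push(xs[i]);
                    i += 1;
                    j += 1;
                }
            }
        }
        out
    }
}
```

A single-term query is one hash lookup, and a two-term query costs time proportional to the combined length of the two posting lists.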

Key Statistics & Figures

Efficiency improvement

2x

FishDB achieves this efficiency compared to the legacy FollowFeed system.

Reduction in hardware usage

50%

This reduction is achieved while maintaining performance targets.

Latency target

40ms p99

FishDB maintains this latency target while supporting increased queries per second.

Technologies & Tools

Programming Language

Rust

Used to build FishDB for improved performance and memory management.

Stream Processing

Kafka

Used for real-time data ingestion into FishDB.

Database

RocksDB

Utilized for key-value attribute stores to support larger volumes of data.

Key Actionable Insights

1. Implementing FishDB can significantly enhance the performance of data retrieval systems, especially in environments with high scalability demands.
By leveraging Rust's memory management capabilities, FishDB reduces overhead and improves efficiency, making it suitable for modern applications that require quick access to large datasets.

2. Adopting a scatter-gather architecture can streamline data processing and retrieval, allowing for better resource utilization.
This architecture is particularly effective in distributed systems, where data can be processed in parallel, reducing latency and improving response times.

3. Utilizing a lambda architecture for data ingestion can provide flexibility in handling both batch and real-time data.
This approach allows systems to remain responsive while ensuring that data is consistently updated and available for querying.
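The lambda idea from the last insight, a batch path plus a real-time path merged at query time, might look roughly like the Rust sketch below. The `LambdaStore` type and its methods are hypothetical, values are plain strings, and the rule that a fresh batch snapshot supersedes all earlier real-time updates is a simplifying assumption rather than a description of FishDB's ingestion pipeline.

```rust
use std::collections::HashMap;

pub type DocId = u64;

/// Serving view that merges a batch snapshot with real-time updates.
#[derive(Default)]
pub struct LambdaStore {
    /// Batch layer: rebuilt periodically from the full dataset.
    batch: HashMap<DocId, String>,
    /// Speed layer: recent updates, e.g. consumed from a stream such as Kafka.
    realtime: HashMap<DocId, String>,
}

impl LambdaStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Speed layer: apply one streamed update immediately.
    pub fn apply_realtime(&mut self, id: DocId, value: &str) {
        self.realtime.insert(id, value.to_string());
    }

    /// Batch layer: swap in a fresh snapshot.
    /// Assumption: the snapshot already reflects every update applied so
    /// far, so the real-time delta can be discarded.
    pub fn load_batch(&mut self, snapshot: HashMap<DocId, String>) {
        self.batch = snapshot;
        self.realtime.clear();
    }

    /// Query: a real-time value, if present, wins over the batch snapshot.
    pub fn get(&self, id: DocId) -> Option<&str> {
        self.realtime
            .get(&id)
            .or_else(|| self.batch.get(&id))
            .map(String::as_str)
    }
}
```

Reads stay cheap because they touch at most two hash maps, while the periodic batch reload bounds how large the real-time delta can grow.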

Common Pitfalls

1. Over-reliance on Java-based systems can lead to performance bottlenecks due to garbage collection and memory inefficiencies.
This can be avoided by moving performance-critical paths to a language such as Rust, which manages memory through ownership rather than a garbage collector.

2. Rigid data models can hinder the evolution of applications and lead to technical debt.
Design flexible data models that can adapt to changing requirements without significant overhead.

Related Concepts

Data Retrieval Systems

Lambda Architecture

Scatter-gather Architecture

Inverted Indexing