Overview
FishDB is a generic retrieval engine developed by LinkedIn to replace the legacy FollowFeed system, enhancing scalability and performance for their feed infrastructure. It is built in Rust and achieves significant efficiency improvements, including a 2x increase in processing efficiency and a 50% reduction in hardware usage.
What You'll Learn
1. How to implement a generic retrieval engine using Rust
2. Why to choose Rust over Java for performance-critical applications
3. How to design a scatter-gather architecture for data retrieval
4. When to use a lambda architecture for data ingestion
Prerequisites & Requirements
- Understanding of retrieval systems and data architecture
- Familiarity with the Rust programming language (optional)
Key Questions Answered
What are the main limitations of LinkedIn's previous FollowFeed system?
The FollowFeed system faced scalability bottlenecks due to memory inefficiency, content duplication, and tail latency issues. It also had usability constraints like a rigid data model and tightly coupled business logic, which hindered flexibility and speed of feature rollouts.
How does FishDB improve performance compared to FollowFeed?
FishDB is 2x as efficient as FollowFeed and uses 50% less hardware. It is written in Rust for better memory management and offers more flexible APIs while still meeting strict latency SLOs, which allows for faster data retrieval and processing.
What architectural pattern does FishDB use for data retrieval?
FishDB employs a scatter-gather architecture, where requests are distributed across multiple partitioned shards, and the results are aggregated to provide the top results back to the caller. This design enhances scalability and performance.
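The scatter-gather flow described above can be sketched in a few lines of Rust. This is only an illustration of the general pattern, not FishDB's actual code: the `Hit` and `Shard` types, the integer score, and the `scatter_gather` function are all hypothetical names introduced here, and each shard is modeled as an in-process list of documents queried on its own thread.

```rust
use std::thread;

/// Hypothetical search hit: a document ID with an integer relevance score.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    doc_id: u64,
    score: u32,
}

/// Hypothetical shard holding one partition of the documents.
struct Shard {
    docs: Vec<Hit>,
}

/// Order hits by descending score, breaking ties by ascending doc ID.
fn rank(hits: &mut Vec<Hit>) {
    hits.sort_by(|a, b| b.score.cmp(&a.score).then(a.doc_id.cmp(&b.doc_id)));
}

impl Shard {
    /// Local top-k: each shard only returns its own best candidates.
    fn top_k(&self, k: usize) -> Vec<Hit> {
        let mut hits = self.docs.clone();
        rank(&mut hits);
        hits.truncate(k);
        hits
    }
}

/// Scatter the query to every shard in parallel, then gather and merge
/// the partial results into a global top-k.
fn scatter_gather(shards: &[Shard], k: usize) -> Vec<Hit> {
    let mut merged: Vec<Hit> = thread::scope(|s| {
        // Scatter: spawn one scoped worker per shard before joining any.
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| s.spawn(move || shard.top_k(k)))
            .collect();
        // Gather: collect every shard's partial result.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("shard worker panicked"))
            .collect()
    });
    rank(&mut merged);
    merged.truncate(k);
    merged
}
```

Because every shard returns at most k candidates, the gather step only merges k hits per shard regardless of how large each partition is. A real deployment would send the scatter step over the network and would need timeouts for slow shards, which this sketch leaves out.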
What is the role of the inverted index in FishDB?
The inverted index in FishDB is an in-memory hashmap that maps terms to lists of document IDs. This structure allows for efficient querying and retrieval of documents based on indexed terms, optimizing search capabilities.
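A minimal sketch of such a term-to-document-ID hashmap is shown below. It assumes 64-bit document IDs, string terms, and sorted posting lists; none of these details come from the source, and the `InvertedIndex` type and its methods are hypothetical names used only for illustration.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical document identifier type; the source does not specify one.
type DocId = u64;

/// In-memory inverted index: term -> sorted posting list of document IDs.
#[derive(Default)]
struct InvertedIndex {
    postings: HashMap<String, Vec<DocId>>,
}

impl InvertedIndex {
    fn new() -> Self {
        Self::default()
    }

    /// Record that `doc` contains `term`, keeping the list sorted and free of duplicates.
    fn insert(&mut self, term: &str, doc: DocId) {
        let list = self.postings.entry(term.to_string()).or_default();
        if let Err(pos) = list.binary_search(&doc) {
            list.insert(pos, doc);
        }
    }

    /// Documents indexed under `term`, or an empty slice for an unknown term.
    fn lookup(&self, term: &str) -> &[DocId] {
        match self.postings.get(term) {
            Some(list) => list.as_slice(),
            None => &[],
        }
    }

    /// Documents indexed under both terms, found by walking the two sorted lists.
    fn lookup_all(&self, a: &str, b: &str) -> Vec<DocId> {
        let (xs, ys) = (self.lookup(a), self.lookup(b));
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < xs.len() && j < ys.len() {
            match xs[i].cmp(&ys[j]) {
                Ordering::Less => i += 1,
                Ordering::Greater => j += 1,
                Ordering::Equal => {
                    out.push(xs[i]);
                    i += 1;
                    j += 1;
                }
            }
        }
        out
    }
}
```

Keeping the posting lists sorted is a design choice of this sketch: it makes multi-term queries a linear merge rather than a nested scan, at the cost of a slower insert.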
Key Statistics & Figures
Efficiency improvement
2x
FishDB achieves this efficiency compared to the legacy FollowFeed system.
Reduction in hardware usage
50%
This reduction is achieved while maintaining performance targets.
Latency target
40ms p99
FishDB maintains this latency target while supporting increased queries per second.
Technologies & Tools
Programming Language
Rust
Used to build FishDB for improved performance and memory management.
Stream Processing
Kafka
Used for real-time data ingestion into FishDB.
Database
RocksDB
Utilized for key-value attribute stores to support larger volumes of data.
Key Actionable Insights
1. Implementing FishDB can significantly enhance the performance of data retrieval systems, especially in environments with high scalability demands. By leveraging Rust's memory management capabilities, FishDB reduces overhead and improves efficiency, making it suitable for modern applications that require quick access to large datasets.
2. Adopting a scatter-gather architecture can streamline data processing and retrieval, allowing for better resource utilization. This architecture is particularly effective in distributed systems, where data can be processed in parallel, reducing latency and improving response times.
3. Utilizing a lambda architecture for data ingestion can provide flexibility in handling both batch and real-time data. This approach allows systems to remain responsive while ensuring that data is consistently updated and available for querying.
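To make the lambda-architecture point concrete, here is a small sketch of a store that combines a periodically rebuilt batch view with a real-time overlay. The source does not describe how FishDB merges its batch and real-time paths, so everything here is an assumption: the `LambdaStore` type, the offset-based cutover, and the rule that real-time entries take precedence on reads are hypothetical choices for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical store with a batch layer and a real-time (speed) layer.
struct LambdaStore {
    /// Snapshot produced by a periodic batch job: doc ID -> value.
    batch_view: HashMap<u64, String>,
    /// Recent stream events: doc ID -> (stream offset, value).
    speed_view: HashMap<u64, (u64, String)>,
}

impl LambdaStore {
    fn new() -> Self {
        Self {
            batch_view: HashMap::new(),
            speed_view: HashMap::new(),
        }
    }

    /// Apply one real-time event. Assumes events arrive in offset order,
    /// so a later event for the same document overwrites the earlier one.
    fn apply_event(&mut self, offset: u64, doc_id: u64, value: &str) {
        self.speed_view.insert(doc_id, (offset, value.to_string()));
    }

    /// Swap in a fresh batch snapshot that already reflects all events up to
    /// `covered_up_to`, and drop real-time entries the snapshot supersedes.
    fn load_batch(&mut self, snapshot: HashMap<u64, String>, covered_up_to: u64) {
        self.batch_view = snapshot;
        self.speed_view.retain(|_, (offset, _)| *offset > covered_up_to);
    }

    /// Serve a read: the real-time layer wins, the batch layer is the fallback.
    fn get(&self, doc_id: u64) -> Option<&str> {
        self.speed_view
            .get(&doc_id)
            .map(|(_, v)| v.as_str())
            .or_else(|| self.batch_view.get(&doc_id).map(String::as_str))
    }
}
```

The real-time layer keeps reads fresh between batch runs, while each batch load bounds how much state the real-time layer has to hold.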
Common Pitfalls
1. Over-reliance on Java-based systems can lead to performance bottlenecks due to garbage collection and memory inefficiencies.
This can be avoided by transitioning to more efficient languages like Rust that offer better memory management capabilities.
2. Rigid data models can hinder the evolution of applications and lead to technical debt.
It's essential to design flexible data models that can adapt to changing requirements without significant overhead.
Related Concepts
Data Retrieval Systems
Lambda Architecture
Scatter-gather Architecture
Inverted Indexing