Overview
The article discusses LinkedIn's new search architecture, Galene, which was developed to address the limitations of their previous search stack built on Lucene. It highlights the motivations for the redesign, the major design decisions made, and the benefits realized through the implementation of Galene.
What You'll Learn
1. How to redesign a search architecture to improve scalability and performance
2. Why separating infrastructure tasks from relevance tasks enhances development processes
3. How to implement live updates at the field granularity level in a search index
Prerequisites & Requirements
- Understanding of search engine architecture and indexing concepts
- Experience with distributed systems and search technologies (optional)
Key Questions Answered
What were the limitations of LinkedIn's pre-Galene search architecture?
The pre-Galene architecture faced challenges such as difficulty in rebuilding complete indices, inflexible scoring mechanisms, and a fragmented system due to numerous small open-sourced components. These limitations necessitated a complete redesign to improve scalability and relevance.
How does Galene improve the search experience for LinkedIn users?
Galene enables typeahead search across all 300M+ members, improves relevance through sophisticated algorithms, and runs more than twice as fast as the previous implementation on about a third of the hardware.
What is the role of the Federator and Broker in the Galene architecture?
The Federator and Broker are services that accept queries, distribute them to multiple services, and combine the responses. They enable structured retrieval queries and enhance the search process by allowing for plugin-based query rewriting and merging.
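The accept, distribute, and combine pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not Galene's actual code: `broker_search`, `make_shard`, the `Searcher` and `Rewriter` types, and the term-overlap scoring are all hypothetical names and simplifications introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]                 # (entity id, score)
Searcher = Callable[[str], List[Hit]]   # hypothetical shard interface
Rewriter = Callable[[str], str]         # hypothetical query-rewriter plugin


def broker_search(query: str, shards: List[Searcher],
                  rewriters: List[Rewriter], top_k: int = 3) -> List[Hit]:
    # Apply each rewriter plugin in order to build the retrieval query.
    for rewrite in rewriters:
        query = rewrite(query)
    # Scatter: send the rewritten query to every shard in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: shard(query), shards))
    # Gather: merge the partial results and keep the best top_k by score.
    merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:top_k]


def make_shard(docs: Dict[str, str]) -> Searcher:
    # Toy shard: score = number of query terms that appear in the document.
    def search(query: str) -> List[Hit]:
        terms = query.split()
        hits = []
        for doc_id, text in docs.items():
            words = text.split()
            score = float(sum(1 for term in terms if term in words))
            if score > 0:
                hits.append((doc_id, score))
        return hits
    return search


shards = [
    make_shard({"m1": "software engineer search", "m2": "product manager"}),
    make_shard({"m3": "search engineer", "m4": "sales lead"}),
]
# str.lower stands in for a rewriter plugin that normalizes the query.
results = broker_search("Search Engineer", shards, [str.lower])
```

Because rewriting and merging are passed in as plain callables, either step can be swapped without touching the scatter-gather logic, which is the flexibility a plugin-based design is meant to provide.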
What is the significance of early termination in Galene's search process?
Early termination allows the retrieval process to stop as soon as a sufficient number of relevant entities are found, improving performance by reducing the number of entities that need to be scored. This is achieved by assigning a static rank to entities during index building.
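A small sketch may make the idea concrete. It assumes entities are visited in descending static-rank order, so the first matches found are also the most promising ones; the function name, the `matches` predicate, and the sample data are hypothetical and not taken from Galene.

```python
from typing import Callable, List, Tuple


def retrieve_with_early_termination(
    ranked_ids: List[str],            # entity ids, best static rank first
    matches: Callable[[str], bool],   # hypothetical query predicate
    limit: int,
) -> Tuple[List[str], int]:
    """Return up to `limit` matching ids and how many entities were inspected."""
    found: List[str] = []
    scanned = 0
    for entity_id in ranked_ids:
        if len(found) >= limit:
            break                     # enough candidates: stop retrieving
        scanned += 1
        if matches(entity_id):
            found.append(entity_id)
    return found, scanned


# Static ranks are assumed to be computed offline, at index-build time.
static_rank = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5, "e": 0.1}
titles = {"a": "engineer", "b": "engineer", "c": "designer",
          "d": "engineer", "e": "engineer"}
ranked_ids = sorted(static_rank, key=static_rank.get, reverse=True)

found, scanned = retrieve_with_early_termination(
    ranked_ids, lambda entity_id: titles[entity_id] == "engineer", limit=2)
```

Here only three of the five entities are inspected before two matches are collected, so the remaining two are never retrieved or scored.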
Key Statistics & Figures
Number of LinkedIn members searchable
300M+
Galene allows typeahead searches across all members, significantly improving the search capability.
Performance improvement factor of Galene over previous implementation
More than twice as fast
Galene achieves this while utilizing about a third of the hardware resources.
Technologies & Tools
Backend
Lucene
Used as the indexing layer in the Galene architecture.
Backend
Hadoop
Used for building the base index through map-reduce operations.
Key Actionable Insights
1. Consider implementing a multi-segment indexing strategy to enhance search performance and manage live updates more efficiently. This approach allows for updates at the field level rather than the entity level, reducing overhead and improving system responsiveness.
2. Utilize plugin architectures for query rewriting and scoring to enhance flexibility and adaptability in search systems. By allowing for various scoring algorithms and query enhancements, you can better meet diverse user needs and improve relevance.
3. Adopt a more agile development process to facilitate rapid iteration and improvement of search algorithms. This can lead to more frequent updates and enhancements, ultimately resulting in a better user experience and satisfaction.
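The first insight, field-level updates layered over multiple segments, can be illustrated with a toy model. The sketch below assumes a read-only base segment plus an in-memory overlay that holds only changed fields; `SegmentedIndex` and its methods are hypothetical, and a plain dictionary of fields stands in for a real inverted index.

```python
from typing import Any, Dict, Optional


class SegmentedIndex:
    """Toy two-segment store: immutable base plus a field-level live overlay."""

    def __init__(self, base: Dict[str, Dict[str, Any]]) -> None:
        self.base = base                            # built offline, read-only
        self.live: Dict[str, Dict[str, Any]] = {}   # changed fields only

    def update_field(self, entity_id: str, field: str, value: Any) -> None:
        # Record just the changed field; the rest of the entity is untouched.
        self.live.setdefault(entity_id, {})[field] = value

    def get(self, entity_id: str) -> Optional[Dict[str, Any]]:
        base_fields = self.base.get(entity_id)
        live_fields = self.live.get(entity_id)
        if base_fields is None and live_fields is None:
            return None
        # Live fields take precedence over the base segment at read time.
        return {**(base_fields or {}), **(live_fields or {})}

    def rebuild_base(self) -> None:
        # Periodic rebuild: fold the overlay into a fresh base segment.
        entity_ids = set(self.base) | set(self.live)
        self.base = {entity_id: self.get(entity_id) for entity_id in entity_ids}
        self.live = {}


index = SegmentedIndex({"m1": {"name": "Ada", "title": "Engineer"}})
index.update_field("m1", "title", "Staff Engineer")
merged_view = index.get("m1")
```

An update writes one field to the overlay instead of rewriting the whole entity, and a periodic rebuild keeps the overlay small.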
Common Pitfalls
1. Relying too heavily on a fragmented system with numerous small components can lead to maintenance challenges and inefficiencies. This often results in difficulties in keeping systems working together, which can hinder performance and scalability.
2. Supporting live updates only at the entity level can lead to performance bottlenecks. Updates that require modifying entire entities rather than just fields can significantly slow down the search system.
Related Concepts
Search Engine Architecture
Indexing Strategies
Distributed Systems
Relevance Algorithms