JARVIS: Helping LinkedIn Navigate its Source Code

Rajeev Kumar

16 min read • intermediate

Avro, Java, JavaScript, Python, Ruby, Scala, Spring

Overview

The article discusses JARVIS, a search system developed by LinkedIn to enhance the navigation of its source code. It outlines the design, implementation, and challenges faced in creating an intelligent search tool that integrates with various clients and improves developer productivity.

What You'll Learn

1. How to implement an intelligent search system for source code
2. Why metadata extraction is crucial for efficient code search
3. When to use Hadoop for scaling metadata extraction processes
4. How to integrate a search system with IDEs and CLIs

Prerequisites & Requirements

Understanding of search systems and indexing concepts
Familiarity with Hadoop and Galene (optional)

Key Questions Answered

How does JARVIS improve code search efficiency at LinkedIn?

JARVIS enhances code search efficiency by implementing intelligent search capabilities that allow engineers to find relevant code faster. It utilizes metadata extraction, reference resolution, and a robust indexing system to ensure that search results are relevant and quickly accessible, significantly reducing the time spent searching through the codebase.

What challenges did LinkedIn face while developing JARVIS?

LinkedIn faced challenges related to the speed of metadata extraction and the complexity of reference resolution. Initially, metadata extraction was slow when performed on the same machine as code crawling, prompting a shift to Hadoop for parallel processing, which significantly improved performance and scalability.
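The move from single-machine extraction to parallel extraction can be sketched as a map step per file followed by a merge. This is a minimal illustration, not LinkedIn's implementation: a local thread pool stands in for the Hadoop cluster, Python's standard `ast` module stands in for the real analyzers, and the names `extract_metadata` and `build_index` are hypothetical.

```python
import ast
from concurrent.futures import ThreadPoolExecutor


def extract_metadata(item):
    """Map step: parse one source file and return the names it defines."""
    path, source = item
    tree = ast.parse(source)
    definitions = sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.ClassDef))
    )
    return path, definitions


def build_index(files, workers=4):
    """Run the map step over all files concurrently, then merge the
    per-file results into one symbol -> [paths] index (the reduce step)."""
    index = {}
    # Assumption: a thread pool only illustrates the shape of the job;
    # a real deployment would distribute the map step across machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, definitions in pool.map(extract_metadata, files.items()):
            for name in definitions:
                index.setdefault(name, []).append(path)
    return index


# Hypothetical in-memory "repository" used only for this example.
files = {
    "a.py": "class Parser:\n    def parse(self):\n        pass\n",
    "b.py": "def parse(text):\n    return text\n",
}
index = build_index(files)
```

Because each file is processed independently, the map step has no shared state, which is what makes this kind of job straightforward to spread across many workers.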

What technologies are used in the JARVIS search system?

JARVIS utilizes technologies such as Hadoop for metadata extraction and Galene for managing search clusters. It also employs various document analyzers like ANTLR for Java files and Pygments for other programming languages, ensuring comprehensive support for diverse codebases.
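Choosing an analyzer per file type can be sketched as a lookup keyed on the file extension. The sketch below is an assumption about how such routing might look, not a description of JARVIS internals: the two analyzer functions are regex placeholders standing in for a grammar-based Java analyzer and a lexer-based analyzer for other languages, and all names are hypothetical.

```python
import os
import re


def analyze_java(source):
    # Placeholder for a grammar-based analyzer; here it only finds class names.
    return re.findall(r"\bclass\s+(\w+)", source)


def analyze_generic(source):
    # Placeholder for a lexer-based analyzer; here it only finds names that
    # follow a few common definition keywords.
    return re.findall(r"\b(?:def|class|function)\s+(\w+)", source)


# Hypothetical registry: extensions with a dedicated analyzer.
ANALYZERS = {".java": analyze_java}


def analyze(path, source):
    """Route a file to its dedicated analyzer, or fall back to the generic one."""
    extension = os.path.splitext(path)[1].lower()
    analyzer = ANALYZERS.get(extension, analyze_generic)
    return analyzer(source)


java_symbols = analyze("Foo.java", "public class Foo {}")
ruby_symbols = analyze("greet.rb", "def greet\nend")
```

Keeping the registry separate from the analyzers means support for a new language is one added entry rather than a change to the routing logic.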

How does JARVIS handle query relevance?

JARVIS assigns relevance scores to search results based on multiple features, including match info, importance, query interpretation, and file size. This scoring system ensures that the most relevant results appear at the top, enhancing the user experience during code searches.
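One simple way to combine several features into a single relevance score is a weighted sum. The sketch below uses the four features named above, but the weights, the 0 to 1 feature ranges, and the choice to favor smaller files are all assumptions made for illustration; the source does not state how JARVIS combines them.

```python
from dataclasses import dataclass


@dataclass
class Result:
    path: str
    match_score: float     # how well the query matched the document (0..1)
    importance: float      # e.g. how widely the file is referenced (0..1)
    interpretation: float  # confidence in how the query was interpreted (0..1)
    size_bytes: int


# Hypothetical weights; a real system would tune these.
WEIGHTS = {"match": 0.5, "importance": 0.3, "interpretation": 0.15, "size": 0.05}


def size_feature(size_bytes, cap=100_000):
    """Map file size to 0..1. Assumption: smaller files score slightly higher."""
    return 1.0 - min(size_bytes, cap) / cap


def relevance(result):
    """Weighted sum of the four features."""
    return (
        WEIGHTS["match"] * result.match_score
        + WEIGHTS["importance"] * result.importance
        + WEIGHTS["interpretation"] * result.interpretation
        + WEIGHTS["size"] * size_feature(result.size_bytes)
    )


def rank(results):
    """Return results ordered from most to least relevant."""
    return sorted(results, key=relevance, reverse=True)


results = [
    Result("util/Strings.java", 0.9, 0.1, 0.5, 1_000),
    Result("core/Search.java", 0.8, 0.9, 0.5, 1_000),
]
ranked = rank(results)
```

In this example the second file outranks the first despite a slightly weaker text match, because its importance feature is much higher.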

Key Statistics & Figures

Base index build time: less than 2.5 hours

This was achieved by parallelizing metadata extraction on Hadoop, allowing for faster indexing of LinkedIn's extensive codebase.

Technologies & Tools

Hadoop (Backend): Used for scaling metadata extraction processes.

Galene (Backend): Platform for managing search clusters and controlling indexing, retrieval, and relevance.

ANTLR (Tools): Used for analyzing Java files during metadata extraction.

Pygments (Tools): Used for analyzing Python, Ruby, Scala, and JavaScript files.

Key Actionable Insights

1. Implementing a robust metadata extraction process can significantly enhance the performance of search systems.
By moving metadata extraction to a distributed system like Hadoop, LinkedIn was able to scale its processes and reduce the time taken to build the index, which is crucial for maintaining an efficient search experience.

2. Integrating search capabilities with IDEs and CLIs can improve developer productivity.
Providing a seamless search experience across different platforms allows engineers to quickly access relevant code without switching contexts, which is essential in a fast-paced development environment.

3. Utilizing advanced query features can lead to more precise search results.
By supporting complex queries and relevance ranking, JARVIS allows users to refine their searches effectively, which is particularly beneficial in large codebases with numerous dependencies.

Common Pitfalls

1. Relying on a single machine for metadata extraction can lead to performance bottlenecks.
This limitation can slow down the indexing process and hinder the ability to handle complex extraction tasks. Transitioning to a distributed system like Hadoop can alleviate these issues and improve overall efficiency.

Related Concepts

Search System Design

Metadata Extraction Techniques

Indexing Strategies

Relevance Ranking In Search Engines