Building The LinkedIn Knowledge Graph

Qi He

12 min read • Intermediate


Overview

This article describes how LinkedIn builds its knowledge graph, using machine learning to maintain a dynamic knowledge base of professional entities. It covers the challenges of building the graph and the methods used to infer relationships and standardize data.

What You'll Learn

1. How to apply machine learning techniques for entity relationship inference in a knowledge graph

2. Why user-generated content is crucial for building a scalable knowledge graph

3. How to standardize data from multiple sources to enhance data quality

Prerequisites & Requirements

Understanding of machine learning concepts and data standardization techniques
Experience with knowledge graphs and data processing frameworks (optional)

Key Questions Answered

What is LinkedIn’s knowledge graph and how is it constructed?

LinkedIn’s knowledge graph is a large knowledge base built on entities such as members, jobs, and skills. It is constructed primarily from user-generated content and supplemented with external data, utilizing machine learning for data standardization and relationship inference.

How does LinkedIn infer relationships between entities?

LinkedIn infers relationships through a near real-time content processing framework that combines explicit relationships provided by users and inferred relationships predicted by machine learning models, ensuring a dynamic and updated knowledge graph.
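The combination of explicit and inferred relationships can be illustrated with a minimal sketch. This is not LinkedIn's implementation: the `Edge` type, the `merge_edges` function, the entity identifiers, and the 0.8 confidence threshold are all hypothetical names and values chosen for illustration. The sketch assumes that user-provided edges are always trusted and that model-predicted edges are kept only when their score clears the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    confidence: float
    explicit: bool  # True if stated by a user, False if predicted by a model


def merge_edges(explicit_edges, inferred_edges, threshold=0.8):
    """Combine user-provided and model-predicted edges (illustrative only).

    explicit_edges: iterable of (source, relation, target) triples.
    inferred_edges: iterable of (source, relation, target, score) tuples.
    The threshold is an assumed cut-off, not a value from the article.
    """
    merged = {}
    # Explicit edges are kept unconditionally with full confidence.
    for source, relation, target in explicit_edges:
        merged[(source, relation, target)] = Edge(source, relation, target, 1.0, True)
    # Inferred edges are added only if confident enough and not already explicit.
    for source, relation, target, score in inferred_edges:
        key = (source, relation, target)
        if key in merged or score < threshold:
            continue
        merged[key] = Edge(source, relation, target, score, False)
    return list(merged.values())


explicit = [("member:1", "has_skill", "skill:python")]
inferred = [
    ("member:1", "has_skill", "skill:python", 0.95),  # duplicate of explicit edge
    ("member:1", "has_skill", "skill:java", 0.90),    # kept
    ("member:1", "has_skill", "skill:cobol", 0.30),   # dropped: below threshold
]
edges = merge_edges(explicit, inferred)
```

In a near real-time setting, a function like this would be re-run whenever a profile changes or the model emits new predictions, so the merged edge set stays current.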

What challenges does LinkedIn face in building its knowledge graph?

Challenges include managing the scale of data as new members and entities emerge, ensuring data quality from user-generated content, and maintaining real-time updates to the graph as profiles change.

What techniques are used for entity taxonomy construction?

Techniques include generating candidates from user profiles, disambiguating entities using clustering algorithms, and de-duplicating entities through word vector representations, ensuring a clean and accurate taxonomy.
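As a rough illustration of the de-duplication step only, the sketch below groups candidate entity names whose averaged word vectors are nearly parallel. The tiny hand-written vectors in `TOY_VECTORS` are hypothetical stand-ins for embeddings that a word2vec model would produce, and the 0.95 cosine threshold and the greedy grouping strategy are assumptions, not details from the article.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "software": [0.9, 0.1, 0.0],
    "engineer": [0.8, 0.2, 0.1],
    "developer": [0.82, 0.18, 0.12],
    "sales": [0.0, 0.9, 0.3],
    "manager": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_vector(phrase, word_vectors):
    """Average the vectors of the known words in a phrase; None if none are known."""
    vecs = [word_vectors[w] for w in phrase.lower().split() if w in word_vectors]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def deduplicate(candidates, word_vectors, threshold=0.95):
    """Greedily group candidates whose phrase vectors are similar enough."""
    groups = []  # list of (representative_vector, [names])
    for name in candidates:
        vec = phrase_vector(name, word_vectors)
        for rep, names in groups:
            if vec is not None and rep is not None and cosine(vec, rep) >= threshold:
                names.append(name)
                break
        else:
            # No similar group found: start a new one with this candidate.
            groups.append((vec, [name]))
    return [names for _, names in groups]


groups = deduplicate(
    ["Software Engineer", "Software Developer", "Sales Manager"], TOY_VECTORS
)
```

With these toy vectors, "Software Engineer" and "Software Developer" fall into one group while "Sales Manager" stays separate. A production system would use trained embeddings and a proper clustering algorithm rather than this greedy pass.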

Key Statistics & Figures

Number of members on LinkedIn

450M

This figure highlights the scale of data that the knowledge graph must manage.

Number of historical job listings

190M

This statistic demonstrates the extensive job-related data integrated into the knowledge graph.

Number of companies represented

9M

This number reflects the breadth of organizational data included in the graph.

Number of skills available

35K

These skills are categorized in 19 languages, showcasing the graph's multilingual capabilities.

Technologies & Tools


Backend

Kafka

Used for high-throughput distributed messaging to deliver data from the knowledge graph.

Machine Learning

Word2vec

Employed for generating word vector representations to assist in entity disambiguation and de-duplication.

Key Actionable Insights

1. Use machine learning for data standardization to improve the quality of your knowledge graph.
This approach helps clean up user-generated content and ensures that the data used in your applications is accurate and reliable.

2. Incorporate user feedback to refine entity relationships and improve model accuracy.
By actively seeking user input on inferred relationships, you can enhance the quality of your knowledge graph and adapt it to real-world usage.

3. Leverage external data sources to supplement your knowledge graph and fill in gaps.
External data can provide valuable context and additional attributes for entities, enhancing the overall richness of your knowledge graph.
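To make the idea of standardizing user-generated text concrete, here is a deliberately simple sketch. It substitutes a rule-based alias lookup for the machine learning models the article refers to, so it shows the input and output of standardization rather than the actual method. The `ALIASES` table, the entity IDs, and the function names are hypothetical.

```python
import re

# Hypothetical alias table: canonical entity ID -> known surface forms.
ALIASES = {
    "title:software_engineer": ["software engineer", "software eng", "swe"],
    "title:data_scientist": ["data scientist", "data science specialist"],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Reverse index from normalized surface form to canonical entity ID.
LOOKUP = {
    normalize(alias): entity_id
    for entity_id, aliases in ALIASES.items()
    for alias in aliases
}


def standardize(raw):
    """Return the canonical entity ID for raw user text, or None if unmatched."""
    return LOOKUP.get(normalize(raw))


matched = standardize("  Software   Engineer! ")
unmatched = standardize("Chief Fun Officer")
```

Returning `None` for unmatched input, instead of guessing, leaves room for a review or feedback step before anything new enters the graph.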

Common Pitfalls

1. Relying solely on user-generated content without validation can lead to inaccuracies in the knowledge graph.
This occurs because users may input erroneous or incomplete information, which can propagate through the system if not properly checked.

Related Concepts

Machine Learning

Graph Systems

Data Science
