LinkedIn logo

How LinkedIn Uses SQL

62 engineering articles about SQL from LinkedIn's engineering team

Articles

LinkedIn
Advanced
The article discusses the optimization of LinkedIn Sales Navigator’s search pipeline using Apache Spark, highlighting the transition from MapReduce to Spark and the resulting performance improvemen...
Chunxu Tang
14 min read
Includes Code
Has Summary
--
LinkedIn
Intermediate
The article discusses how LinkedIn utilizes Hoptimator to enhance the ingestion process for Apache Pinot, a real-time distributed OLAP datastore.
Ryanne Dolan
9 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses ValiData, a scalable automated config-driven data validation tool used at LinkedIn to ensure the accuracy and consistency of large datasets.
Bharadwaj Jayaraman
15 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the open sourcing of OpenHouse, a control plane designed for managing tables in a data lakehouse.
--
LinkedIn
Advanced
The article emphasizes the importance of representation and diversity in engineering, particularly in the context of developing AI technologies.
Sabry Tozin
6 min read
Has Summary
--
LinkedIn
Advanced
The article discusses LinkedIn's journey in evolving its professional community policies enforcement at scale, focusing on the development of its anti-abuse platform and account restriction systems.
Amit M.
17 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the implementation of privacy-preserving analytics for individual posts on LinkedIn, focusing on how to provide useful insights to post authors while safeguarding viewer ident...
Ryan Rogers
24 min read
Has Summary
--
LinkedIn
Advanced
The article discusses Costwiz, a tool developed by LinkedIn to optimize cloud costs on Azure by monitoring resource utilization and providing actionable insights.
--
LinkedIn
Intermediate
The article introduces OpenHouse, a control plane developed at LinkedIn for managing tables in open source data lakehouse deployments.
Sumedh Sakdeo
11 min read
Includes Code
Has Summary
--
LinkedIn
Intermediate
The article discusses the development of Hoptimator, a declarative data pipeline orchestrator designed to streamline the creation of end-to-end data pipelines at LinkedIn.
Ryanne Dolan
10 min read
Includes Code
Has Summary
--
LinkedIn
Intermediate
The article discusses Avery's transition from a military commander to a Trust & Safety manager at LinkedIn, highlighting his journey through military IT and the support he received from his team du...
LinkedIn Engineering Team
7 min read
Has Summary
--
LinkedIn
Advanced
The article discusses LinkedIn's adoption of GraphQL to enhance API development for integrations and partnerships, achieving a 90% reduction in development time.
LinkedIn Engineering Team
10 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses how LinkedIn utilizes Apache Pinot for real-time analytics on network flow data, emphasizing the importance of observability in their infrastructure.
LinkedIn Engineering Team
10 min read
Has Summary
--
LinkedIn
Beginner
The article discusses the importance of data governance in large organizations like LinkedIn, emphasizing the need for effective schema annotations and automation in managing vast datasets.
Joshua Shinavier
8 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the implementation of a load-balanced Brooklin Mirror Maker at LinkedIn, which efficiently replicates large-scale Kafka clusters.
Vaibhav Maheshwari
14 min read
Has Summary
--
LinkedIn
Advanced
The article discusses Opal, a system developed at LinkedIn to manage mutable datasets within a data lake.
Bhupendra Jain
16 min read
Has Summary
--
LinkedIn
Beginner
The article discusses the implementation of near real-time personalization features at LinkedIn, focusing on how member actions can be leveraged to enhance recommendation systems without significan...
Rupesh Gupta
17 min read
Has Summary
--
LinkedIn
Advanced
The article discusses LinkedIn's migration to Azure Front Door (AFD) and the significant performance improvements achieved through this transition.
Samir Jafferali
14 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses DARWIN, LinkedIn's unified Data Science and Artificial Intelligence Workbench, designed to streamline the workflows of data scientists and AI engineers by centralizing various...
--
LinkedIn
Intermediate
This article discusses the evolution of LinkedIn's Daily Executive Dashboard (DED) from a simple dashboard to a robust enterprise-grade data pipeline.
Jennifer Zheng
16 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the implementation of keyword search functionality in LinkedIn Talent Insights (LTI) using Apache Pinot.
--
LinkedIn
Advanced
This article discusses the challenges and solutions for estimating the cardinality of set intersections at scale using Apache Pinot and Theta Sketches.
Vincent Wang
13 min read
Has Summary
--
LinkedIn
Advanced
The article discusses Coral, an open-sourced SQL translation, analysis, and rewrite engine developed at LinkedIn for modern data lakehouses.
Walaa Eldin Moustafa
20 min read
Has Summary
--
LinkedIn
Intermediate
The article introduces Magnet, a scalable and performant shuffle architecture designed for Apache Spark, addressing the challenges faced in shuffle operations at LinkedIn.
Min Shen
16 min read
Has Summary
--
LinkedIn
Advanced
The article discusses LIquid, a new graph database, focusing on its design and implementation.
Scott Meyer
15 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the architectural improvements made to Jhubbub, LinkedIn's internal backend service for processing RSS feeds, by leveraging Apache Helix to create a stateless and elastic syst...
Hunter Lee
11 min read
Has Summary
--
LinkedIn
Intermediate
The article introduces LIquid, a new graph database developed by LinkedIn, designed to facilitate real-time querying of the economic graph.
Scott Meyer
14 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the development of LinkedIn Talent Insights, a tool designed to democratize data-driven decision-making in talent management.
Tim Santos
12 min read
Has Summary
--
LinkedIn
Beginner
The article discusses Spark-TFRecord, a new data source for Apache Spark that aims to provide full support for the TFRecord data format used in TensorFlow.
--
LinkedIn
Advanced
Apache Pinot 0.3.0 is an open-source, distributed OLAP data store developed at LinkedIn, designed for near-real-time analytics.
--
LinkedIn
Intermediate
The article discusses advanced schema management techniques for Apache Spark applications at LinkedIn, focusing on the integration of Avro schemas with the Hive Metastore to enhance type safety and...
Walaa Eldin Moustafa
14 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses Data Sentinel, a platform developed at LinkedIn to automate data validation and improve data quality in production environments.
Arun Swami
9 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the 'skills genome methodology' developed by LinkedIn to identify unique skills associated with emerging jobs, which are rapidly growing but may lack a large workforce.
--
LinkedIn
Intermediate
This article provides an in-depth look at LinkedIn's data pipeline monitoring system, focusing on the challenges faced with traditional monitoring methods and how they have evolved to improve visib...
Krishnan Raman
16 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the open-sourcing of Brooklin, a distributed service for near real-time data streaming at scale, which has been in production at LinkedIn since 2016.
Celia K.
10 min read
Has Summary
--
LinkedIn
Advanced
The article discusses how Apache Calcite can bridge the gap between offline and nearline computations in big data processing.
Khai Tran
12 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the Helix Task Framework, a component of Apache Helix designed for managing distributed tasks in large-scale data processing systems.
Junkai Xue
15 min read
Has Summary
--
LinkedIn
Advanced
The article announces the release of Samza 1.0, a distributed stream processing framework developed at LinkedIn, highlighting its significant features and improvements.
Jagadish Venkatraman
10 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the essential elements of modern data science, particularly in the context of Big Data and AI.
Michael Li
8 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the evolution of incremental data capture for Oracle databases at LinkedIn, highlighting the transition from a batch processing model to a near-real-time system.
Saurabh Goyal
9 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the Query Analyzer, a tool developed by LinkedIn for analyzing MySQL queries with minimal overhead.
Karthik Appigatla
10 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the migration of LinkedIn's internal service, Babylonia, from Oracle to Espresso, a distributed NoSQL database.
David Max
11 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses new analytics features on LinkedIn that allow users to see who has viewed their posts, enhancing their ability to understand audience engagement.
Andranik Kurghinyan
7 min read
Has Summary
--
LinkedIn
Beginner
This article discusses the evolution of LinkedIn's Endorsements infrastructure, focusing on the integration of GraphDB to enhance the relevance of suggested endorsements.
Victor Kabdebon
6 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the evolution of LinkedIn's Endorsements infrastructure, highlighting the need for a more effective system to provide valuable skill endorsements.
Victor Kabdebon
7 min read
Has Summary
--
LinkedIn
Advanced
This article discusses the challenges of data access in high-scale stream processing, particularly focusing on the read/write and read-only data access patterns.
LinkedIn Engineering Team
21 min read
Has Summary
--
LinkedIn
Advanced
The article discusses the challenges faced in stream processing, particularly focusing on the limitations of the Lambda architecture.
LinkedIn Engineering Team
14 min read
Has Summary
--
LinkedIn
Advanced
The article discusses FollowFeed, LinkedIn's new feed infrastructure designed to enhance performance and relevance for its users.
Ankit Gupta
25 min read
Has Summary
--
LinkedIn
Beginner
The article discusses the use of nested data in Hive and highlights various engineering insights from LinkedIn professionals.
Erran Berger
3 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses the development of LinkedIn Placements, a system designed to streamline the campus recruitment process for Indian universities.
Pradeep Hodigere
8 min read
Has Summary
--