How LinkedIn Uses SQL
62 engineering articles about SQL from LinkedIn's engineering team
Articles
The article discusses the optimization of LinkedIn Sales Navigator’s search pipeline using Apache Spark, highlighting the transition from MapReduce to Spark and the resulting performance improvemen...
The article discusses how LinkedIn utilizes Hoptimator to enhance the ingestion process for Apache Pinot, a real-time distributed OLAP datastore.
Ryanne Dolan
9 min read
Has Summary
--
The article discusses ValiData, a scalable automated config-driven data validation tool used at LinkedIn to ensure the accuracy and consistency of large datasets.
The article discusses the open sourcing of OpenHouse, a control plane designed for managing tables in a data lakehouse.
Sumedh Sakdeo
9 min read
Includes Code
Has Summary
--
The article emphasizes the importance of representation and diversity in engineering, particularly in the context of developing AI technologies.
Sabry Tozin
6 min read
Has Summary
--
The article discusses LinkedIn's journey in evolving its professional community policies enforcement at scale, focusing on the development of its anti-abuse platform and account restriction systems.
Amit M.
17 min read
Has Summary
--
The article discusses the implementation of privacy-preserving analytics for individual posts on LinkedIn, focusing on how to provide useful insights to post authors while safeguarding viewer ident...
The article discusses Costwiz, a tool developed by LinkedIn to optimize cloud costs on Azure by monitoring resource utilization and providing actionable insights.
LinkedIn Engineering Team
17 min read
Has Summary
--
The article introduces OpenHouse, a control plane developed at LinkedIn for managing tables in open source data lakehouse deployments.
Sumedh Sakdeo
11 min read
Includes Code
Has Summary
--
The article discusses the development of Hoptimator, a declarative data pipeline orchestrator designed to streamline the creation of end-to-end data pipelines at LinkedIn.
Ryanne Dolan
10 min read
Includes Code
Has Summary
--
The article discusses Avery's transition from a military commander to a Trust & Safety manager at LinkedIn, highlighting his journey through military IT and the support he received from his team du...
LinkedIn Engineering Team
7 min read
Has Summary
--
The article discusses LinkedIn's adoption of GraphQL to enhance API development for integrations and partnerships, achieving a 90% reduction in development time.
The article discusses how LinkedIn utilizes Apache Pinot for real-time analytics on network flow data, emphasizing the importance of observability in their infrastructure.
LinkedIn Engineering Team
10 min read
Has Summary
--
The article discusses the importance of data governance in large organizations like LinkedIn, emphasizing the need for effective schema annotations and automation in managing vast datasets.
The article discusses the implementation of a load-balanced Brooklin Mirror Maker at LinkedIn, which efficiently replicates large-scale Kafka clusters.
Vaibhav Maheshwari
14 min read
Has Summary
--
The article discusses Opal, a system developed at LinkedIn to manage mutable datasets within a data lake.
The article discusses the implementation of near real-time personalization features at LinkedIn, focusing on how member actions can be leveraged to enhance recommendation systems without significan...
Rupesh Gupta
17 min read
Has Summary
--
The article discusses LinkedIn's migration to Azure Front Door (AFD) and the significant performance improvements achieved through this transition.
The article discusses DARWIN, LinkedIn's unified Data Science and Artificial Intelligence Workbench, designed to streamline the workflows of data scientists and AI engineers by centralizing various...
Varun S.
20 min read
Has Summary
--
This article discusses the evolution of LinkedIn's Daily Executive Dashboard (DED) from a simple dashboard to a robust enterprise-grade data pipeline.
The article discusses the implementation of keyword search functionality in LinkedIn Talent Insights (LTI) using Apache Pinot.
Siddharth Teotia
17 min read
Has Summary
--
This article discusses the challenges and solutions for estimating the cardinality of set intersections at scale using Apache Pinot and Theta Sketches.
The article discusses Coral, an open-sourced SQL translation, analysis, and rewrite engine developed at LinkedIn for modern data lakehouses.
The article introduces Magnet, a scalable and performant shuffle architecture designed for Apache Spark, addressing the challenges faced in shuffle operations at LinkedIn.
Min Shen
16 min read
Has Summary
--
The article discusses LIquid, a new graph database, focusing on its design and implementation.
Scott Meyer
15 min read
Has Summary
--
The article discusses the architectural improvements made to Jhubbub, LinkedIn's internal backend service for processing RSS feeds, by leveraging Apache Helix to create a stateless and elastic syst...
The article introduces LIquid, a new graph database developed by LinkedIn, designed to facilitate real-time querying of the economic graph.
Scott Meyer
14 min read
Has Summary
--
The article discusses the development of LinkedIn Talent Insights, a tool designed to democratize data-driven decision-making in talent management.
The article discusses Spark-TFRecord, a new data source for Apache Spark that aims to provide full support for the TFRecord data format used in TensorFlow.
Jun Shi
5 min read
Has Summary
--
Apache Pinot 0.3.0 is an open-source, distributed OLAP data store developed at LinkedIn, designed for near-real-time analytics.
Mayank S.
9 min read
Has Summary
--
The article discusses advanced schema management techniques for Apache Spark applications at LinkedIn, focusing on the integration of Avro schemas with the Hive Metastore to enhance type safety and...
The article discusses Data Sentinel, a platform developed at LinkedIn to automate data validation and improve data quality in production environments.
Arun Swami
9 min read
Has Summary
--
The article discusses the 'skills genome methodology' developed by LinkedIn to identify unique skills associated with emerging jobs, which are rapidly growing but may lack a large workforce.
Zhichun Jenny Ying
8 min read
Has Summary
--
This article provides an in-depth look at LinkedIn's data pipeline monitoring system, focusing on the challenges faced with traditional monitoring methods and how they have evolved to improve visib...
The article discusses the open-sourcing of Brooklin, a distributed service for near real-time data streaming at scale, which has been in production at LinkedIn since 2016.
The article discusses how Apache Calcite can bridge the gap between offline and nearline computations in big data processing.
Khai Tran
12 min read
Has Summary
--
The article discusses the Helix Task Framework, a component of Apache Helix designed for managing distributed tasks in large-scale data processing systems.
The article announces the release of Samza 1.0, a distributed stream processing framework developed at LinkedIn, highlighting its significant features and improvements.
Jagadish Venkatraman
10 min read
Has Summary
--
The article discusses the essential elements of modern data science, particularly in the context of Big Data and AI.
Michael Li
8 min read
Has Summary
--
The article discusses the evolution of incremental data capture for Oracle databases at LinkedIn, highlighting the transition from a batch processing model to a near-real-time system.
The article discusses the Query Analyzer, a tool developed by LinkedIn for analyzing MySQL queries with minimal overhead.
Karthik Appigatla
10 min read
Has Summary
--
The article discusses the migration of LinkedIn's internal service, Babylonia, from Oracle to Espresso, a distributed NoSQL database.
The article discusses new analytics features on LinkedIn that allow users to see who has viewed their posts, enhancing their ability to understand audience engagement.
This article discusses the evolution of LinkedIn's Endorsements infrastructure, focusing on the integration of GraphDB to enhance the relevance of suggested endorsements.
Victor Kabdebon
6 min read
Has Summary
--
The article discusses the evolution of LinkedIn's Endorsements infrastructure, highlighting the need for a more effective system to provide valuable skill endorsements.
Victor Kabdebon
7 min read
Has Summary
--
This article discusses the challenges of data access in high-scale stream processing, particularly focusing on the read/write and read-only data access patterns.
The article discusses the challenges faced in stream processing, particularly focusing on the limitations of the Lambda architecture.
The article discusses FollowFeed, LinkedIn's new feed infrastructure designed to enhance performance and relevance for its users.
The article discusses the use of nested data in Hive and highlights various engineering insights from LinkedIn professionals.
Erran Berger
3 min read
Has Summary
--
The article discusses the development of LinkedIn Placements, a system designed to streamline the campus recruitment process for Indian universities.
Pradeep Hodigere
8 min read
Has Summary
--