How LinkedIn Uses Apache Kafka

48 engineering articles about Apache Kafka from LinkedIn's engineering team

Intermediate

Powering Apache Pinot ingestion with Hoptimator

The article discusses how LinkedIn utilizes Hoptimator to enhance the ingestion process for Apache Pinot, a real-time distributed OLAP datastore.

Apache, Apache Kafka, SQL

Ryanne Dolan

9 min read

Has Summary

Advanced

Improving Recruiting Efficiency with a Hybrid Bulk Data Processing Framework

The article discusses a hybrid bulk data processing framework developed to improve recruiting efficiency during data ownership transfers, particularly in the context of company mergers and recruite...

Apache, Apache Kafka, Kubernetes

Aditya Hegde

12 min read

Includes Code

Has Summary

Advanced

Revolutionizing Real-Time Streaming Processing: 4 Trillion Events Daily at LinkedIn

The article discusses LinkedIn's innovative use of Apache Beam for real-time streaming processing, handling over 4 trillion events daily across more than 3,000 pipelines.

Apache, Apache Kafka, Apache Spark, gRPC, Java, Python

Bingfeng Xia

16 min read

Has Summary

Intermediate

Scaling Salt for Remote Execution to support LinkedIn Infra growth

The article discusses how LinkedIn scaled its Salt infrastructure to support its growing needs for remote execution jobs, achieving a tenfold increase in job capacity and improved reliability.

Apache, Apache Kafka, Azure, Iris, MySQL, Python, ZeroMQ

LinkedIn Engineering Team

11 min read

Has Summary

Intermediate

Hosted Search: LinkedIn Search as a managed service

The article discusses LinkedIn's Hosted Search, a fully managed cloud-based search solution designed to simplify the integration of search functionalities for application teams.

Apache, Apache Kafka

LinkedIn Engineering Team

12 min read

Has Summary

Beginner

TopicGC: How LinkedIn cleans up unused metadata for its Kafka clusters

The article discusses TopicGC, a service developed by LinkedIn to clean up unused metadata in Kafka clusters.

Apache, Apache Kafka

LinkedIn Engineering Team

10 min read

Has Summary

Intermediate

Real-time analytics on network flow data with Apache Pinot

The article discusses how LinkedIn utilizes Apache Pinot for real-time analytics on network flow data, emphasizing the importance of observability in their infrastructure.

Apache, Apache Kafka, SQL

LinkedIn Engineering Team

10 min read

Has Summary

Intermediate

Load-balanced Brooklin Mirror Maker: Replicating large-scale Kafka clusters at LinkedIn

The article discusses the implementation of a load-balanced Brooklin Mirror Maker at LinkedIn, which efficiently replicates large-scale Kafka clusters.

Apache, Apache Kafka, Avro, SQL

Vaibhav Maheshwari

14 min read

Has Summary

Beginner

Near real-time features for near real-time personalization

The article discusses the implementation of near real-time personalization features at LinkedIn, focusing on how member actions can be leveraged to enhance recommendation systems without significan...

Apache, Apache Kafka, Apache Spark, SQL, V

Rupesh Gupta

17 min read

Has Summary

Advanced

Career stories: A cross-country, family move

The article discusses Charles's journey of relocating his family to Silicon Valley for a role at LinkedIn, highlighting the support he received during the transition and his experiences in the tech...

Apache, Apache Kafka

LinkedIn Engineering Team

7 min read

Has Summary

Beginner

How LinkedIn turned to real-time feedback for developer tooling

The article discusses how LinkedIn implemented a Real-Time Feedback system to enhance developer tooling and improve productivity.

Apache, Apache Kafka

Or Michael Berlowitz

9 min read

Has Summary

Intermediate

Quality matches via personalized AI for hirer and seeker preferences

The article discusses the development of the Qualified Applicant (QA) AI model at LinkedIn, designed to enhance the job matching process by predicting the likelihood of positive recruiter actions b...

Apache, Apache Kafka

Konstantin Salomatin

11 min read

Has Summary

Advanced

Bridging batch and stream processing for the Recruiter usage statistics dashboard

This article discusses the integration of batch and stream processing to create a near real-time dashboard for Recruiter usage statistics at LinkedIn.

Apache, Apache Kafka, Java

Khai Tran

9 min read

Has Summary

Advanced

The Top 2019 LinkedIn Engineering Blogs

The article highlights the top ten engineering blogs from LinkedIn in 2019, focusing on popular topics such as open source, artificial intelligence, and technical challenges at scale.

Apache, Apache Kafka, Azure, Machine Learning, XGBoost

Jaren Anderson

8 min read

Has Summary

Intermediate

How LinkedIn customizes Apache Kafka for 7 trillion messages per day

The article discusses how LinkedIn customizes Apache Kafka to handle 7 trillion messages per day.

Apache, Apache Kafka, Avro, Java

Jon Lee

10 min read

Has Summary

Advanced

Using virtual private clusters for testing Apache Samza

The article discusses the use of virtual private clusters for testing Apache Samza at LinkedIn, emphasizing the importance of stability and rigorous testing in managed stream processing services.

Apache, Apache Kafka, Docker, Kubernetes

Rayman Preet Singh

6 min read

Has Summary

Advanced

Bridging Offline and Nearline Computations with Apache Calcite

The article discusses how Apache Calcite can bridge the gap between offline and nearline computations in big data processing.

Apache, Apache Kafka, Apache Spark, Avro, Java, Python, SQL

Khai Tran

12 min read

Has Summary

Advanced

Samza 1.0: Stream Processing at Massive Scale

The article announces the release of Samza 1.0, a distributed stream processing framework developed at LinkedIn, highlighting its significant features and improvements.

Apache, Apache Kafka, Azure, Caching, Java, Kubernetes, Python, Scala, SQL

Jagadish Venkatraman

10 min read

Has Summary

Advanced

Samza Aeon: Latency Insights for Asynchronous One-Way Flows

The article discusses the development of Samza Aeon, a tool designed to measure latency in asynchronous one-way flows within systems.

Apache, Apache Kafka, Google Cloud

Max Wolffe

8 min read

Has Summary

Intermediate

Getting to Know Todd Palino

The article 'Getting to Know Todd Palino' features insights from Todd Palino, a Senior Staff Site Reliability Engineer at LinkedIn, highlighting his work with Apache Kafka and his contributions to ...

Apache, Apache Kafka

Clark Haskins

5 min read

Has Summary

Advanced

Incremental Data Capture for Oracle Databases at LinkedIn: Then and Now

The article discusses the evolution of incremental data capture for Oracle databases at LinkedIn, highlighting the transition from a batch processing model to a near-real-time system.

Apache, Apache Kafka, Iris, Java, MySQL, Oracle, Perl, SQL

Saurabh Goyal

9 min read

Has Summary

Intermediate

Site Speed Monitoring in A/B Testing and Feature Ramp-up

The article discusses the importance of site speed monitoring during A/B testing and feature ramp-up at LinkedIn.

Apache, Apache Kafka, MySQL

Jiahui QI (JOY)

9 min read

Has Summary

Intermediate

Test Strategy for Samza/Kafka Services

The article discusses the evolution of testing strategies for backend services at LinkedIn, specifically focusing on the transition from UI-driven tests to a more efficient testing framework using ...

Apache, Apache Kafka

Devi Sridharan

5 min read

Has Summary

Advanced

Stream Processing Hard Problems Part II: Data Access

This article discusses the challenges of data access in high-scale stream processing, particularly focusing on the read/write and read-only data access patterns.

Apache, Apache Kafka, AWS, Azure, Cassandra, Oracle, SQL

LinkedIn Engineering Team

21 min read

Has Summary

Advanced

Open Sourcing Kafka Monitor

The article discusses the open sourcing of Kafka Monitor, a framework designed to monitor and test Kafka deployments.

Apache, Apache Kafka, Avro, Java

Dong Lin

10 min read

Has Summary

Intermediate

Kafkaesque Days at LinkedIn – Part 1

The article 'Kafkaesque Days at LinkedIn – Part 1' discusses the challenges and incidents faced by LinkedIn while using Apache Kafka for data pipelines and messaging.

Apache, Apache Kafka

Joel Koshy

11 min read

Has Summary

Advanced

Kafka Ecosystem at LinkedIn

The article discusses the Kafka ecosystem at LinkedIn, detailing its critical role as a messaging system and the various solutions developed to enhance its functionality.

Apache, Apache Kafka, Avro, Java, MySQL

Joel Koshy

8 min read

Has Summary

Intermediate

Q&A with Jim Brikman: Splitting Up a Codebase into Microservices and Artifacts

This article features a Q&A with Jim Brikman discussing the challenges and strategies for splitting a codebase into microservices and artifacts.

Apache, Apache Kafka, Java, JavaScript, jQuery, Microservices, Python, Ruby

Karan Parikh

11 min read

Has Summary

Intermediate

Bridging Batch and Streaming Data Ingestion with Gobblin

The article discusses Gobblin, a unified data ingestion framework developed by LinkedIn, designed to bridge batch and streaming data ingestion.

Apache, Apache Kafka, Kubernetes, Less, MySQL, Oracle, SQL, SQL Server

Shirshanka Das

7 min read

Has Summary

Advanced

Prototyping Venice: Derived Data Platform

The article discusses the development of Venice, a derived data serving platform designed to improve the handling of derived data at LinkedIn.

Apache, Apache Kafka

Félix GV

7 min read

Has Summary

Beginner

Burrow: Kafka Consumer Monitoring Reinvented

The article discusses Burrow, an open-source tool developed by LinkedIn for monitoring Kafka consumers.

Apache, Apache Kafka

Todd Palino

7 min read

Includes Code

Has Summary

Intermediate

Technical Talks - Kafka at Scale: Multi-Tier Architectures

The article discusses the implementation of multi-tier architectures using Apache Kafka at scale, highlighting key concepts such as Tiered Cluster Architecture, Kafka Mirror Maker, and performance ...

Apache, Apache Kafka

Todd Palino

2 min read

Has Summary

Intermediate

Running Kafka At Scale

The article 'Running Kafka At Scale' discusses how LinkedIn utilizes Apache Kafka as a crucial messaging system for handling vast amounts of data.

Apache, Apache Kafka, Avro

Todd Palino

10 min read

Includes Code

Has Summary

Beginner

Technical Talks - Perspectives on Data Science

The article discusses a series of technical talks focused on data science, featuring insights from various experts in the field.

Apache, Apache Kafka

Dr June Andrews

3 min read

Has Summary

Intermediate

Apache Samza Graduates from Apache Incubator

The article discusses the graduation of Apache Samza from the Apache Incubator to a top-level Apache project, highlighting its significance in stream processing and the community growth during its ...

Apache, Apache Kafka, Oracle

Chris Riccomini

3 min read

Has Summary

Advanced

Apache Samza: LinkedIn’s Stream Processing engine

Apache Samza is LinkedIn's stream processing engine designed to handle real-time data processing needs.

Apache, Apache Kafka, Assembly, SQL, ZeroMQ

Navina Ramesh

11 min read

Has Summary

Beginner

Tech Talk - A Pragmatic Guide to Web Components

The article provides a comprehensive guide to Web Components, detailing their significance in encapsulating code behind semantic tags.

Apache, Apache Kafka, JavaScript

LinkedIn Engineering Team

2 min read

Has Summary

Intermediate

Operating Apache Samza at Scale

The article discusses how LinkedIn operates Apache Samza at scale, focusing on its integration with Apache Kafka for processing high volumes of data.

Apache, Apache Kafka, InfluxDB, Whisper, YAML

Jon Bringhurst

11 min read

Has Summary

Beginner

Technical Talk @ LinkedIn SF - Content Relevance

The article discusses LinkedIn's Technical Talk focused on content relevance, highlighting the engineering challenges and solutions in recommending content to users.

Apache, Apache Kafka, Machine Learning

LinkedIn Engineering Team

2 min read

Has Summary

Intermediate

Announcing the Voldemort 1.9.0 Open Source Release

The article announces the release of Voldemort 1.9.0, highlighting new features and enhancements aimed at improving the operability of Voldemort clusters.

Apache, Apache Kafka

LinkedIn Engineering Team

10 min read

Has Summary

Intermediate

Real time insights into LinkedIn's performance using Apache Samza

The article discusses how LinkedIn utilizes Apache Samza to gain real-time insights into its performance by processing data from numerous services and machines.

Apache, Apache Kafka, Assembly, Avro

LinkedIn Engineering Team

11 min read

Includes Code

Has Summary

Advanced

Apache Kafka meetup during Hadoop Summit

The article discusses the annual Apache Kafka meetup hosted by LinkedIn during the Hadoop Summit, highlighting the significance of Kafka as a high throughput messaging system.

Apache, Apache Kafka

Neha Narkhede

3 min read

Has Summary

Advanced

Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)

The article discusses benchmarking Apache Kafka's performance, achieving 2 million writes per second on a modest hardware setup.

Apache, Apache Kafka, Cassandra, Google Compute Engine, Java, RabbitMQ

Jay Kreps

18 min read

Has Summary

Intermediate

Apache Samza: LinkedIn's Real-time Stream Processing Framework

Apache Samza is LinkedIn's open-source stream processing framework, designed to handle real-time data processing with fault tolerance and scalability.

Apache, Apache Kafka

Chris Riccomini

4 min read

Has Summary

Intermediate

Intra-cluster Replication in Apache Kafka

The article discusses intra-cluster replication in Apache Kafka, highlighting its importance for increasing availability and durability within Kafka's messaging system.

Apache, Apache Kafka

Jun Rao

7 min read

Has Summary

Advanced

Announcing release of Apache Kafka 0.7.1

The article announces the release of Apache Kafka 0.7.1, highlighting its new features and improvements over previous versions.

Apache, Apache Kafka

Joel Koshy

3 min read

Has Summary

Intermediate

First Apache release for Kafka is out!

The article announces the first Apache release of Kafka, highlighting its features and significance in data handling.

Apache, Apache Kafka

Neha Narkhede

3 min read

Has Summary

Advanced

Project Kafka, a distributed publish-subscribe messaging system, reaches v0.6

Project Kafka, a distributed publish-subscribe messaging system, has reached version 0.6, enhancing its capabilities for handling activity stream data at LinkedIn.

Apache, Apache Kafka

Neha Narkhede

5 min read

Has Summary