How LinkedIn Uses Apache Kafka
48 engineering articles about Apache Kafka from LinkedIn's engineering team
Articles
The article discusses how LinkedIn utilizes Hoptimator to enhance the ingestion process for Apache Pinot, a real-time distributed OLAP datastore.
Ryanne Dolan
9 min read
Has Summary
--
The article discusses a hybrid bulk data processing framework developed to improve recruiting efficiency during data ownership transfers, particularly in the context of company mergers and recruite...
Aditya Hegde
12 min read
Includes Code
Has Summary
--
The article discusses LinkedIn's innovative use of Apache Beam for real-time streaming processing, handling over 4 trillion events daily across more than 3,000 pipelines.
Bingfeng Xia
16 min read
Has Summary
--
The article discusses how LinkedIn scaled its Salt infrastructure to support its growing needs for remote execution jobs, achieving a tenfold increase in job capacity and improved reliability.
--
The article discusses LinkedIn's Hosted Search, a fully managed cloud-based search solution designed to simplify the integration of search functionalities for application teams.
LinkedIn Engineering Team
12 min read
Has Summary
--
The article discusses TopicGC, a service developed by LinkedIn to clean up unused metadata in Kafka clusters.
LinkedIn Engineering Team
10 min read
Has Summary
--
The article discusses how LinkedIn utilizes Apache Pinot for real-time analytics on network flow data, emphasizing the importance of observability in their infrastructure.
LinkedIn Engineering Team
10 min read
Has Summary
--
The article discusses the implementation of a load-balanced Brooklin Mirror Maker at LinkedIn, which efficiently replicates large-scale Kafka clusters.
Vaibhav Maheshwari
14 min read
Has Summary
--
The article discusses the implementation of near real-time personalization features at LinkedIn, focusing on how member actions can be leveraged to enhance recommendation systems without significan...
Rupesh Gupta
17 min read
Has Summary
--
The article discusses Charles's journey of relocating his family to Silicon Valley for a role at LinkedIn, highlighting the support he received during the transition and his experiences in the tech...
LinkedIn Engineering Team
7 min read
Has Summary
--
The article discusses how LinkedIn implemented a Real-Time Feedback system to enhance developer tooling and improve productivity.
Or Michael Berlowitz
9 min read
Has Summary
--
The article discusses the development of the Qualified Applicant (QA) AI model at LinkedIn, designed to enhance the job matching process by predicting the likelihood of positive recruiter actions b...
Konstantin Salomatin
11 min read
Has Summary
--
This article discusses the integration of batch and stream processing to create a near real-time dashboard for Recruiter usage statistics at LinkedIn.
Khai Tran
9 min read
Has Summary
--
The article highlights the top ten engineering blogs from LinkedIn in 2019, focusing on popular topics such as open source, artificial intelligence, and technical challenges at scale.
Jaren Anderson
8 min read
Has Summary
--
The article discusses how LinkedIn customizes Apache Kafka to handle 7 trillion messages per day.
Jon Lee
10 min read
Has Summary
--
The article discusses the use of virtual private clusters for testing Apache Samza at LinkedIn, emphasizing the importance of stability and rigorous testing in managed stream processing services.
Rayman Preet Singh
6 min read
Has Summary
--
The article discusses how Apache Calcite can bridge the gap between offline and nearline computations in big data processing.
Khai Tran
12 min read
Has Summary
--
The article announces the release of Samza 1.0, a distributed stream processing framework developed at LinkedIn, highlighting its significant features and improvements.
Jagadish Venkatraman
10 min read
Has Summary
--
The article discusses the development of Samza Aeon, a tool designed to measure latency in asynchronous one-way flows within systems.
Max Wolffe
8 min read
Has Summary
--
The article 'Getting to Know Todd Palino' features insights from Todd Palino, a Senior Staff Site Reliability Engineer at LinkedIn, highlighting his work with Apache Kafka and his contributions to ...
Clark Haskins
5 min read
Has Summary
--
The article discusses the evolution of incremental data capture for Oracle databases at LinkedIn, highlighting the transition from a batch processing model to a near-real-time system.
--
The article discusses the importance of site speed monitoring during A/B testing and feature ramp-up at LinkedIn.
Jiahui QI (JOY)
9 min read
Has Summary
--
The article discusses the evolution of testing strategies for backend services at LinkedIn, specifically focusing on the transition from UI-driven tests to a more efficient testing framework using ...
Devi Sridharan
5 min read
Has Summary
--
This article discusses the challenges of data access in high-scale stream processing, particularly focusing on the read/write and read-only data access patterns.
--
The article discusses the open sourcing of Kafka Monitor, a framework designed to monitor and test Kafka deployments.
Dong Lin
10 min read
Has Summary
--
The article 'Kafkaesque Days at LinkedIn – Part 1' discusses the challenges and incidents faced by LinkedIn while using Apache Kafka for data pipelines and messaging.
Joel Koshy
11 min read
Has Summary
--
The article discusses the Kafka ecosystem at LinkedIn, detailing its critical role as a messaging system and the various solutions developed to enhance its functionality.
Joel Koshy
8 min read
Has Summary
--
This article features a Q&A with Jim Brikman discussing the challenges and strategies for splitting a codebase into microservices and artifacts.
Karan Parikh
11 min read
Has Summary
--
The article discusses Gobblin, a unified data ingestion framework developed by LinkedIn, designed to bridge batch and streaming data ingestion.
Shirshanka Das
7 min read
Has Summary
--
The article discusses the development of Venice, a derived data serving platform designed to improve the handling of derived data at LinkedIn.
Félix GV
7 min read
Has Summary
--
The article discusses Burrow, an open-source tool developed by LinkedIn for monitoring Kafka consumers.
Todd Palino
7 min read
Includes Code
Has Summary
--
The article discusses the implementation of multi-tier architectures using Apache Kafka at scale, highlighting key concepts such as Tiered Cluster Architecture, Kafka Mirror Maker, and performance ...
Todd Palino
2 min read
Has Summary
--
The article 'Running Kafka At Scale' discusses how LinkedIn utilizes Apache Kafka as a crucial messaging system for handling vast amounts of data.
Todd Palino
10 min read
Includes Code
Has Summary
--
The article discusses a series of technical talks focused on data science, featuring insights from various experts in the field.
Dr June Andrews
3 min read
Has Summary
--
The article discusses the graduation of Apache Samza from the Apache Incubator to a top-level Apache project, highlighting its significance in stream processing and the community growth during its ...
Chris Riccomini
3 min read
Has Summary
--
Apache Samza is LinkedIn's stream processing engine designed to handle real-time data processing needs.
Navina Ramesh
11 min read
Has Summary
--
The article provides a comprehensive guide to Web Components, detailing their significance in encapsulating code behind semantic tags.
LinkedIn Engineering Team
2 min read
Has Summary
--
The article discusses how LinkedIn operates Apache Samza at scale, focusing on its integration with Apache Kafka for processing high volumes of data.
Jon Bringhurst
11 min read
Has Summary
--
The article discusses LinkedIn's Technical Talk focused on content relevance, highlighting the engineering challenges and solutions in recommending content to users.
LinkedIn Engineering Team
2 min read
Has Summary
--
The article announces the release of Voldemort 1.9.0, highlighting new features and enhancements aimed at improving the operability of Voldemort clusters.
LinkedIn Engineering Team
10 min read
Has Summary
--
The article discusses how LinkedIn utilizes Apache Samza to gain real-time insights into its performance by processing data from numerous services and machines.
LinkedIn Engineering Team
11 min read
Includes Code
Has Summary
--
The article discusses the annual Apache Kafka meetup hosted by LinkedIn during the Hadoop Summit, highlighting the significance of Kafka as a high throughput messaging system.
Neha Narkhede
3 min read
Has Summary
--
The article discusses benchmarking Apache Kafka's performance, achieving 2 million writes per second on a modest hardware setup.
Jay Kreps
18 min read
Has Summary
--
Apache Samza is LinkedIn's open-source stream processing framework, designed to handle real-time data processing with fault tolerance and scalability.
Chris Riccomini
4 min read
Has Summary
--
The article discusses intra-cluster replication in Apache Kafka, highlighting its importance for increasing availability and durability within Kafka's messaging system.
Jun Rao
7 min read
Has Summary
--
The article announces the release of Apache Kafka 0.7.1, highlighting its new features and improvements over previous versions.
Joel Koshy
3 min read
Has Summary
--
The article announces the first Apache release of Kafka, highlighting its features and significance in data handling.
Neha Narkhede
3 min read
Has Summary
--
Project Kafka, a distributed publish-subscribe messaging system, has reached version 0.6, enhancing its capabilities for handling activity stream data at LinkedIn.
Neha Narkhede
5 min read
Has Summary
--