Kafka Ecosystem at LinkedIn

Joel Koshy
8 min read · Advanced

Overview

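The article describes the Kafka ecosystem at LinkedIn: Kafka's critical role as the company's messaging backbone and the tools built around it, including Avro-based schema management, Burrow for consumer monitoring, Gobblin for ingestion into Hadoop, Samza for stream processing, and the Nuage self-service portal. It highlights Kafka's scalability, durability, and low latency, along with the challenges of running it at scale and the strategies LinkedIn adopted to address them.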

What You'll Learn

1. How to use Apache Kafka for high-throughput messaging
2. Why Avro is used for schema management in Kafka data pipelines
3. How to implement self-service Kafka systems using REST APIs

Prerequisites & Requirements

  • Understanding of messaging systems and data pipelines
  • Familiarity with Apache Kafka and Avro (optional)

Key Questions Answered

What is the role of Kafka in LinkedIn's data architecture?
Kafka serves as LinkedIn's central data pipeline, handling over 1.4 trillion messages per day across more than 1400 brokers. It supports many mission-critical use cases, including database replication and LinkedIn's data platforms.
How does LinkedIn manage schema for Kafka messages?
LinkedIn has standardized on Avro as the common data format in its pipelines. Each producer encodes its data in Avro, registers the schema with a schema registry, and embeds a schema ID in each serialized message. Consumers use that ID to fetch the matching schema from the registry and deserialize the message.
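The produce/consume handshake around the schema ID can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LinkedIn's wire format: it assumes a Confluent-style header (one magic byte followed by a 4-byte big-endian schema ID), and `InMemorySchemaRegistry`, `frame_message`, and `unframe_message` are hypothetical names. Avro encoding itself is out of scope, so the payload is treated as opaque bytes.

```python
import struct

MAGIC_BYTE = 0                 # assumed format-version marker (Confluent-style framing)
HEADER = struct.Struct(">BI")  # 1 unsigned byte + 4-byte big-endian unsigned schema ID


class InMemorySchemaRegistry:
    """Toy stand-in for a schema registry service: maps schema IDs to schema text."""

    def __init__(self) -> None:
        self._by_id: dict[int, str] = {}
        self._by_schema: dict[str, int] = {}

    def register(self, schema: str) -> int:
        # Registering an already-known schema returns its existing ID.
        if schema not in self._by_schema:
            schema_id = len(self._by_id) + 1
            self._by_id[schema_id] = schema
            self._by_schema[schema] = schema_id
        return self._by_schema[schema]

    def lookup(self, schema_id: int) -> str:
        return self._by_id[schema_id]


def frame_message(schema_id: int, avro_payload: bytes) -> bytes:
    """Producer side: prefix the Avro-encoded payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe_message(message: bytes, registry: InMemorySchemaRegistry) -> tuple[str, bytes]:
    """Consumer side: read the header, fetch the writer's schema, return (schema, payload)."""
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return registry.lookup(schema_id), message[HEADER.size:]


# Usage: the payload bytes are a placeholder for a real Avro-encoded record.
registry = InMemorySchemaRegistry()
schema = '{"type": "record", "name": "PageView", "fields": [{"name": "memberId", "type": "long"}]}'
schema_id = registry.register(schema)
message = frame_message(schema_id, b"\x02")
writer_schema, payload = unframe_message(message, registry)
```

Embedding a small ID instead of the full schema keeps per-message overhead to a few bytes, at the cost of a registry lookup that a real consumer would typically cache.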
What challenges does LinkedIn face with its Kafka implementation?
As Kafka usage grows, several challenges emerge: some topics need custom configurations that require special requests to Kafka SREs, topic metadata is hard to discover, and access control lists (ACLs) must be managed to secure topics. These issues motivated a self-service portal for Kafka users.
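To make the ACL piece concrete, here is a simplified Python sketch of how a self-service tool might evaluate topic ACLs. It is modeled on the documented behavior of Kafka's built-in authorizer (an explicit DENY overrides any ALLOW, and a request with no matching ALLOW is refused) but ignores hosts, non-topic resources, and implied permissions (in Kafka, READ and WRITE also grant DESCRIBE). `AclEntry` and `is_authorized` are hypothetical names, not Kafka's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    principal: str        # e.g. "User:alice"; "User:*" matches every principal
    operation: str        # e.g. "READ" or "WRITE"
    topic: str            # exact topic name ("*" for all), or a prefix if prefixed=True
    prefixed: bool = False
    allow: bool = True    # False records an explicit DENY


def _matches(entry: AclEntry, principal: str, operation: str, topic: str) -> bool:
    principal_ok = entry.principal in (principal, "User:*")
    if entry.prefixed:
        topic_ok = topic.startswith(entry.topic)
    else:
        topic_ok = entry.topic in (topic, "*")
    return principal_ok and topic_ok and entry.operation == operation


def is_authorized(acls: list[AclEntry], principal: str, operation: str, topic: str) -> bool:
    matching = [e for e in acls if _matches(e, principal, operation, topic)]
    if any(not e.allow for e in matching):
        return False  # an explicit DENY always wins
    return any(e.allow for e in matching)  # no matching ALLOW means deny by default


# Usage: hypothetical ACLs for a family of "page-" topics.
acls = [
    AclEntry("User:alice", "WRITE", "page-views"),
    AclEntry("User:*", "READ", "page-", prefixed=True),
    AclEntry("User:mallory", "READ", "page-views", allow=False),
]
```

Deny-by-default means a forgotten ACL fails closed, which is the safer failure mode when users manage their own topics.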
What tools does LinkedIn use for monitoring Kafka consumer health?
LinkedIn uses Burrow, a monitoring service that evaluates consumer health from consumer lag and offset-commit status over a sliding window, so users do not have to set lag thresholds.
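The threshold-free idea can be sketched in Python. This is a simplified model loosely following the sliding-window rules described in Burrow's documentation (any zero lag in the window means healthy; unchanged offsets with non-shrinking lag means stalled; advancing offsets whose lag never shrinks means falling behind; no commits for longer than the window spans means stopped). It is not Burrow's code (Burrow is written in Go); `OffsetSample`, `evaluate_partition`, and the status labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetSample:
    timestamp: float  # seconds; when the consumer committed this offset
    offset: int       # committed consumer offset
    lag: int          # broker's latest offset minus the committed offset


def evaluate_partition(window: list[OffsetSample], now: float) -> str:
    """Classify one partition's consumer from its recent commits, oldest first."""
    if len(window) < 2:
        return "UNKNOWN"  # not enough history to judge a trend
    # Caught up at least once during the window: healthy, whatever the lag size.
    if any(s.lag == 0 for s in window):
        return "OK"
    first, last = window[0], window[-1]
    # No commits for longer than the window itself spans: the consumer has stopped.
    if now - last.timestamp > last.timestamp - first.timestamp:
        return "STOPPED"
    # Committing, but the offset never moves while lag holds or grows.
    if first.offset == last.offset and last.lag >= first.lag:
        return "STALLED"
    # Offsets advance, yet lag never shrinks between consecutive commits.
    if all(b.lag >= a.lag for a, b in zip(window, window[1:])):
        return "FALLING_BEHIND"
    return "OK"  # lag shrank at some point in the window: the consumer is keeping up
```

Because every rule compares the consumer against its own recent history, the same logic works for a topic with a steady lag of 10 messages and one with a steady lag of 10 million.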

Key Statistics & Figures

Messages handled per day
Over 1.4 trillion
This figure illustrates the scale at which Kafka operates within LinkedIn's infrastructure.
Number of Kafka brokers
Over 1400
LinkedIn's Kafka deployment runs on more than 1400 brokers to meet its messaging needs.
Data received weekly
Over 2 petabytes
This is the volume of data flowing into Kafka in a typical week at LinkedIn.
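A quick back-of-envelope conversion puts these figures in per-second terms. It assumes traffic is spread evenly over time and across brokers, and uses decimal units (1 PB = 10^15 bytes); because the source figures are lower bounds and real traffic is bursty, peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY

messages_per_day = 1.4e12   # "over 1.4 trillion" (a lower bound)
brokers = 1400              # "over 1400" (a lower bound)
bytes_per_week = 2e15       # "over 2 petabytes", assuming 1 PB = 10**15 bytes

messages_per_second = messages_per_day / SECONDS_PER_DAY
bytes_per_second = bytes_per_week / SECONDS_PER_WEEK

print(f"cluster-wide: {messages_per_second:,.0f} msg/s, {bytes_per_second / 1e9:.2f} GB/s")
print(f"per broker:   {messages_per_second / brokers:,.0f} msg/s, "
      f"{bytes_per_second / brokers / 1e6:.2f} MB/s")
```

This works out to roughly 16.2 million messages and 3.3 GB per second across the deployment, or about 11,600 messages and 2.4 MB per second per broker on average.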

Technologies & Tools


Messaging System
Apache Kafka
Serves as LinkedIn's central data pipeline for handling large volumes of messages.
Data Serialization
Avro
Used for schema management within Kafka data pipelines.
Monitoring Tool
Burrow
Monitors Kafka consumer health and provides insights into consumer status.
Ingestion Framework
Gobblin
Facilitates the ingestion of data from Kafka into Hadoop for offline processing.
Stream Processing
Samza
LinkedIn’s stream processing platform for running production workloads.

Key Actionable Insights

1. Implementing a self-service Kafka system can increase user autonomy and efficiency.
By letting users manage their topics and metadata through a portal such as Nuage, organizations reduce administrative overhead for Kafka SREs and let teams work more independently.
2. Using Avro with a schema registry keeps serialization and deserialization consistent.
The registry gives producers and consumers a shared record of every schema: each message is decoded with the exact schema it was written with, which reduces errors and improves data integrity across Kafka pipelines.
3. Continuously monitoring Kafka deployments with tools like Burrow helps catch problems before they escalate.
By tracking consumer health and lag, teams can address performance bottlenecks early and keep data flowing smoothly through the system.
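Looking up the writer's schema is only half of staying aligned: the consumer's own (reader) schema must still be resolvable against it. The Avro specification's record-resolution rules make this concrete: fields are matched by name, writer-only fields are skipped, a reader-only field needs a default, and shared fields must have the same type or a permitted promotion such as int to long. The Python sketch below checks those rules for flat records with primitive field types only; `can_read` is a hypothetical helper, not part of Avro or any schema registry's API.

```python
# Primitive promotions permitted by Avro schema resolution (writer type -> reader types).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_read(writer: dict, reader: dict) -> list[str]:
    """Return problems that stop `reader` from decoding data written with `writer`.

    Handles flat records with primitive field types only (no unions, nesting, or aliases).
    """
    problems = []
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for field in reader["fields"]:
        name = field["name"]
        if name not in writer_fields:
            # The writer never wrote this field, so the reader must fall back to a default.
            if "default" not in field:
                problems.append(f"new field '{name}' has no default")
            continue
        w_type, r_type = writer_fields[name]["type"], field["type"]
        if w_type != r_type and r_type not in PROMOTIONS.get(w_type, set()):
            problems.append(f"field '{name}': cannot read {w_type} as {r_type}")
    # Fields that exist only in the writer schema are skipped by the reader, so they pass.
    return problems


# Usage: three hypothetical versions of a PageView record.
v1 = {"type": "record", "name": "PageView",
      "fields": [{"name": "memberId", "type": "int"}]}
v2 = {"type": "record", "name": "PageView",
      "fields": [{"name": "memberId", "type": "long"},
                 {"name": "pageKey", "type": "string", "default": ""}]}
v3 = {"type": "record", "name": "PageView",
      "fields": [{"name": "memberId", "type": "long"},
                 {"name": "pageKey", "type": "string"}]}
```

Here a consumer on `v2` can read data produced with `v1`, while `v3` cannot, because its new `pageKey` field has no default to fall back on.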

Common Pitfalls

1. Handling custom topic configurations through manual requests slows teams down.
Topics that need custom configurations require special requests to Kafka SREs, which complicates the user experience and slows development.
2. Neglecting consumer health monitoring lets performance issues go unnoticed.
Without tools like Burrow, organizations may miss critical consumer lag and health signals, leading to data processing delays.
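One way a self-service system can keep routine configuration requests away from SREs is to auto-approve settings within agreed bounds and escalate everything else. The Python sketch below is hypothetical: the policy tables, their limits, and `triage_topic_request` are invented for illustration; only the config names (`retention.ms`, `max.message.bytes`, `cleanup.policy`, `min.insync.replicas`) are real Kafka topic configs.

```python
# Hypothetical policy: configs users may set themselves, with allowed ranges or values.
SELF_SERVICE_LIMITS = {
    "retention.ms": (3_600_000, 7 * 24 * 3_600_000),  # 1 hour to 7 days
    "max.message.bytes": (1, 1_048_576),               # up to 1 MiB
}
SELF_SERVICE_CHOICES = {
    "cleanup.policy": {"delete", "compact"},
}


def triage_topic_request(configs: dict) -> tuple[str, list[str]]:
    """Split requested topic configs into auto-approvable and needs-SRE-review."""
    needs_review = []
    for key, value in configs.items():
        if key in SELF_SERVICE_LIMITS:
            low, high = SELF_SERVICE_LIMITS[key]
            if not low <= int(value) <= high:
                needs_review.append(key)
        elif key in SELF_SERVICE_CHOICES:
            if value not in SELF_SERVICE_CHOICES[key]:
                needs_review.append(key)
        else:
            needs_review.append(key)  # unknown or sensitive configs always go to a human
    return ("auto-approve" if not needs_review else "sre-review", needs_review)
```

With a policy like this, a one-day retention request is approved immediately, while a 30-day retention or a durability setting such as `min.insync.replicas` is routed to an SRE with the offending keys listed.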

Related Concepts

Data Pipelines
Messaging Systems
Stream Processing Frameworks
Schema Management