Overview
The article describes how LinkedIn customizes Apache Kafka to handle 7 trillion messages per day. It covers the scalability and operability challenges that arise at this scale and the solutions LinkedIn built to address them.
What You'll Learn
1. How to manage Kafka at scale to handle trillions of messages daily
2. Why maintaining internal release branches for Kafka is crucial for production stability
3. When to apply hotfix patches versus upstream patches in Kafka development
Prerequisites & Requirements
- Understanding of distributed systems and stream processing
- Experience with Apache Kafka or similar messaging systems (optional)
Key Questions Answered
How does LinkedIn customize Apache Kafka for high message throughput?
LinkedIn runs more than 100 Kafka clusters with more than 4,000 brokers, serving more than 100,000 topics and 7 million partitions. To operate at this scale, it maintains internal release branches that carry patches tailored to its production requirements, which lets the deployment handle 7 trillion messages per day.
What challenges does LinkedIn face with Kafka at scale?
At LinkedIn's scale, challenges include scalability and operability issues such as slow controller performance and memory pressure. These issues can lead to cascading failures, necessitating the introduction of hotfix patches to mitigate risks and improve performance.
What is the process for creating a new LinkedIn Kafka release branch?
Creating a new LinkedIn Kafka release branch involves branching off from an Apache Kafka release branch, moving hotfix patches from the previous LinkedIn release, and certifying the new release against real production traffic to ensure stability and performance.
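The branching steps above can be sketched as a small planning function. This is only an illustration under stated assumptions: the `Patch` record, its fields, and the rule that a hotfix is dropped once its upstream equivalent is already in the new base are hypothetical and are not LinkedIn's actual tooling. The certification step against production traffic is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Patch:
    """A commit on the previous internal release branch (hypothetical record)."""
    sha: str
    kind: str                           # e.g. "hotfix" or "cherry-pick"
    upstream_sha: Optional[str] = None  # upstream commit that supersedes it, if any


def plan_carry_forward(previous_branch, new_base_shas):
    """Return the hotfix patches to re-apply on top of the new Apache Kafka base.

    Assumption: a hotfix whose upstream equivalent is already contained in the
    new base does not need to be moved again.
    """
    carried = []
    for patch in previous_branch:
        if patch.kind != "hotfix":
            continue  # only hotfix patches are moved explicitly in this sketch
        if patch.upstream_sha is not None and patch.upstream_sha in new_base_shas:
            continue  # already part of the upstream release we branched from
        carried.append(patch)
    return carried


# Made-up commits on the previous internal branch.
previous = [
    Patch("a1", "hotfix", upstream_sha="u1"),
    Patch("a2", "hotfix"),
    Patch("a3", "cherry-pick", upstream_sha="u3"),
]
to_move = plan_carry_forward(previous, new_base_shas={"u1", "u3"})
print([p.sha for p in to_move])
```

In this made-up example only `a2` is carried forward: `a1` is already covered by the new base, and `a3` was a cherry-pick rather than a hotfix.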
What types of patches are maintained in LinkedIn's Kafka release branches?
LinkedIn's Kafka release branches maintain several types of patches: upstream patches, cherry-picks from upstream, hotfix patches for urgent issues, and LinkedIn-only patches that are not intended for upstream due to their internal nature.
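To make the four patch types concrete, here is a minimal sketch that classifies commits by a tag at the start of the subject line. The tag names (`[HOTFIX]`, `[CHERRY-PICK]`, `[INTERNAL-ONLY]`), the regular expression, and the sample subjects are assumptions chosen for illustration; the real convention used on LinkedIn's branches may differ.

```python
import re
from enum import Enum


class PatchKind(Enum):
    UPSTREAM = "upstream"            # came with the Apache Kafka release branch
    CHERRY_PICK = "cherry-pick"      # picked from upstream
    HOTFIX = "hotfix"                # urgent fix applied on the internal branch
    INTERNAL_ONLY = "internal-only"  # not intended for upstream


# Hypothetical subject-line tags; the real convention may differ.
_TAGS = {
    "CHERRY-PICK": PatchKind.CHERRY_PICK,
    "HOTFIX": PatchKind.HOTFIX,
    "INTERNAL-ONLY": PatchKind.INTERNAL_ONLY,
}
_TAG_RE = re.compile(r"^\[(?P<tag>[A-Z-]+)\]\s*")


def classify(subject: str) -> PatchKind:
    """Map a commit subject line to a patch kind; untagged commits count as upstream."""
    match = _TAG_RE.match(subject)
    if match is None:
        return PatchKind.UPSTREAM
    return _TAGS.get(match.group("tag"), PatchKind.UPSTREAM)


# Made-up subjects, for illustration only.
print(classify("[HOTFIX] Reduce controller memory pressure").value)
print(classify("KAFKA-0000: example upstream change").value)
```

Keeping the type in the subject line means a plain log listing is enough to see which commits need attention when a new release branch is cut.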
Key Statistics & Figures
Messages handled per day
7 trillion
This figure represents the scale at which LinkedIn operates its Kafka infrastructure.
Number of Kafka clusters
over 100
LinkedIn maintains this many clusters to support its extensive messaging and data processing needs.
Number of brokers
more than 4,000
These brokers are essential for managing the high volume of messages and topics within LinkedIn's Kafka ecosystem.
Number of topics
more than 100,000
This extensive number of topics indicates the diverse use cases powered by Kafka at LinkedIn.
Number of partitions
7 million
Partitions are critical for scaling Kafka's performance and managing message throughput effectively.
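A quick back-of-the-envelope calculation helps put these figures in proportion. The sketch below uses only the numbers quoted above and treats the "more than" values as lower bounds. The results are plain averages: real traffic is uneven across clusters and over the day, and the source does not say whether the partition count includes replicas.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 7_000_000_000_000
clusters = 100        # source says "over 100"
brokers = 4_000       # source says "more than 4,000"
topics = 100_000      # source says "more than 100,000"
partitions = 7_000_000

# Averages only; they say nothing about peaks or skew between clusters.
avg_messages_per_second = messages_per_day / SECONDS_PER_DAY
avg_per_broker_per_second = avg_messages_per_second / brokers
partitions_per_topic = partitions / topics
brokers_per_cluster = brokers / clusters

print(f"{avg_messages_per_second:,.0f} messages/s overall")
print(f"{avg_per_broker_per_second:,.0f} messages/s per broker")
print(f"{partitions_per_topic:.0f} partitions per topic")
print(f"{brokers_per_cluster:.0f} brokers per cluster")
```

On these assumptions the deployment averages roughly 81 million messages per second, or about 20,000 per broker per second, with about 70 partitions per topic and 40 brokers per cluster.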
Key Actionable Insights
1. Implement a structured commit message strategy for patch management in Kafka. Using structured commit messages helps in tracking which patches need to be moved to new release branches, ensuring a smoother transition and better management of hotfixes.
2. Regularly assess the urgency of patches to determine whether to apply hotfixes or upstream patches. Understanding the urgency allows teams to prioritize critical fixes without compromising on the quality of upstream contributions.
3. Leverage the maintenance mode feature for brokers to streamline replica management. This feature allows for easier removal of bad brokers while maintaining data redundancy, reducing the risk of data loss during operational changes.
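One way to picture the maintenance-mode idea from the third insight: brokers flagged for maintenance are skipped when replicas are placed, so a bad broker can be taken out without lowering the replication factor. The function below is a toy model under that assumption. It is not Kafka's assignment algorithm or any Kafka API, and all names in it are hypothetical.

```python
from itertools import cycle, islice


def assign_replicas(partitions, brokers, maintenance, replication_factor):
    """Round-robin replica placement that skips brokers in maintenance mode."""
    eligible = [b for b in brokers if b not in maintenance]
    if len(eligible) < replication_factor:
        raise ValueError("not enough healthy brokers to keep the replication factor")
    assignment = {}
    for index, partition in enumerate(partitions):
        # Start each partition at a different broker to spread load.
        start = index % len(eligible)
        ring = islice(cycle(eligible), start, start + replication_factor)
        assignment[partition] = list(ring)
    return assignment


plan = assign_replicas(
    partitions=["t-0", "t-1", "t-2"],
    brokers=[1, 2, 3, 4],
    maintenance={3},  # broker 3 is being drained
    replication_factor=2,
)
print(plan)
```

Here broker 3 receives no replicas, while every partition still gets two distinct brokers. The explicit error when too few healthy brokers remain reflects the goal of never trading redundancy for convenience.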
Common Pitfalls
1. Failing to properly assess the urgency of patches can lead to delays in critical fixes.
This often happens when teams do not have a clear process for prioritizing patches, resulting in potential downtime or performance issues.
2. Neglecting to upstream patches can lead to fragmentation between internal and external versions of Kafka.
Without a consistent upstream strategy, organizations may miss out on community improvements and face challenges in maintaining their custom versions.
Related Concepts
Distributed Systems
Stream Processing
Patch Management in Open Source Software