Overview
The article introduces Northguard and Xinfra, two innovative systems developed by LinkedIn to enhance log storage scalability and operability. It discusses the challenges faced with Kafka and how Northguard addresses these issues while Xinfra provides a virtualized Pub/Sub layer to facilitate seamless migration and integration.
What You'll Learn
1. How Northguard implements scalable log storage
2. Why virtualization in Pub/Sub systems is crucial for seamless migration
3. How to leverage log striping for balanced data distribution
Prerequisites & Requirements
- Understanding of distributed systems and Pub/Sub patterns
- Familiarity with Kafka and log storage concepts (optional)
Key Questions Answered
What challenges did LinkedIn face with Kafka?
LinkedIn's challenges with Kafka fell into three areas. Scalability suffered as traffic and metadata grew. Operability suffered because balancing load across more than 100 clusters was difficult. Availability and consistency were also limited. Together, these challenges prompted the need for a new log storage solution.
How does Northguard improve log storage scalability?
Northguard enhances scalability by sharding data and metadata, minimizing global state, and employing a decentralized group membership protocol. This design allows for more efficient load distribution and faster cluster deployments, addressing the limitations of Kafka.
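The summary above does not say how Northguard maps metadata to shards. The following is only a minimal Python sketch of the general idea of sharding metadata, assuming a consistent-hash ring. `MetadataRing`, `stable_hash`, `shard_for`, and the shard names are hypothetical. The property it illustrates is that adding a shard moves only the topics the new shard takes over, rather than reshuffling all metadata.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """64-bit hash that is stable across processes (unlike the salted built-in hash())."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class MetadataRing:
    """Hypothetical consistent-hash ring that maps topic names to metadata shards."""

    def __init__(self, shards, vnodes_per_shard=64):
        self.vnodes_per_shard = vnodes_per_shard
        self._points = []  # sorted hash positions on the ring
        self._owners = []  # _owners[i] is the shard owning _points[i]
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Each shard claims several virtual points to smooth out its share of topics.
        for i in range(self.vnodes_per_shard):
            point = stable_hash(f"{shard}#{i}")
            idx = bisect.bisect_left(self._points, point)
            self._points.insert(idx, point)
            self._owners.insert(idx, shard)

    def shard_for(self, topic):
        # The owner is the first virtual point clockwise from the topic's hash.
        idx = bisect.bisect_right(self._points, stable_hash(topic)) % len(self._points)
        return self._owners[idx]


if __name__ == "__main__":
    topics = [f"topic-{n}" for n in range(10_000)]
    ring = MetadataRing([f"meta-{s}" for s in range(4)])
    before = {t: ring.shard_for(t) for t in topics}
    ring.add_shard("meta-4")
    moved = [t for t in topics if ring.shard_for(t) != before[t]]
    # Every moved topic lands on the new shard; all others keep their owner.
    print(len(moved) / len(topics), {ring.shard_for(t) for t in moved})
```

With five shards, roughly a fifth of the topics move, and all of them move to `meta-4`. A real system would also need each shard to be replicated and to agree on ownership, which this sketch leaves out.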
What is Xinfra and how does it support Northguard?
Xinfra is a virtualized Pub/Sub layer that supports both Northguard and Kafka, allowing for a unified experience across different systems. It enables seamless migration of topics between clusters without requiring changes from applications, thus simplifying the transition process.
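The following minimal Python sketch shows what "virtualized" means here. It assumes the layer is essentially a placement map from logical topic names to physical clusters. `VirtualPubSub`, `InMemoryBackend`, `place`, and `migrate` are hypothetical names, not Xinfra's API, and the in-memory backends stand in for Kafka and Northguard clusters.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBackend:
    """Stand-in for one physical cluster (Kafka or Northguard)."""
    name: str
    logs: dict = field(default_factory=dict)

    def append(self, topic: str, record: bytes) -> int:
        log = self.logs.setdefault(topic, [])
        log.append(record)
        return len(log) - 1  # offset within this physical log


class VirtualPubSub:
    """Hypothetical virtualization layer: applications name logical topics only."""

    def __init__(self):
        self._placement = {}  # logical topic -> (backend, physical topic name)

    def place(self, topic, backend, physical_topic=None):
        self._placement[topic] = (backend, physical_topic or topic)

    def produce(self, topic, record):
        backend, physical = self._placement[topic]
        return backend.append(physical, record)

    def migrate(self, topic, new_backend):
        # Repoint the logical topic; producers keep calling produce() unchanged.
        _, physical = self._placement[topic]
        self._placement[topic] = (new_backend, physical)


if __name__ == "__main__":
    kafka = InMemoryBackend("kafka-cluster")
    northguard = InMemoryBackend("northguard-cluster")
    pubsub = VirtualPubSub()
    pubsub.place("page-views", kafka)
    pubsub.produce("page-views", b"view-1")
    pubsub.migrate("page-views", northguard)
    pubsub.produce("page-views", b"view-2")
    print(kafka.logs, northguard.logs)
```

The application only ever calls `produce("page-views", ...)`, so the migration is purely a change to the placement map. A real migration would also have to handle consumers, in-flight writes, and offsets across the two systems, all of which this sketch ignores.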
What is the role of log striping in Northguard?
Log striping in Northguard breaks logs into smaller chunks to balance I/O load across the cluster. This approach prevents resource skew and allows new brokers to organically become segment replicas, enhancing the overall efficiency and reliability of the log storage system.
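The following is a minimal Python sketch of log striping. It assumes each new segment picks its replicas from the currently least-loaded brokers. That placement policy, `StripedLog`, and the tiny `segment_size` are illustrative assumptions, not Northguard's documented behavior. Because no data moves when a broker joins, the new broker starts hosting replicas simply by being the least-loaded choice for upcoming segments.

```python
import heapq
from collections import defaultdict


class StripedLog:
    """Hypothetical striped log: each segment gets its own replica set."""

    def __init__(self, brokers, replication_factor=3, segment_size=4):
        self.brokers = list(brokers)
        self.rf = replication_factor
        self.segment_size = segment_size  # records per segment (tiny for the demo)
        self.load = defaultdict(int)      # broker -> number of segment replicas hosted
        self.segments = []                # list of (replica_set, records)

    def add_broker(self, broker):
        # Nothing is copied; the broker just becomes eligible for new segments.
        self.brokers.append(broker)

    def _open_segment(self):
        # Assumed policy: the rf least-loaded brokers, ties broken by name.
        replicas = heapq.nsmallest(self.rf, self.brokers, key=lambda b: (self.load[b], b))
        for b in replicas:
            self.load[b] += 1
        self.segments.append((tuple(replicas), []))

    def append(self, record):
        # Seal the active segment once it is full and open a new one.
        if not self.segments or len(self.segments[-1][1]) >= self.segment_size:
            self._open_segment()
        self.segments[-1][1].append(record)
        return len(self.segments) - 1  # index of the segment holding the record


if __name__ == "__main__":
    log = StripedLog(["b1", "b2", "b3", "b4"])
    for i in range(16):
        log.append(i)
    log.add_broker("b5")
    for i in range(4):
        log.append(i)
    print(dict(log.load), log.segments[-1][0])
```

With four brokers, a replication factor of 3, and 16 records, the four segments spread 12 replicas evenly (3 per broker). The fifth broker, added afterwards, is chosen for the very next segment without any rebalancing step.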
Key Statistics & Figures
- Records processed per day: 32T. LinkedIn's Kafka was handling over 32 trillion records per day.
- Data volume handled: 17 PB/day. Kafka was managing 17 petabytes per day across 400K topics.
- Cluster count: 150+. Kafka was distributed across more than 150 clusters.
- Member count: 1.2 billion. LinkedIn serves over 1.2 billion members, which shows the scale of its data needs.
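Converting the daily figures into per-second averages makes the scale more concrete. The small Python calculation below assumes decimal petabytes (10^15 bytes) and traffic spread evenly over the day. Real peaks are higher, and the figures do not say whether the 17 PB includes replication.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

records_per_day = 32e12  # 32T records/day
bytes_per_day = 17e15    # 17 PB/day, assuming decimal petabytes (10**15 bytes)

records_per_sec = records_per_day / SECONDS_PER_DAY
bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
avg_record_bytes = bytes_per_day / records_per_day

print(f"{records_per_sec / 1e6:.0f} million records/s")  # 370 million records/s
print(f"{bytes_per_sec / 1e9:.0f} GB/s")                  # 197 GB/s
print(f"{avg_record_bytes:.0f} bytes per record")         # 531 bytes per record
```

Under these assumptions, the average record is roughly half a kilobyte.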
Technologies & Tools
Backend
- Kafka: Previously used for log storage and messaging before the transition to Northguard.
- Northguard: New log storage system designed for improved scalability and operability.
- Xinfra: Virtualized Pub/Sub layer that supports both Northguard and Kafka.
Key Actionable Insights
1. Implement log striping in your log storage system to improve load balancing and resource distribution. Breaking logs into smaller chunks avoids resource skew and lets new brokers take on segments efficiently, which improves overall system performance.
2. Consider virtualization for your Pub/Sub systems to make migration and integration easier. Virtualization allows seamless transitions between systems without significant changes to existing applications, minimizing downtime and operational complexity.
3. Shard both data and metadata to improve scalability in distributed systems. Sharding spreads growing traffic and metadata across many owners, so that no single component becomes a bottleneck as the system scales.
Common Pitfalls
1. Underestimating the complexity of migrating from Kafka to Northguard can lead to significant operational challenges.
Many applications are written directly against the Kafka client. Because that client has no virtualization layer, moving those applications to a different storage system is difficult. Plan the migration carefully to avoid downtime and ensure continuity.
Related Concepts
Distributed Systems
Pub/Sub Patterns
Log Storage Solutions