TopicGC: How LinkedIn cleans up unused metadata for its Kafka clusters

LinkedIn Engineering Team
10 min read · Beginner

Overview

This article summarizes TopicGC, a service LinkedIn built to clean up unused topic metadata in its Kafka clusters. It covers the problems that unused topics cause and the gains reported after TopicGC was deployed, including a 20% reduction in topic count and a 30% reduction in CPU usage.

What You'll Learn

1. How to implement a garbage collection process for unused Kafka topics
2. Why managing metadata pressure is crucial for Kafka cluster performance
3. When to trigger notifications for topic deletion in Kafka

Prerequisites & Requirements

  • Understanding of Apache Kafka and its architecture
  • Familiarity with ZooKeeper and the Kafka admin client (optional)

Key Questions Answered

How does TopicGC improve Kafka cluster performance?
TopicGC deletes unused topics, which reduced the overall topic count by about 20% and alleviated metadata pressure. The reported results are a 30% reduction in CPU usage and a 40% decrease in request latencies.
What criteria does TopicGC use to identify unused topics?
TopicGC identifies unused topics using criteria such as: the topic is empty, it has no BytesIn/BytesOut traffic, it has had no READ/WRITE access events in the past 60 days, and it was not created within the past 60 days.
What steps does TopicGC take before deleting a topic?
Before deleting a topic, TopicGC blocks write access, sends notifications to the topic owner, disables mirroring, and performs a last-minute usage check to prevent data loss.
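
The criteria above can be sketched as a single predicate. This is a minimal illustration, not TopicGC's actual code: the `TopicUsage` record, its field names, and the `is_unused` function are hypothetical, and the sketch assumes the conservative reading that all listed criteria must hold together, which the summary does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 60-day window quoted in the summary.
WINDOW = timedelta(days=60)


@dataclass
class TopicUsage:
    """Hypothetical usage snapshot for one topic (field names are illustrative)."""
    name: str
    created_at: datetime
    is_empty: bool                    # no data retained in the topic
    bytes_in: int                     # BytesIn observed over the window
    bytes_out: int                    # BytesOut observed over the window
    last_access: Optional[datetime]   # latest READ/WRITE access event, if any


def is_unused(topic: TopicUsage, now: datetime) -> bool:
    """Return True only if every criterion holds (an assumption of this sketch)."""
    cutoff = now - WINDOW
    no_traffic = topic.bytes_in == 0 and topic.bytes_out == 0
    no_recent_access = topic.last_access is None or topic.last_access < cutoff
    not_newly_created = topic.created_at < cutoff
    return topic.is_empty and no_traffic and no_recent_access and not_newly_created


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
candidate = TopicUsage("orders-test", now - timedelta(days=90), True, 0, 0, None)
print(candidate.name, "unused:", is_unused(candidate, now))
```

Excluding newly created topics matters because a topic that was just provisioned looks identical to an abandoned one until its first producer or consumer connects.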

Key Statistics & Figures

Reduction in topic count
20%
Achieved through the implementation of TopicGC.
Reduction in CPU usage
30%
Resulting from the deletion of unused topics.
Decrease in request latencies
40%
Observed after the implementation of TopicGC.

Technologies & Tools


Backend
Apache Kafka
Used as the event streaming platform for managing topics.
Backend
ZooKeeper
Utilized for storing metadata related to Kafka topics.

Key Actionable Insights

1. Implement a systematic approach to identify and delete unused Kafka topics to maintain cluster performance.
   Regularly cleaning up unused topics keeps metadata pressure from building up and causing performance bottlenecks in Kafka clusters.
2. Use notifications to inform topic owners about impending deletions.
   This gives users the opportunity to retain important topics that are only temporarily inactive, preventing accidental data loss.
3. Monitor CPU usage and request latencies after deploying TopicGC.
   Tracking these metrics helps validate the effectiveness of the service and guides further optimizations.
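
As a small illustration of the monitoring insight, the helper below compares a baseline reading against a later one and reports the relative reduction for each metric. The function names, metric names, and sample values are all made up for this sketch; they are not measurements from LinkedIn's clusters.

```python
from typing import Dict


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent (negative means a regression)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (before - after) / before


def compare(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Compute the percent reduction for every metric present in the baseline."""
    return {name: percent_reduction(before[name], after[name]) for name in before}


# Illustrative readings only, chosen to keep the arithmetic easy to follow.
baseline = {"cpu_percent": 80.0, "request_latency_ms": 20.0}
current = {"cpu_percent": 60.0, "request_latency_ms": 10.0}

for metric, drop in compare(baseline, current).items():
    print(f"{metric}: {drop:.1f}% lower than baseline")
```

Reporting a relative change rather than raw values makes results comparable across clusters of different sizes.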

Common Pitfalls

1. Failing to notify topic owners before deletion can lead to accidental data loss.
   Without proper notifications, users may not have the chance to intervene if a topic they need is marked for deletion.
2. Not blocking write access before deletion can result in lost data.
   If a user attempts to write to a topic just before it is deleted, the data will be lost unless write access is blocked beforehand.
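
Both pitfalls come down to the order of the pre-deletion steps. The sketch below runs them in the order the summary lists them: block writes, notify the owner, disable mirroring, run a last-minute usage check, then delete. `FakeCluster` is a hypothetical in-memory stand-in rather than a real Kafka admin API, and restoring write access when the final check fails is an assumption of this sketch, since the summary does not describe rollback.

```python
from typing import List


class FakeCluster:
    """Hypothetical in-memory stand-in that records each action taken."""

    def __init__(self, used_at_last_check: bool = False) -> None:
        self.used_at_last_check = used_at_last_check
        self.actions: List[str] = []

    def block_writes(self, topic: str) -> None:
        self.actions.append("block_writes")

    def unblock_writes(self, topic: str) -> None:
        self.actions.append("unblock_writes")

    def notify_owner(self, topic: str) -> None:
        self.actions.append("notify_owner")

    def disable_mirroring(self, topic: str) -> None:
        self.actions.append("disable_mirroring")

    def is_in_use(self, topic: str) -> bool:
        self.actions.append("last_minute_check")
        return self.used_at_last_check

    def delete_topic(self, topic: str) -> None:
        self.actions.append("delete_topic")


def collect_topic(cluster: FakeCluster, topic: str) -> bool:
    """Run the pre-deletion steps in order; return True if the topic was deleted."""
    cluster.block_writes(topic)        # no new data can arrive from here on
    cluster.notify_owner(topic)        # owner gets a chance to intervene
    cluster.disable_mirroring(topic)   # stop replication to other clusters
    if cluster.is_in_use(topic):       # last-minute usage check
        cluster.unblock_writes(topic)  # assumed rollback; mirroring restore omitted
        return False
    cluster.delete_topic(topic)
    return True


cluster = FakeCluster()
print(collect_topic(cluster, "orders-test"), cluster.actions)
```

Blocking writes first means that any data written before the block is visible to the final usage check, so a late producer causes the deletion to abort instead of losing records.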

Related Concepts

Kafka Topic Management
Metadata Pressure In Distributed Systems
Data Retention Policies In Kafka