Overview
The article discusses benchmarking Apache Kafka's performance, achieving 2 million writes per second on a modest hardware setup. It highlights Kafka's architecture, producer and consumer throughput, and the impact of message size on performance.
What You'll Learn
1. How to benchmark Apache Kafka's write performance on inexpensive hardware
2. Why Kafka's architecture allows for high throughput and low latency
3. How to configure Kafka for optimal producer and consumer throughput
Prerequisites & Requirements
- Understanding of distributed systems and messaging architectures
- Familiarity with Apache Kafka and its configuration (optional)
Key Questions Answered
What is the maximum write throughput of Apache Kafka on low-cost hardware?
The article demonstrates that Apache Kafka can reach 2,024,032 records per second with three producers writing concurrently, using 3x asynchronous replication on a cluster of three machines. This shows that Kafka scales well even on inexpensive hardware.
How does message size affect Kafka's throughput?
Throughput measured in records per second decreases as message size increases, while throughput measured in bytes per second increases. Small messages are the harder case because fixed per-record overhead dominates; larger messages spread that overhead over more data and deliver more MB/second.
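The two throughput views are related by simple arithmetic. The sketch below converts a record rate into a byte rate, using the single-producer figure reported in this summary (821,557 records/sec at 100 bytes each). The function name and the second, larger-message data point are hypothetical and only illustrate the direction of the trade-off; MB is taken here as 1024 × 1024 bytes, which is an assumption about the unit.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: MB means mebibytes here


def byte_throughput_mb(records_per_sec: float, record_size_bytes: int) -> float:
    """Convert a record rate into MB/sec for a fixed record size."""
    return records_per_sec * record_size_bytes / BYTES_PER_MB


# Figure from the summary: 821,557 records/sec of 100-byte records.
small_records = byte_throughput_mb(821_557, 100)

# Hypothetical data point (not from the article): far fewer records per
# second, but each record is 10,000 bytes.
large_records = byte_throughput_mb(10_000, 10_000)

print(f"100-byte records:    {small_records:.1f} MB/sec")
print(f"10,000-byte records: {large_records:.1f} MB/sec")
```

Even though the hypothetical large-record case moves about 80 times fewer records per second, it moves more bytes, which is the pattern the answer above describes.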
What are the latency metrics for message delivery in Kafka?
The median end-to-end latency for message delivery in Kafka is 2 ms, with 3 ms at the 99th percentile and 14 ms at the 99.9th percentile. This indicates Kafka's capability for low-latency message processing.
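For readers unfamiliar with percentile latency, here is a minimal sketch of how such figures could be computed from raw measurements using the nearest-rank method. The `percentile` helper and the sample list are hypothetical: the samples are synthetic values built to reproduce the reported 2 ms / 3 ms / 14 ms figures, not data from the article, and the article may have used a different percentile method.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that is greater than
    or equal to pct percent of all samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in milliseconds (hypothetical, for illustration only).
latencies_ms = [2] * 980 + [3] * 17 + [14] * 3

print("median:", percentile(latencies_ms, 50), "ms")
print("p99:   ", percentile(latencies_ms, 99), "ms")
print("p99.9: ", percentile(latencies_ms, 99.9), "ms")
```

The tail percentiles matter because a handful of slow deliveries can go unnoticed in the median while still affecting latency-sensitive consumers.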
Key Statistics & Figures
Single producer thread, no replication throughput
821,557 records/sec
This was achieved while producing 50 million small (100 byte
Single producer thread, 3x asynchronous replication throughput
786,980 records/sec
This shows the impact of adding replication on throughput.
Three producers, 3x async replication throughput
2,024,032 records/sec
This demonstrates the aggregate capacity of the Kafka cluster with multiple producers.
Single Consumer throughput
940,521 records/sec
This was measured while consuming from a 6 partition 3x replicated topic.
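The figures above are easier to compare as ratios. The following sketch derives the relative cost of asynchronous replication and the scaling factor from one to three producers, using only the numbers listed in this section; the variable names are illustrative.

```python
# Reported throughput figures, in records per second.
single_no_replication = 821_557
single_async_replication = 786_980
three_producers_async = 2_024_032

# Fraction of throughput lost when 3x asynchronous replication is added.
replication_cost = 1 - single_async_replication / single_no_replication

# Aggregate speed-up from running three producers instead of one.
scaling_factor = three_producers_async / single_async_replication

# Share of ideal linear scaling (3x) that was actually achieved.
scaling_efficiency = scaling_factor / 3

print(f"replication cost:   {replication_cost:.1%}")
print(f"scaling factor:     {scaling_factor:.2f}x")
print(f"scaling efficiency: {scaling_efficiency:.1%}")
```

Asynchronous replication costs roughly 4% of single-producer throughput, and three producers deliver about 2.57 times the throughput of one, or roughly 86% of perfectly linear scaling.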
Key Actionable Insights
1. To achieve high throughput in Kafka, run multiple producers so that the cluster's full capacity is used. With three producer processes, the article reaches over 2 million records per second, which matters for applications with high data ingestion rates.
2. Consider the impact of message size on performance when designing your Kafka architecture. Small messages carry more per-record overhead, so choosing message size deliberately can improve overall throughput and efficiency.
3. Use asynchronous replication to raise write throughput when a small durability trade-off is acceptable. With asynchronous replication the producer is acknowledged before followers have copied the write; in the benchmarks this cost only a few percent of throughput relative to no replication (786,980 vs. 821,557 records/sec). The trade-off is that an acknowledged write can still be lost if the leader fails before the followers catch up.
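As a rough illustration of the insights above, the sketch below builds a producer configuration and renders it as a `.properties` file. The keys `bootstrap.servers`, `acks`, `batch.size`, and `linger.ms` are standard Kafka producer settings, but every value here is a hypothetical placeholder rather than the article's actual configuration, the host names are invented, and mapping "asynchronous replication" to `acks=1` (leader-only acknowledgment) is an assumption about how the benchmark was run.

```python
# Hypothetical producer settings; values are placeholders, not the
# article's benchmark configuration.
producer_config = {
    # Invented broker addresses for a three-machine cluster.
    "bootstrap.servers": "broker1:9092,broker2:9092,broker3:9092",
    # Assumption: leader-only acknowledgment, followers replicate afterwards.
    "acks": "1",
    # Larger batches amortize per-record overhead for small messages.
    "batch.size": "65536",
    # Wait briefly so batches can fill before being sent.
    "linger.ms": "5",
}


def to_properties(config: dict) -> str:
    """Render a config mapping as Java-style .properties text."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


print(to_properties(producer_config))
```

Running several producer processes with a configuration like this, ideally on separate machines, is one way to approach the aggregate throughput described in the first insight. Setting `acks` to `all` instead would wait for in-sync replicas and trade throughput for stronger durability.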
Common Pitfalls
1. Over-optimizing Kafka configurations for specific benchmarks can lead to misleading results.
The article emphasizes the importance of 'lazy benchmarking' to ensure that performance metrics reflect real-world usage rather than idealized scenarios that may not be applicable in multi-tenant environments.
Related Concepts
Distributed Systems
Messaging Architectures
Performance Benchmarking
Data Streaming