Apache Kafka meetup during Hadoop Summit

Neha Narkhede

Overview

This article announces the annual Apache Kafka meetup hosted by LinkedIn during Hadoop Summit. It describes Kafka as a high-throughput messaging system and outlines the event details, the agenda, and the topics each speaker will cover, with an emphasis on Kafka's role in real-time data processing.

What You'll Learn

1. How to operate Kafka at scale in production environments
2. How to secure Kafka communications using mutual SSL authentication
3. Why stream processing is essential for real-time data systems
4. When to use Apache Samza for scalable stream processing

Key Questions Answered

What is the significance of Apache Kafka in data processing?
Apache Kafka is a high-throughput messaging system that is horizontally scalable, fault-tolerant, and low-latency. Numerous companies use it to carry real-time structured logs through their data pipelines.
What topics will be covered at the Apache Kafka meetup?
The meetup will cover several topics including operating Kafka at scale, securing Kafka, and using Apache Samza for stream processing. Each session will be led by experts from LinkedIn and Salesforce, providing insights into practical applications and challenges.
How does Salesforce utilize Kafka for monitoring?
Salesforce has built a scalable, near-real-time monitoring system on Kafka. It collects metrics data from various applications, secures cross-data-center traffic with mutual SSL authentication, and reports the data to tools such as Graphite.
What is the agenda for the Apache Kafka meetup?
The agenda includes registration and networking, talks on operating Kafka at scale, securing Kafka, and using Apache Samza for stream processing, followed by a Q&A session with Kafka committers. The event is scheduled for June 3 from 6:30 PM to 9 PM.

Key Statistics & Figures

Events captured per day
60 billion
LinkedIn captures over 60 billion events per day using Kafka for their data flows.
Active community members
700
Kafka has a growing community of more than 700 active members contributing to its development.

Technologies & Tools

Messaging System
Apache Kafka
Used for high-throughput messaging and real-time data processing.
Stream Processing Framework
Apache Samza
Facilitates writing scalable stream processing jobs.

Key Actionable Insights

1. Attending the Apache Kafka meetup can provide valuable networking opportunities with industry experts and peers.
   Networking at such events can lead to collaborations, insights into best practices, and potential job opportunities in the field of data engineering.
2. Implementing mutual SSL authentication in Kafka can enhance security for data in transit.
   This is particularly important for organizations handling sensitive data across multiple data centers, ensuring that only authorized clients can communicate with Kafka brokers.
3. Utilizing Apache Samza can simplify the development of scalable stream processing applications.
   By abstracting the complexities of distributed processing, Samza allows developers to focus on application logic rather than infrastructure concerns.
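
The split between application logic and infrastructure described for Samza can be illustrated with a small sketch. This is plain Python, not Samza's API (Samza itself is a JVM framework): `PageViewCounter` and `run_partitioned` are hypothetical names invented for this example, and the in-memory loop only stands in for what a real framework would do across machines. The point is that the task class contains nothing but per-message logic, while routing by key is handled elsewhere.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Message = Tuple[str, str]  # (key, value); shape assumed for illustration


class PageViewCounter:
    """Application logic only: count the messages seen for each key."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def process(self, key: str, value: str) -> None:
        self.counts[key] += 1


def run_partitioned(
    messages: Iterable[Message],
    num_partitions: int,
    task_factory: Callable[[], PageViewCounter],
) -> List[PageViewCounter]:
    """Toy stand-in for the framework: one task per partition, routed by key."""
    tasks = [task_factory() for _ in range(num_partitions)]
    for key, value in messages:
        # A stable hash keeps every message for a given key on the same task.
        partition = zlib.crc32(key.encode("utf-8")) % num_partitions
        tasks[partition].process(key, value)
    return tasks


if __name__ == "__main__":
    events = [("alice", "view"), ("bob", "view"), ("alice", "click")]
    for index, task in enumerate(run_partitioned(events, 2, PageViewCounter)):
        print(index, dict(task.counts))
```

Because each key is owned by exactly one task, the counters need no locking or cross-task coordination, which is the property that lets this style of job scale out by adding partitions.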

Common Pitfalls

1. Overlooking the importance of security in Kafka deployments can lead to vulnerabilities.
   Without proper security measures like SSL authentication, sensitive data may be exposed during transmission, leading to potential data breaches.
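
As a rough illustration of what mutual SSL authentication involves, the sketch below assembles broker and client settings using the property names from Kafka releases that ship built-in SSL support. The article does not say how Salesforce implemented its setup, so this is not a description of their system. The helper functions, file paths, port, and passwords are all placeholders assumed for the example.

```python
from typing import Dict


def broker_mtls_properties(keystore: str, truststore: str, password: str) -> Dict[str, str]:
    """Broker settings: serve TLS and require a certificate from every client."""
    return {
        "listeners": "SSL://:9093",  # port chosen for illustration
        "security.inter.broker.protocol": "SSL",
        "ssl.client.auth": "required",  # this is what makes the handshake mutual
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": password,
        "ssl.key.password": password,
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": password,
    }


def client_mtls_properties(keystore: str, truststore: str, password: str) -> Dict[str, str]:
    """Client settings: verify the broker and present the client's own certificate."""
    return {
        "security.protocol": "SSL",
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": password,
        "ssl.key.password": password,
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": password,
    }


def render_properties(props: Dict[str, str]) -> str:
    """Render a dict in Java .properties style, one key=value pair per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Placeholder paths and password; real deployments keep secrets out of source.
    print(render_properties(
        broker_mtls_properties("/path/to/broker.keystore.jks",
                               "/path/to/broker.truststore.jks",
                               "changeit")))
```

The broker-side `ssl.client.auth=required` line is the one that turns ordinary one-way TLS into mutual authentication: without it, the broker proves its identity to clients but accepts any client that connects.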

Related Concepts

Real-time Data Processing
Stream Processing Frameworks
Data Pipeline Management