# Apache Kafka Programming Tutorials & Engineering Articles

180 Apache Kafka tutorials, guides, and engineering insights from Uber, LinkedIn, Pinterest, and more

Apache Kafka Articles & Tutorials

Pinterest
Intermediate
The article discusses Pinterest's transition to a next-generation database ingestion framework designed to address the limitations of legacy systems.
Pinterest Engineering
10 min read
Includes Code
Has Summary
--
Uber
Intermediate
This article introduces uForwarder, Uber's open-source, push-based consumer proxy for asynchronous queuing on Apache Kafka.
Zhifeng Chen, Yang Yang, Haifeng Chen
12 min read
Has Summary
--
Uber
Advanced
This article discusses how Uber utilizes a pull-based ingestion model in OpenSearch™ to effectively index streaming data.
Yupeng Fu, Varun Bharadwaj, Shuyi Zhang, Xu Xiong, Michael Froh
14 min read
Has Summary
--
Uber
Advanced
This article discusses Uber's transition from batch to streaming data ingestion using Apache Flink, which significantly enhances data freshness and operational efficiency.
Xinli Shang, Peter Huang, Jing Li, Jing Zhao, Jack Song
6 min read
Has Summary
--
Uber
Intermediate
This article discusses Uber's implementation of Apache Pinot to manage and analyze its extensive inventory and catalog data efficiently.
Suraj Modi, Ankit Sultana, Tarun Mavani
11 min read
Has Summary
--
Uber
Advanced
This article discusses the implementation of zone failure resilience in Apache Pinot at Uber, detailing strategies to ensure uninterrupted service during zone failures.
Si Lao, Christina Li, Xuanyi Li, Yang Yang, Ujwala Tulshigiri
10 min read
Has Summary
--
Netflix
Advanced
Netflix engineered a real-time recommendation delivery system for live events that can update over 100 million devices in under a minute.
Netflix Technology Blog
9 min read
Has Summary
--
Netflix
Advanced
Netflix built a Real-Time Distributed Graph (RDG) to connect member interaction data across their expanding business verticals including streaming, live events, and mobile games.
Netflix Technology Blog
8 min read
Has Summary
--
Uber
Intermediate
The article discusses the implementation of a Policy Simulator at Uber to enhance the safety and determinism of Identity and Access Management (IAM) policy changes.
Avinash Srivenkatesh, Zi Wen, Zakir Akram
15 min read
Has Summary
--
Stripe
Advanced
The article discusses the critical importance of maintaining consistent data across multiple systems as organizations grow.
James Beswick
11 min read
Has Summary
--
Stripe
Advanced
The article discusses the challenges of maintaining consistent product data across systems and third-party platforms in digital commerce, focusing on reconciliation patterns that can enhance data i...
James Beswick
9 min read
Includes Code
Has Summary
--
Uber
Advanced
uReview is an AI code review platform developed by Uber to enhance the code review process by providing timely, high-quality feedback.
Sonal Mahajan, Shauvik Roy Choudhary, Akshay Utture, Will Bond, Joseph Wang
14 min read
Has Summary
--
Uber
Intermediate
The article discusses how Uber processes early chargeback signals to mitigate payment fraud and enhance customer trust.
Avadhut Thakar
7 min read
Has Summary
--
Uber
Advanced
The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch.
Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen
15 min read
Has Summary
--
LinkedIn
Intermediate
The article discusses how LinkedIn utilizes Hoptimator to enhance the ingestion process for Apache Pinot, a real-time distributed OLAP datastore.
Ryanne Dolan
9 min read
Has Summary
--
Netflix
Intermediate
The article discusses the implementation of a system at Netflix for tracking 'impressions'—the visual elements users interact with while browsing content.
Netflix Technology Blog
7 min read
Has Summary
--
Uber
Advanced
The article discusses the MySQL fleet at Uber, which consists of over 2,300 independent clusters that support critical operations for the platform.
Banty Kumar, Debadarsini Nayak, Raja Sriram Ganesan, Amit Jain
15 min read
Has Summary
--
Pinterest
Advanced
The article discusses Change Data Capture (CDC) at Pinterest, detailing its importance for real-time data processing and the implementation of a Generic CDC solution using Debezium.
Pinterest Engineering
8 min read
Has Summary
--
Netflix
Advanced
This article discusses Netflix's Distributed Counter Abstraction, a service designed to enable distributed counting at scale while maintaining low latency performance.
Netflix Technology Blog
22 min read
Includes Code
Has Summary
--
Uber
Advanced
The article discusses Uber's implementation of Presto Express, an enhancement to the Presto SQL query engine aimed at reducing the end-to-end Service Level Agreement (SLA) for short-running queries.
Mingjia Hang, Gurmeet Singh
10 min read
Has Summary
--
Uber
Advanced
The article discusses Uber's advanced settlement accounting system, which is crucial for managing financial transactions involving payment service providers (PSPs).
Onkar Singh, Sai Sameera Grandhi, Nagesh Kumar Mankala, Abhinav Agarwal
12 min read
Has Summary
--
Uber
Advanced
The article discusses how Uber optimizes the training of Large Language Models (LLMs) using both open-source and in-house models.
Bo Ling, Jiapei Huang, Baojun Liu, Chongxiao Cao, Anant Vyas, Peng Zhang
11 min read
Has Summary
--
Uber
Intermediate
This article discusses the significant improvements made to Uber's Experiment Evaluation Engine, achieving a 100x reduction in latency by transitioning from a remote evaluation architecture to a lo...
Akshay Jetli, Deepak Bobbarjung, Sergey Gitlin, Andy Maule
15 min read
Has Summary
--
ClickHouse
Intermediate
This article discusses the integration of ClickHouse, a high-performance columnar database, with Estuary Flow, a data integration platform, to enable real-time analytics on Salesforce data.
Estuary
5 min read
Includes Code
Has Summary
--
Pinterest
Advanced
The article discusses Pinterest's implementation of Tiered Storage for Apache Kafka®, highlighting a broker-decoupled approach that offloads data to cheaper remote storage.
Pinterest Engineering
24 min read
Includes Code
Has Summary
--
Airbnb
Advanced
The article provides an in-depth exploration of Riverbed, a framework within Airbnb's tech stack that optimizes data consumption from system-of-record data stores and updates secondary read-optimiz...
Xiangmin Liang
9 min read
Includes Code
Has Summary
--
Uber
Intermediate
The article discusses how Uber utilizes Apache Pinot for low-latency offline table analytics, highlighting its capabilities in handling various use cases, including real-time and offline data inges...
Ankit Sultana, Caner Balci
15 min read
Has Summary
--
Uber
Advanced
The article discusses Uber's strategy for shifting end-to-end (E2E) testing left in their development process to improve efficiency and reduce operational costs.
Quess Liu, Daniel Tsui
11 min read
Has Summary
--
Uber
Intermediate
The article discusses the Sparkle framework developed by Uber to standardize modular ETL processes, enhancing developer productivity and data quality.
Dinesh Jagannathan, Sharath Bhat, Suman Voleti, Praveen Raj
8 min read
Has Summary
--
Uber
Advanced
Odin is Uber's stateful platform designed to manage various technologies for data storage efficiently.
Jesper Borlum, Gianluca Mezzetti
14 min read
Has Summary
--
Uber
Intermediate
This article introduces Kafka Tiered Storage at Uber, detailing its architecture and the motivation behind its implementation.
Satish Duggana, Kamal Chandraprakash, Abhijeet Kumar
9 min read
Has Summary
--
Uber
Advanced
This article discusses how Uber counts job participation at scale, detailing the integration of Apache Pinot™ to address challenges in data processing and analysis.
Ryan Woo, Sameer Kapoor
11 min read
Has Summary
--
Uber
Intermediate
The article delves into Uber's comprehensive accounting data testing strategies, emphasizing the importance of precision and integrity in financial processes.
Onkar Singh, Harsha Aditya Ravuri, Viswanath Ramakkagari, Aditya Gopisetti, Hari Srinivasan
16 min read
Has Summary
--
Uber
Advanced
The article discusses Uber's journey in scaling its AI/ML infrastructure, highlighting the transition from on-premise to cloud solutions, the implementation of new technologies, and the optimizatio...
Nav Kankani, Rush Tehrani, Anant Vyas
10 min read
Has Summary
--
Uber
Intermediate
The article discusses Uber's efforts to build a scalable, real-time chat system to enhance customer experience.
Avijit Singh, Vivek Shah, Ankit Tyagi
14 min read
Has Summary
--
Uber
Intermediate
DataCentral is Uber's proprietary platform designed for Big Data observability, chargeback, and governance.
Arnav Balyan, Atul Mantri, Krishna Karri, Amruth Sampath
10 min read
Has Summary
--
LinkedIn
Advanced
The article discusses a hybrid bulk data processing framework developed to improve recruiting efficiency during data ownership transfers, particularly in the context of company mergers and recruite...
Aditya Hegde
12 min read
Includes Code
Has Summary
--
Uber
Advanced
This article discusses Uber's experience with garbage collection (GC) tuning to enhance the reliability of Presto, an open-source distributed SQL query engine.
Cristian Velazquez, Vineeth Karayil Sekharan
11 min read
Has Summary
--
Uber
Advanced
uVitals is an anomaly detection and alerting system developed by Uber to enhance the reliability of its services by quickly identifying and addressing issues in multi-dimensional time series data.
Venki Appiah, Komal Raulkar
14 min read
Has Summary
--
ClickHouse
Intermediate
This article discusses the CGW Stack, which combines ClickHouse, Grafana, and WarpStream to provide a cost-effective and efficient logging solution at scale.
Dale McDiarmid & Ryadh Dahimene
25 min read
Includes Code
Has Summary
--
Pinterest
Intermediate
The article discusses the implementation and operational benefits of the Unified PubSub Client (PSC) at Pinterest, highlighting improvements in developer velocity, stability, and scalability.
Pinterest Engineering
11 min read
Includes Code
Has Summary
--
Uber
Intermediate
The article discusses how Uber utilizes Apache Pinot for real-time analytics of mobile app crashes, enhancing their ability to detect and resolve issues quickly.
Kriti Dangi, Anil Purohit, Parijat Bansal, Rohit Yadav
17 min read
Has Summary
--
ClickHouse
Beginner
This article discusses strategies for making large data loads into ClickHouse resilient and efficient, particularly when migrating from other systems.
Tom Schreiber
15 min read
Includes Code
Has Summary
--
LinkedIn
Advanced
The article discusses LinkedIn's innovative use of Apache Beam for real-time streaming processing, handling over 4 trillion events daily across more than 3,000 pipelines.
Bingfeng Xia
16 min read
Has Summary
--
Uber
Advanced
The article discusses the implementation of a Unified Session for analytical events at Uber, aimed at enhancing data consistency and analytics across various applications.
Harsh Desai, Gaurav Yadav, Sahil Jindal, Satyam Shubham, Mahip Jain, Anshal Shukla, Ashok Varma
13 min read
Has Summary
--
Palantir
Advanced
The article discusses the transformative impact of real-time data and data streaming technologies on business operations, emphasizing their role in enabling rapid decision-making.
Palantir
15 min read
Has Summary
--
Uber
Intermediate
The article discusses Uber's implementation of Attribute-Based Access Control (ABAC) to manage access across its microservices architecture.
--
ClickHouse
Beginner
This article discusses the integration of ClickHouse with Kafka Connect and Confluent Cloud to facilitate real-time event streaming, particularly for Ethereum cryptocurrency events.
Dale McDiarmid
20 min read
Includes Code
Has Summary
--
ClickHouse
Beginner
This article serves as an introductory guide to implementing Change Data Capture (CDC) between PostgreSQL and ClickHouse, utilizing native features and tools like Debezium and Kafka.
Dale McDiarmid
26 min read
Includes Code
Has Summary
--
Uber
Advanced
The article discusses Uber's Automated Audit Framework designed to manage and audit financial transactions at internet scale.
Hasit Bhatt, Saurabh Kathpalia, Shashank Agarwal, Jayram Kumar, Hari Srinivasan
15 min read
Has Summary
--