How Uber Uses Apache Kafka
68 engineering articles about Apache Kafka from Uber's engineering team
Articles
This article introduces uForwarder, Uber's open-source, push-based consumer proxy for asynchronous queuing on Apache Kafka.
Zhifeng Chen, Yang Yang, Haifeng Chen
12 min read
Has Summary
--
This article discusses how Uber uses a pull-based ingestion model in OpenSearch™ to index streaming data.
Yupeng Fu, Varun Bharadwaj, Shuyi Zhang, Xu Xiong, Michael Froh
14 min read
Has Summary
--
This article discusses Uber's transition from batch to streaming data ingestion using Apache Flink, which significantly enhances data freshness and operational efficiency.
Xinli Shang, Peter Huang, Jing Li, Jing Zhao, Jack Song
6 min read
Has Summary
--
This article discusses Uber's implementation of Apache Pinot to manage and analyze its extensive inventory and catalog data efficiently.
Suraj Modi, Ankit Sultana, Tarun Mavani
11 min read
Has Summary
--
This article discusses the implementation of zone failure resilience in Apache Pinot at Uber, detailing strategies to ensure uninterrupted service during zone failures.
Si Lao, Christina Li, Xuanyi Li, Yang Yang, Ujwala Tulshigiri
10 min read
Has Summary
--
The article discusses the implementation of a Policy Simulator at Uber to enhance the safety and determinism of Identity and Access Management (IAM) policy changes.
Avinash Srivenkatesh, Zi Wen, Zakir Akram
15 min read
Has Summary
--
uReview is an AI code review platform developed by Uber to enhance the code review process by providing timely, high-quality feedback.
--
The article discusses how Uber processes early chargeback signals to mitigate payment fraud and enhance customer trust.
Avadhut Thakar
7 min read
Has Summary
--
The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch.
Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen
15 min read
Has Summary
--
The article discusses the MySQL fleet at Uber, which consists of over 2,300 independent clusters that support critical operations for the platform.
Banty Kumar, Debadarsini Nayak, Raja Sriram Ganesan, Amit Jain
15 min read
Has Summary
--
The article discusses Uber's implementation of Presto Express, an enhancement to the Presto SQL query engine aimed at reducing the end-to-end Service Level Agreement (SLA) for short-running queries.
--
The article discusses Uber's advanced settlement accounting system, which is crucial for managing financial transactions involving payment service providers (PSPs).
Onkar Singh, Sai Sameera Grandhi, Nagesh Kumar Mankala, Abhinav Agarwal
12 min read
Has Summary
--
The article discusses how Uber optimizes the training of Large Language Models (LLMs) using both open-source and in-house models.
Apache, Apache Kafka, Apache Spark, Comet, Docker, Google Cloud, GPT, GPT-4, Hugging Face, Kubernetes, Mistral, PyTorch, SQL, Transformers
Bo Ling, Jiapei Huang, Baojun Liu, Chongxiao Cao, Anant Vyas, Peng Zhang
11 min read
Has Summary
--
This article discusses the significant improvements made to Uber's Experiment Evaluation Engine, achieving a 100x reduction in latency by transitioning from a remote evaluation architecture to a lo...
Akshay Jetli, Deepak Bobbarjung, Sergey Gitlin, Andy Maule
15 min read
Has Summary
--
The article discusses how Uber utilizes Apache Pinot for low-latency offline table analytics, highlighting its capabilities in handling various use cases, including real-time and offline data inges...
Ankit Sultana, Caner Balci
15 min read
Has Summary
--
The article discusses Uber's strategy for shifting end-to-end (E2E) testing left in their development process to improve efficiency and reduce operational costs.
Quess Liu, Daniel Tsui
11 min read
Has Summary
--
The article discusses the Sparkle framework developed by Uber to standardize modular ETL processes, enhancing developer productivity and data quality.
Dinesh Jagannathan, Sharath Bhat, Suman Voleti, Praveen Raj
8 min read
Has Summary
--
Odin is Uber's stateful platform designed to manage various technologies for data storage efficiently.
Jesper Borlum, Gianluca Mezzetti
14 min read
Has Summary
--
This article introduces Kafka Tiered Storage at Uber, detailing its architecture and the motivation behind its implementation.
Satish Duggana, Kamal Chandraprakash, Abhijeet Kumar
9 min read
Has Summary
--
This article discusses how Uber counts job participation at scale, detailing the integration of Apache Pinot™ to address challenges in data processing and analysis.
Ryan Woo, Sameer Kapoor
11 min read
Has Summary
--
The article delves into Uber's comprehensive accounting data testing strategies, emphasizing the importance of precision and integrity in financial processes.
Onkar Singh, Harsha Aditya Ravuri, Viswanath Ramakkagari, Aditya Gopisetti, Hari Srinivasan
16 min read
Has Summary
--
The article discusses Uber's journey in scaling its AI/ML infrastructure, highlighting the transition from on-premise to cloud solutions, the implementation of new technologies, and the optimizatio...
Nav Kankani, Rush Tehrani, Anant Vyas
10 min read
Has Summary
--
The article discusses Uber's efforts to build a scalable, real-time chat system to enhance customer experience.
Avijit Singh, Vivek Shah, Ankit Tyagi
14 min read
Has Summary
--
DataCentral is Uber's proprietary platform designed for Big Data observability, chargeback, and governance.
Arnav Balyan, Atul Mantri, Krishna Karri, Amruth Sampath
10 min read
Has Summary
--
This article discusses Uber's experience with garbage collection (GC) tuning to enhance the reliability of Presto, an open-source distributed SQL query engine.
Cristian Velazquez, Vineeth Karayil Sekharan
11 min read
Has Summary
--
uVitals is an anomaly detection and alerting system developed by Uber to enhance the reliability of its services by quickly identifying and addressing issues in multi-dimensional time series data.
Venki Appiah, Komal Raulkar
14 min read
Has Summary
--
The article discusses how Uber utilizes Apache Pinot for real-time analytics of mobile app crashes, enhancing their ability to detect and resolve issues quickly.
Kriti Dangi, Anil Purohit, Parijat Bansal, Rohit Yadav
17 min read
Has Summary
--
The article discusses the implementation of a Unified Session for analytical events at Uber, aimed at enhancing data consistency and analytics across various applications.
Harsh Desai, Gaurav Yadav, Sahil Jindal, Satyam Shubham, Mahip Jain, Anshal Shukla, Ashok Varma
13 min read
Has Summary
--
The article discusses Uber's implementation of Attribute-Based Access Control (ABAC) to manage access across its microservices architecture.
Alan Cao
10 min read
Has Summary
--
The article discusses Uber's Automated Audit Framework designed to manage and audit financial transactions at internet scale.
Hasit Bhatt, Saurabh Kathpalia, Shashank Agarwal, Jayram Kumar, Hari Srinivasan
15 min read
Has Summary
--
The article discusses Uber's journey in scaling the adoption of Kerberos authentication across its extensive data analytics platform.
Alexander Gulko, Matt Mathew
13 min read
Includes Code
Has Summary
--
The article discusses Uber's transition from a Server-Sent Events (SSE) architecture to a gRPC-based push platform, detailing the motivations, implementation challenges, and outcomes of this migrat...
--
The article discusses Uber's Machine Learning Education Program, which leverages engineering principles to scale ML education for its employees.
Brooke Carter, Melissa Barr, Michael Mui
12 min read
Has Summary
--
The article discusses how Uber integrates Presto® and Apache Kafka® to enhance its big data analytics capabilities.
Yang Yang, Yupeng Fu, Hitarth Trivedi
10 min read
Has Summary
--
The article discusses Uber's implementation of security features for its Kafka infrastructure, detailing the importance of securing data integrity and access control.
Prateek Agarwal, Ryan Turner, KK Sriramadhesikan
20 min read
Includes Code
Has Summary
--
The article discusses Uber's Emergency Button feature, detailing its evolution, functionality, and the technologies that support it.
Harish Shanker, Calvin Yoon, Dhaval Shah, Mike Yang
10 min read
Has Summary
--
The article discusses strategies to avoid CPU throttling in a containerized environment, particularly at Uber, where stateful workloads run on a large fleet of hosts.
Joakim Recht, Yury Vostrikov
7 min read
Has Summary
--
This article details Uber's migration of financial data from DynamoDB to Docstore, highlighting the challenges faced and the architectural decisions made to ensure data integrity and operational ef...
Piyush Patel, Jaydeepkumar Chovatia, Kaushik Devarajaiah
15 min read
Has Summary
--
The article introduces uGroup, Uber's internal Kafka consumer management framework designed to enhance observability and monitoring of Kafka consumers.
Qichao Chu, Yupeng Fu, Mingmin Chen, Haitao Zhang, Xiaoman Dong
12 min read
Has Summary
--
This article discusses Uber's implementation of a real-time exactly-once ad event processing system using open-source technologies such as Apache Flink, Kafka, and Pinot.
Jacob Tsafatinos, Yuriy Bondaruk, Yupeng Fu, James Kwon
12 min read
Has Summary
--
The article discusses how Uber's Global Scaled Solutions team transitioned from a traditional analytics architecture to a real-time analytics system using Redis, AWS Fargate, and the Dash framework.
Piyush Choudhary, Sujeet Srivastava
12 min read
Has Summary
--
The article discusses Uber's implementation of a Consumer Proxy to enhance Apache Kafka's asynchronous queuing capabilities.
Yang Yang, Zhifeng Chen, Qichao Chu, Haitao Zhang, George Teo
14 min read
Has Summary
--
The article discusses the challenges and solutions in building scalable streaming pipelines for generating near real-time features at Uber.
Feng Xu, Gang Zhao
19 min read
Has Summary
--
The article discusses Uber's initiatives to enhance the efficiency of its Big Data platform, focusing on cost reduction through optimizations in file formats, HDFS erasure coding, YARN scheduling i...
Zheng Shao, Mohammad Islam
18 min read
Has Summary
--
The article discusses the challenges and opportunities Uber faces in reducing the costs associated with its big data platform, which has grown significantly in scale and expense.
Zheng Shao, Mohammad Islam
12 min read
Has Summary
--
The article discusses Uber's comprehensive re-architecture of its Fulfillment Platform, aimed at enhancing its Go/Get strategy.
Ashwin Neerabail, Ankit Srivastava, Kamran Massoudi, Madan Thangavelu, Uday Kiran Medisetty
19 min read
Has Summary
--
This article discusses Uber's journey in containerizing its Apache Hadoop infrastructure, detailing the challenges faced and the solutions implemented over two years.
--
The article discusses Uber's 'Orders Near You' feature, which utilizes real-time geospatial data analytics to enhance user experience in the Uber Eats app.
Yupeng Fu, Cassandra Tomazic, Dharak Kharod
10 min read
Has Summary
--
The article discusses how Uber analyzes customer issues to enhance user experience by leveraging support data to improve support processes, optimize product experience, and address operational chal...
Nimesh Agarwal, Aravind Ranganathan, Pallavi Nagesharao
20 min read
Has Summary
--
The article discusses the development and implementation of Charon, a real-time analytics framework at Uber designed for automating merchant live monitoring.
Marco Vita, Ujwala Tulshigiri, Dharak Kharod
12 min read
Has Summary
--