How Uber Uses Apache
195 engineering articles about Apache from Uber's engineering team
Articles
This article introduces uForwarder, Uber's open-source push-based consumer proxy for Apache Kafka's async queuing system.
Zhifeng Chen, Yang Yang, Haifeng Chen
12 min read
Has Summary
--
This article details how Uber optimized Apache Hadoop's Distcp (Distributed Copy) tool to scale their data replication infrastructure from handling 250 TB to petabytes of daily data movement.
Abhay Yadav, Radhika Patwari, Sanjay Sundaresan
15 min read
Has Summary
--
This article details how Uber built and scaled Apache Hudi to power one of the world's largest data lakes, managing 19,500 datasets with trillions of records across a multi-hundred-petabyte reposit...
Prashant Wason, Balajee Nagasubramaniam, Surya Prasanna Kumar Yalla, Meenal Binwade, Xinli Shang, Jack Song
19 min read
Has Summary
--
The article discusses Uber's transition from traditional keyword-based search using Apache Lucene to implementing semantic vector search with Amazon OpenSearch.
Hao Sun, Jiasen Xu, Smit Patel, Anand Kotriwal, Xu Zhang
11 min read
Has Summary
--
This article discusses how Uber utilizes a pull-based ingestion model in OpenSearch™ to effectively index streaming data.
Yupeng Fu, Varun Bharadwaj, Shuyi Zhang, Xu Xiong, Michael Froh
14 min read
Has Summary
--
This article discusses Uber's transition from batch to streaming data ingestion using Apache Flink, which significantly enhances data freshness and operational efficiency.
Xinli Shang, Peter Huang, Jing Li, Jing Zhao, Jack Song
6 min read
Has Summary
--
This article discusses Uber's implementation of Apache Pinot to manage and analyze its extensive inventory and catalog data efficiently.
Suraj Modi, Ankit Sultana, Tarun Mavani
11 min read
Has Summary
--
The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...
Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen
11 min read
Has Summary
--
The article discusses Uber's implementation of I/O observability for its massive petabyte-scale data lake, focusing on the challenges and solutions in monitoring data access patterns across its hyb...
Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma
10 min read
Has Summary
--
This article discusses the implementation of zone failure resilience in Apache Pinot at Uber, detailing strategies to ensure uninterrupted service during zone failures.
Si Lao, Christina Li, Xuanyi Li, Yang Yang, Ujwala Tulshigiri
10 min read
Has Summary
--
The article discusses Uber's approach to enhancing the safety of machine learning (ML) model deployments through a series of mechanisms integrated into their ML life cycle.
Sophie Wang, Jia Li, Joseph Wang
10 min read
Has Summary
--
This article discusses the rebuilding of Uber's Apache Pinot™ query architecture, focusing on the transition from Neutrino to a new query system that utilizes Pinot's Multi-Stage Engine Lite Mode.
This article details how Uber standardized its mobile analytics system to improve data consistency and quality across its applications.
Uber's migration from Spark 2.4 to Spark 3.3 involved upgrading over 2 million Spark applications, utilizing innovative automation tools like Iron Dome.
Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya
8 min read
Has Summary
--
The article discusses the implementation of a Policy Simulator at Uber to enhance the safety and determinism of Identity and Access Management (IAM) policy changes.
Avinash Srivenkatesh, Zi Wen, Zakir Akram
15 min read
Has Summary
--
This article discusses the architecture and implementation of Uber's HiveSync, a critical service for data replication across its massive data lake.
Radhika Patwari, Trivedhi Talakola, Rajan Jaiswal, Chayanika Bhandary, Mukesh Verma, Sanjay Sundaresan
14 min read
Has Summary
--
This article discusses the development and implementation of forecasting models aimed at improving driver availability at airports, which are critical to Uber's ridesharing ecosystem.
Bob Zheng, Dhruv Ghulati, Manoj Panikkar, Michael (Yichuan) Cai
15 min read
Has Summary
--
The article discusses Uber's implementation of encryption at rest and disk isolation at scale using their Stateful Platform, Odin.
Ivan Shibitov, Johan Abildskov
14 min read
Has Summary
--
uReview is an AI code review platform developed by Uber to enhance the code review process by providing timely, high-quality feedback.
The article discusses how Uber processes early chargeback signals to mitigate payment fraud and enhance customer trust.
Avadhut Thakar
7 min read
Has Summary
--
The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch.
Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen
15 min read
Has Summary
--
The article discusses Uber's automation of Kerberos keytab rotation, detailing the challenges faced and the solutions implemented through their Keytab Distribution Pipeline (KDP).
Junyan Guo, Matt Mathew
14 min read
Has Summary
--
This article details Uber's migration from Apache Hive to Apache Spark SQL for ETL workloads, highlighting the motivations behind the transition, the architecture involved, and the challenges faced...
Kumudini Kakwani, Akshayaprakash Sharma, Nimesh Khandelwal, Aayush Chaturvedi, Chintan Betrabet, Suprit Acharya
14 min read
Has Summary
--
The article discusses Uber's implementation of a configuration-driven archival and retrieval framework designed to manage vast amounts of regulatory data efficiently.
The article discusses Uber's robust database backup recovery system, highlighting its importance for business continuity and disaster recovery.
This article discusses the development of Uber's Multi-Cloud Secrets Management Platform, designed to enhance security across its extensive microservices architecture.
Matt Mathew, Ludi Li, Chen Xi, Yiting Fan
16 min read
Has Summary
--
The article discusses Uber's migration of large-scale interactive compute workloads from Peloton to Kubernetes, focusing on minimizing disruption while enhancing resource management and cloud readi...
Sayan Pal, Rishabh Mishra
12 min read
Has Summary
--
The article discusses Uber's migration from Apache Mesos to Kubernetes, detailing the motivations, challenges, and solutions encountered during the transition.
Aditya Bhave, Arun Krishnan
11 min read
Has Summary
--
This article discusses Uber's implementation of elastic resource management on Kubernetes, focusing on enhancements made to support Ray-based job management.
Bharat Joshi, Anant Vyas, Ben Wang, Axansh Sheth, Abhinav Dixit
10 min read
Has Summary
--
Uber's blog post discusses their migration of machine learning workloads to Kubernetes using Ray, detailing the challenges faced with their previous setup and the improvements achieved with the new...
Bharat Joshi, Anant Vyas, Ben Wang, Min Cai, Axansh Sheth, Abhinav Dixit
18 min read
Has Summary
--
The article discusses Uber's transition to a multi-architecture environment by adopting Arm-based hosts at scale.
The article discusses the MySQL fleet at Uber, which consists of over 2,300 independent clusters that support critical operations for the platform.
Banty Kumar, Debadarsini Nayak, Raja Sriram Ganesan, Amit Jain
15 min read
Has Summary
--
The article discusses how Uber utilizes Ray®, a general compute engine for Python®, to enhance the efficiency of its rides business through improved machine learning model performance and optimizat...
Kaichen Wei, Matt Walker, Peng Zhang
15 min read
Has Summary
--
The article discusses how Uber leverages Neutrino, an internal fork of Presto, to efficiently serve millions of queries to Apache Pinot, a real-time OLAP database.
The article discusses The Accounter, a global coordination system developed by Uber to enhance operational throughput and safety on its stateful platform, Odin.
Jesper Borlum, Gianluca Mezzetti, Alexander Blazhenskikh
14 min read
Has Summary
--
The article discusses Uber's implementation of Presto Express, an enhancement to the Presto SQL query engine aimed at reducing the end-to-end Service Level Agreement (SLA) for short-running queries.
The article discusses recent developments in Apache Pinot that enable infinite retention for upsert tables, focusing on the implementation of deletions at both memory and disk levels.
Pratik Tibrewal
10 min read
Has Summary
--
The article discusses Uber's advanced settlement accounting system, which is crucial for managing financial transactions involving payment service providers (PSPs).
Onkar Singh, Sai Sameera Grandhi, Nagesh Kumar Mankala, Abhinav Agarwal
12 min read
Has Summary
--
The article discusses how Uber optimizes the training of Large Language Models (LLMs) using both open-source and in-house models.
Apache, Apache Kafka, Apache Spark, Comet, Docker, Google Cloud, GPT, GPT-4, Hugging Face, Kubernetes, Mistral, PyTorch, SQL, Transformers
Bo Ling, Jiapei Huang, Baojun Liu, Chongxiao Cao, Anant Vyas, Peng Zhang
11 min read
Has Summary
--
The article discusses Genie, Uber's generative AI on-call copilot designed to enhance communication and efficiency in on-call operations.
Paarth Chothani, Eduards Sidorovics, Xiyuan Feng, Nicholas Marcott, Jonathan Li, Chun Zhu, Kailiang Fu, Meghana Somasundara
11 min read
Has Summary
--
This article discusses the significant improvements made to Uber's Experiment Evaluation Engine, achieving a 100x reduction in latency by transitioning from a remote evaluation architecture to a lo...
Akshay Jetli, Deepak Bobbarjung, Sergey Gitlin, Andy Maule
15 min read
Has Summary
--
The article discusses Preon, a microservice developed by Uber for intelligent and efficient query analysis using the Presto SQL engine.
Gurmeet Singh
13 min read
Has Summary
--
The article discusses Uber's migration of its batch data platform to the cloud, focusing on the implementation of DataMesh principles.
Arun Mahadeva Iyer, Abhi Khune, Sahana Bhat
11 min read
Has Summary
--
The article discusses Uber's upgrade of its search platform from Lucene version 7.5.0 to 9.4.
Anand Kotriwal, Aparajita Pandey, Charu Jain, Yupeng Fu
12 min read
Has Summary
--
The article discusses how Uber utilizes Apache Pinot for low-latency offline table analytics, highlighting its capabilities in handling various use cases, including real-time and offline data inges...
Ankit Sultana, Caner Balci
15 min read
Has Summary
--
The article discusses Uber's strategy for shifting end-to-end (E2E) testing left in their development process to improve efficiency and reduce operational costs.
Quess Liu, Daniel Tsui
11 min read
Has Summary
--
The article discusses the Sparkle framework developed by Uber to standardize modular ETL processes, enhancing developer productivity and data quality.
Dinesh Jagannathan, Sharath Bhat, Suman Voleti, Praveen Raj
8 min read
Has Summary
--
This article discusses Uber's migration of its Apache Hadoop-based data lake to Google Cloud Storage (GCS) and the security measures implemented during this transition.
Matt Mathew, Alexander Gulko, Lei Sun, KK Sriramadhesikan, Alan Cao, Omkar Kakade
20 min read
Includes Code
Has Summary
--
Odin is Uber's stateful platform designed to manage various technologies for data storage efficiently.
Jesper Borlum, Gianluca Mezzetti
14 min read
Has Summary
--
This article introduces Kafka Tiered Storage at Uber, detailing its architecture and the motivation behind its implementation.
Satish Duggana, Kamal Chandraprakash, Abhijeet Kumar
9 min read
Has Summary
--