How Uber Uses Apache

195 engineering articles about Apache from Uber's engineering team

Other Uber Technologies

Java (112), Apache Spark (94), MySQL (78), SQL (77), JSON (74), Apache Kafka (68)

Articles

Uber

Intermediate

Introducing uForwarder: The Consumer Proxy for Kafka Async Queuing

This article introduces uForwarder, Uber's open-source push-based consumer proxy for Apache Kafka's async queuing system.

Apache, Apache Kafka, gRPC

Zhifeng Chen, Yang Yang, Haifeng Chen

12 min read

Has Summary

Uber

Intermediate

How Uber Scaled Data Replication to Move Petabytes Every Day

This article details how Uber optimized Apache Hadoop's DistCp (Distributed Copy) tool to scale its data replication infrastructure from 250 TB to petabytes of daily data movement.

Apache, Apache Spark

Abhay Yadav, Radhika Patwari, Sanjay Sundaresan

15 min read

Has Summary

Uber

Advanced

Apache Hudi™ at Uber: Engineering for Trillion-Record-Scale Data Lake Operations

This article details how Uber built and scaled Apache Hudi to power one of the world's largest data lakes, managing 19,500 datasets with trillions of records across a multi-hundred-petabyte reposit...

Apache, Apache Spark, AWS, Azure, Google Cloud, Google Cloud Storage

Prashant Wason, Balajee Nagasubramaniam, Surya Prasanna Kumar Yalla, Meenal Binwade, Xinli Shang, Jack Song

19 min read

Has Summary

Uber

Advanced

Powering Billion-Scale Vector Search with OpenSearch

The article discusses Uber's transition from traditional keyword-based search using Apache Lucene to semantic vector search with Amazon OpenSearch.

Apache, Apache Spark, CSS, Embedding

Hao Sun, Jiasen Xu, Smit Patel, Anand Kotriwal, Xu Zhang

11 min read

Has Summary

Uber

Advanced

How Uber Indexes Streaming Data with Pull-Based Ingestion in OpenSearch™

This article discusses how Uber utilizes a pull-based ingestion model in OpenSearch™ to effectively index streaming data.

Apache, Apache Kafka, Apache Spark, AWS, gRPC

Yupeng Fu, Varun Bharadwaj, Shuyi Zhang, Xu Xiong, Michael Froh

14 min read

Has Summary

Uber

Advanced

From Batch to Streaming: Accelerating Data Freshness in Uber’s Data Lake

This article discusses Uber's transition from batch to streaming data ingestion using Apache Flink, which significantly enhances data freshness and operational efficiency.

Apache, Apache Kafka, Apache Spark, Machine Learning

Xinli Shang, Peter Huang, Jing Li, Jing Zhao, Jack Song

6 min read

Has Summary

Uber

Intermediate

Blazing Fast OLAP on Uber’s Inventory and Catalog Data with Apache Pinot™

This article discusses Uber's implementation of Apache Pinot to manage and analyze its extensive inventory and catalog data efficiently.

Apache, Apache Kafka, Java, MySQL, Oracle

Suraj Modi, Ankit Sultana, Tarun Mavani

11 min read

Has Summary

Uber

Advanced

Evolution and Scale of Uber’s Delivery Search Platform

The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...

Apache, Embedding, Hugging Face, PyTorch, Transformers

Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen

11 min read

Has Summary

Uber

Advanced

I/O Observability for Uber’s Massive Petabyte-Scale Data Lake

The article discusses Uber's implementation of I/O observability for its massive petabyte-scale data lake, focusing on the challenges and solutions in monitoring data access patterns across its hyb...

Apache, Apache Spark, Google Cloud, Google Cloud Storage, Grafana, Java, MySQL, Oracle, SQL

Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma

10 min read

Has Summary

Uber

Advanced

Building Zone Failure Resilience in Apache Pinot™ at Uber

This article discusses the implementation of zone failure resilience in Apache Pinot at Uber, detailing strategies to ensure uninterrupted service during zone failures.

Apache, Apache Kafka, Grafana, Kubernetes

Si Lao, Christina Li, Xuanyi Li, Yang Yang, Ujwala Tulshigiri

10 min read

Has Summary

Uber

Intermediate

Raising the Bar on ML Model Deployment Safety

The article discusses Uber's approach to enhancing the safety of machine learning (ML) model deployments through a series of mechanisms integrated into their ML life cycle.

Apache, Generative AI

Sophie Wang, Jia Li, Joseph Wang

10 min read

Has Summary

Uber

Advanced

Rebuilding Uber’s Apache Pinot™ Query Architecture

This article discusses the rebuilding of Uber's Apache Pinot™ query architecture, focusing on the transition from Neutrino to a new query system that utilizes Pinot's Multi-Stage Engine Lite Mode.

Apache, Grafana, Java, SQL

Ankit Sultana, Christina Li, Shaurya Chaturvedi, Tarun Mavani, Shreyaa Sharma

11 min read

Has Summary

Uber

Advanced

How Uber Standardized Mobile Analytics for Cross-Platform Insights

This article details how Uber standardized its mobile analytics system to improve data consistency and quality across its applications.

Apache, Kotlin, Swift, Thrift

Ben Hjerrild, Rajat Sharma, Shawn Dong, Wugang Zhao

12 min read

Has Summary

Uber

Advanced

Uber’s Strategy to Upgrading 2M+ Spark Jobs

Uber's migration from Spark 2.4 to Spark 3.3 involved upgrading over 2 million Spark applications, utilizing innovative automation tools like Iron Dome.

Apache, Apache Spark, Java, Kubernetes, MySQL, Oracle, PySpark, Python, Scala, SQL

Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya

8 min read

Has Summary

Uber

Intermediate

Adding Determinism and Safety to Uber IAM Policy Changes

The article discusses the implementation of a Policy Simulator at Uber to enhance the safety and determinism of Identity and Access Management (IAM) policy changes.

Apache, Apache Kafka, AWS, Envoy, Google Cloud

Avinash Srivenkatesh, Zi Wen, Zakir Akram

15 min read

Has Summary

Uber

Advanced

Building Uber’s Data Lake: Batch Data Replication Using HiveSync

This article discusses the architecture and implementation of Uber's HiveSync, a critical service for data replication across its massive data lake.

Apache, Apache Spark, Google Cloud, Java, MySQL, Oracle

Radhika Patwari, Trivedhi Talakola, Rajan Jaiswal, Chayanika Bhandary, Mukesh Verma, Sanjay Sundaresan

14 min read

Has Summary

Uber

Advanced

Forecasting Models to Improve Driver Availability at Airports

This article discusses the development and implementation of forecasting models aimed at improving driver availability at airports, which are critical to Uber's ridesharing ecosystem.

Apache, Apache Spark, Cassandra, Kong, Transformer, Transformers

Bob Zheng, Dhruv Ghulati, Manoj Panikkar, Michael (Yichuan) Cai

15 min read

Has Summary

Uber

Advanced

Locking Down the Fleet: Encryption at Rest and Disk Isolation at Scale

The article discusses Uber's implementation of encryption at rest and disk isolation at scale using their Stateful Platform, Odin.

Apache, Cassandra, Elasticsearch, Kubernetes, MySQL, Oracle, Redis

Ivan Shibitov, Johan Abildskov

14 min read

Has Summary

Uber

Advanced

uReview: Scalable, Trustworthy GenAI for Code Review at Uber

uReview is an AI code review platform developed by Uber to enhance the code review process by providing timely, high-quality feedback.

Apache, Apache Kafka, Claude, Copilot, GPT, GPT-4, Java, Python

Sonal Mahajan, Shauvik Roy Choudhary, Akshay Utture, Will Bond, Joseph Wang

14 min read

Has Summary

Uber

Intermediate

How Uber Processes Early Chargeback Signals

The article discusses how Uber processes early chargeback signals to mitigate payment fraud and enhance customer trust.

Apache, Apache Kafka, Redis

Avadhut Thakar

7 min read

Has Summary

Uber

Advanced

The Evolution of Uber’s Search Platform

The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch.

Apache, Apache Kafka, Apache Spark, AWS, Elasticsearch, Google Cloud, Google Cloud Storage, gRPC, JSON, SQL

Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen

15 min read

Has Summary

Uber

Intermediate

Automating Kerberos Keytab Rotation at Uber

The article discusses Uber's automation of Kerberos keytab rotation, detailing the challenges faced and the solutions implemented through their Keytab Distribution Pipeline (KDP).

Apache, Kubernetes

Junyan Guo, Matt Mathew

14 min read

Has Summary

Uber

Intermediate

How Uber Migrated from Hive to Spark SQL for ETL Workloads

This article details Uber's migration from Apache Hive to Apache Spark SQL for ETL workloads, highlighting the motivations behind the transition, the architecture involved, and the challenges faced...

Apache, Apache Spark, Java, JSON, MySQL, Oracle, Serverless, SQL

Kumudini Kakwani, Akshayaprakash Sharma, Nimesh Khandelwal, Aayush Chaturvedi, Chintan Betrabet, Suprit Acharya

14 min read

Has Summary

Uber

Intermediate

From Archival to Access: Config-Driven Data Pipelines

The article discusses Uber's implementation of a configuration-driven archival and retrieval framework designed to manage vast amounts of regulatory data efficiently.

Apache, Apache Spark, AWS, MySQL, Oracle, YAML

Abhishek Dobliyal, Aakash Bhardwaj

12 min read

Has Summary

Uber

Advanced

Robust Database Backup Recovery at Uber

The article discusses Uber's robust database backup recovery system, highlighting its importance for business continuity and disaster recovery.

Apache, Cassandra, MySQL, Oracle

Arjav Jain, Shivam Vijay, Debadarsini Nayak, Mohammed Khatib, Ramnik Jain

11 min read

Has Summary

Uber

Advanced

Building Uber’s Multi-Cloud Secrets Management Platform to Enhance Security

This article discusses the development of Uber's Multi-Cloud Secrets Management Platform, designed to enhance security across its extensive microservices architecture.

Apache, Git, Google Cloud, Kubernetes, OAuth, Oracle, Vault

Matt Mathew, Ludi Li, Chen Xi, Yiting Fan

16 min read

Has Summary

Uber

Intermediate

Migrating Large-Scale Interactive Compute Workloads to Kubernetes Without Disruption

The article discusses Uber's migration of large-scale interactive compute workloads from Peloton to Kubernetes, focusing on minimizing disruption while enhancing resource management and cloud readi...

Apache, Apache Spark, Cassandra, Docker, Google Cloud, Kubernetes, PySpark

Sayan Pal, Rishabh Mishra

12 min read

Has Summary

Uber

Advanced

Migrating Uber’s Compute Platform to Kubernetes: A Technical Journey

The article discusses Uber's migration from Apache Mesos to Kubernetes, detailing the motivations, challenges, and solutions encountered during the transition.

Apache, Google Cloud, JSON, Kubernetes, Oracle

Aditya Bhave, Arun Krishnan

11 min read

Has Summary

Uber

Intermediate

Uber’s Journey to Ray on Kubernetes: Resource Management

This article discusses Uber's implementation of elastic resource management on Kubernetes, focusing on enhancements made to support Ray-based job management.

Apache, Apache Spark, Grafana, Kubernetes

Bharat Joshi, Anant Vyas, Ben Wang, Axansh Sheth, Abhinav Dixit

10 min read

Has Summary

Uber

Intermediate

Uber’s Journey to Ray on Kubernetes: Ray Setup

Uber's blog post discusses their migration of machine learning workloads to Kubernetes using Ray, detailing the challenges faced with their previous setup and the improvements achieved with the new...

Apache, Apache Spark, Deep Learning, Grafana, Kubernetes

Bharat Joshi, Anant Vyas, Ben Wang, Min Cai, Axansh Sheth, Abhinav Dixit

18 min read

Has Summary

Uber

Advanced

Adopting Arm at Scale: Transitioning to a Multi-Architecture Environment

The article discusses Uber's transition to a multi-architecture environment by adopting Arm-based hosts at scale.

Apache, Cassandra, Google Cloud, Java, MySQL, Oracle, Redis, Zig

Andreas Lykke, Jesper Borlum

10 min read

Has Summary

Uber

Advanced

MySQL At Uber

The article discusses the MySQL fleet at Uber, which consists of over 2,300 independent clusters that support critical operations for the platform.

Apache, Apache Kafka, Docker, Git, Kubernetes, MySQL, Oracle, SQL

Banty Kumar, Debadarsini Nayak, Raja Sriram Ganesan, Amit Jain

15 min read

Has Summary

Uber

Advanced

How Uber Uses Ray® to Optimize the Rides Business

The article discusses how Uber utilizes Ray®, a general compute engine for Python®, to enhance the efficiency of its rides business through improved machine learning model performance and optimizat...

Apache, Apache Spark, AWS, Docker, Kubernetes, Pandas, PySpark, XGBoost

Kaichen Wei, Matt Walker, Peng Zhang

15 min read

Has Summary

Uber

Advanced

Serving Millions of Apache Pinot™ Queries with Neutrino

The article discusses how Uber leverages Neutrino, an internal fork of Presto, to efficiently serve millions of queries to Apache Pinot, a real-time OLAP database.

Apache, Java, MySQL, Oracle, Rate Limiting, SQL

Ankit Sultana, Pratik Tibrewal, Christina Li, Shreyaa Sharma, Ujwala Tulshigiri

12 min read

Has Summary

Uber

Advanced

The Accounter: Scaling Operational Throughput on Uber’s Stateful Platform

The article discusses The Accounter, a global coordination system developed by Uber to enhance operational throughput and safety on its stateful platform, Odin.

Apache, Cassandra, Kubernetes

Jesper Borlum, Gianluca Mezzetti, Alexander Blazhenskikh

14 min read

Has Summary

Uber

Advanced

Presto® Express: Speeding up Query Processing with Minimal Resources

The article discusses Uber's implementation of Presto Express, an enhancement to the Presto SQL query engine aimed at reducing the end-to-end Service Level Agreement (SLA) for short-running queries.

Apache, Apache Kafka, Java, MySQL, Oracle, SQL

Mingjia Hang, Gurmeet Singh

10 min read

Has Summary

Uber

Advanced

Enabling Infinite Retention for Upsert Tables in Apache Pinot

The article discusses recent developments in Apache Pinot that enable infinite retention for upsert tables, focusing on the implementation of deletions at both memory and disk levels.

Apache

Pratik Tibrewal

10 min read

Has Summary

Uber

Advanced

Streamlining Financial Precision: Uber’s Advanced Settlement Accounting System

The article discusses Uber's advanced settlement accounting system, which is crucial for managing financial transactions involving payment service providers (PSPs).

Apache, Apache Kafka, Apache Spark, Cassandra

Onkar Singh, Sai Sameera Grandhi, Nagesh Kumar Mankala, Abhinav Agarwal

12 min read

Has Summary

Uber

Advanced

Open Source and In-House: How Uber Optimizes LLM Training

The article discusses how Uber optimizes the training of Large Language Models (LLMs) using both open-source and in-house models.

Apache, Apache Kafka, Apache Spark, Comet, Docker, Google Cloud, GPT, GPT-4, Hugging Face, Kubernetes, Mistral, PyTorch, SQL, Transformers

Bo Ling, Jiapei Huang, Baojun Liu, Chongxiao Cao, Anant Vyas, Peng Zhang

11 min read

Has Summary

Uber

Intermediate

Genie: Uber’s Gen AI On-Call Copilot

The article discusses Genie, Uber's generative AI on-call copilot designed to enhance communication and efficiency in on-call operations.

Apache, Apache Spark, Copilot, Embedding, Fine-tuning, PySpark

Paarth Chothani, Eduards Sidorovics, Xiyuan Feng, Nicholas Marcott, Jonathan Li, Chun Zhu, Kailiang Fu, Meghana Somasundara

11 min read

Has Summary

Uber

Intermediate

Making Uber’s Experiment Evaluation Engine 100x Faster

This article discusses the significant improvements made to Uber's Experiment Evaluation Engine, achieving a 100x reduction in latency by transitioning from a remote evaluation architecture to a lo...

Apache, Apache Kafka, Java

Akshay Jetli, Deepak Bobbarjung, Sergey Gitlin, Andy Maule

15 min read

Has Summary

Uber

Intermediate

Preon: Presto Query Analysis for Intelligent and Efficient Analytics

The article discusses Preon, a microservice developed by Uber for intelligent and efficient query analysis using the Presto SQL engine.

Apache, Caching, Microservices, Redis, SQL

Gurmeet Singh

13 min read

Has Summary

Uber

Intermediate

DataMesh: How Uber laid the foundations for the data lake cloud migration

The article discusses Uber's migration of its batch data platform to the cloud, focusing on the implementation of DataMesh principles.

Apache, Apache Spark, Google Cloud, Google Cloud Storage, Grafana, Java, MySQL, Oracle

Arun Mahadeva Iyer, Abhi Khune, Sahana Bhat

11 min read

Has Summary

Uber

Advanced

Lucene: Uber’s Search Platform Version Upgrade

The article discusses Uber's upgrade of its search platform from Lucene version 7.5.0 to 9.4.

Apache, Apache Spark, Git, Java, Scala

Anand Kotriwal, Aparajita Pandey, Charu Jain, Yupeng Fu

12 min read

Has Summary

Uber

Intermediate

Pinot for Low-Latency Offline Table Analytics

The article discusses how Uber utilizes Apache Pinot for low-latency offline table analytics, highlighting its capabilities in handling various use cases, including real-time and offline data inges...

Apache, Apache Kafka, Apache Spark, gRPC, Java, MySQL, Oracle, PySpark, Scala

Ankit Sultana, Caner Balci

15 min read

Has Summary

Uber

Advanced

Shifting E2E Testing Left at Uber

The article discusses Uber's strategy for shifting end-to-end (E2E) testing left in their development process to improve efficiency and reduce operational costs.

Apache, Apache Kafka, Docker, gRPC

Quess Liu, Daniel Tsui

11 min read

Has Summary

Uber

Intermediate

Sparkle: Standardizing Modular ETL at Uber

The article discusses the Sparkle framework developed by Uber to standardize modular ETL processes, enhancing developer productivity and data quality.

Apache, Apache Kafka, Apache Spark, Cassandra, Java, MySQL, Oracle, Scala, Spring, Spring Boot, SQL, YAML

Dinesh Jagannathan, Sharath Bhat, Suman Voleti, Praveen Raj

8 min read

Has Summary

Uber

Advanced

Enabling Security for Hadoop Data Lake on Google Cloud Storage

This article discusses Uber's migration of its Apache Hadoop-based data lake to Google Cloud Storage (GCS) and the security measures implemented during this transition.

Apache, Apache Spark, Caching, CQRS, Google Cloud, Google Cloud Storage, gRPC, MVP, OAuth, Redis

Matt Mathew, Alexander Gulko, Lei Sun, KK Sriramadhesikan, Alan Cao, Omkar Kakade

20 min read

Includes Code

Has Summary

Uber

Advanced

Odin: Uber’s Stateful Platform

Odin is Uber's stateful platform designed to manage various technologies for data storage efficiently.

Apache, Apache Kafka, Cassandra, Kubernetes, MySQL, Oracle

Jesper Borlum, Gianluca Mezzetti

14 min read