How Uber Uses SQL

77 engineering articles about SQL from Uber's engineering team

Articles

Uber
Advanced
This article discusses the improvements made to MySQL cluster uptime at Uber through the implementation of MySQL Group Replication (MGR).
Siddharth Singh, Raja Sriram Ganesan, Amit Jain, Debadarsini Nayak
10 min read
--
Uber
Advanced
The article discusses Uber's implementation of I/O observability for its massive petabyte-scale data lake, focusing on the challenges and solutions in monitoring data access patterns across its hyb...
Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma
10 min read
--
Uber
Advanced
This article discusses the rebuilding of Uber's Apache Pinot™ query architecture, focusing on the transition from Neutrino to a new query system that utilizes Pinot's Multi-Stage Engine Lite Mode.
Ankit Sultana, Christina Li, Shaurya Chaturvedi, Tarun Mavani, Shreyaa Sharma
11 min read
--
Uber
Advanced
Uber's migration from Spark 2.4 to Spark 3.3 involved upgrading over 2 million Spark applications, utilizing innovative automation tools like Iron Dome.
Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya
8 min read
--
Uber
Advanced
The article discusses Finch, Uber's conversational AI data agent designed to streamline financial data retrieval within the Slack environment.
Austin Harrison, Eddie Huang, Spencer Garth, Tim Ross, Taya Yusuf
13 min read
--
Uber
Advanced
The article discusses the evolution of Uber's Search Platform, highlighting its transition from Elasticsearch to an in-house solution called Sia, and ultimately to the adoption of OpenSearch.
Yupeng Fu, Shubham Gupta, Shanshan Song, Mingmin Chen
15 min read
--
Uber
Intermediate
This article details Uber's migration from Apache Hive to Apache Spark SQL for ETL workloads, highlighting the motivations behind the transition, the architecture involved, and the challenges faced...
Kumudini Kakwani, Akshayaprakash Sharma, Nimesh Khandelwal, Aayush Chaturvedi, Chintan Betrabet, Suprit Acharya
14 min read
--
Uber
Advanced
The article discusses the MySQL fleet at Uber, which consists of over 2,300 independent clusters that support critical operations for the platform.
Banty Kumar, Debadarsini Nayak, Raja Sriram Ganesan, Amit Jain
15 min read
--
Uber
Advanced
The article discusses how Uber leverages Neutrino, an internal fork of Presto, to efficiently serve millions of queries to Apache Pinot, a real-time OLAP database.
Ankit Sultana, Pratik Tibrewal, Christina Li, Shreyaa Sharma, Ujwala Tulshigiri
12 min read
--
Uber
Advanced
The article discusses Uber's implementation of Presto Express, an enhancement to the Presto SQL query engine aimed at reducing the end-to-end Service Level Agreement (SLA) for short-running queries.
Mingjia Hang, Gurmeet Singh
10 min read
--
Uber
Advanced
The article discusses how Uber optimizes the training of Large Language Models (LLMs) using both open-source and in-house models.
Bo Ling, Jiapei Huang, Baojun Liu, Chongxiao Cao, Anant Vyas, Peng Zhang
11 min read
--
Uber
Intermediate
The article discusses Preon, a microservice developed by Uber for intelligent and efficient query analysis using the Presto SQL engine.
Gurmeet Singh
13 min read
--
Uber
Intermediate
The article discusses QueryGPT, a tool developed by Uber that converts natural language prompts into SQL queries using generative AI.
Jeffrey Johnson, Callie Busch, Abhi Khune, Pradeep Chakka
14 min read
--
Uber
Intermediate
The article discusses the Sparkle framework developed by Uber to standardize modular ETL processes, enhancing developer productivity and data quality.
Dinesh Jagannathan, Sharath Bhat, Suman Voleti, Praveen Raj
8 min read
--
Uber
Advanced
The article discusses Uber's upgrade of its MySQL fleet from version 5.7 to 8.0, detailing the motivations, challenges, and solutions implemented during the process.
Siddharth Singh, Sriram Rao Udupi, Raja Sriram Ganesan, Debadarsini Nayak
12 min read
--
Uber
Advanced
Uber is modernizing its batch data infrastructure by migrating to Google Cloud Platform (GCP) to enhance data analytics and machine learning capabilities.
Abhi Khune, Arun Mahadeva Iyer, Sahana Bhat, Matt Mathew
7 min read
--
Uber
Advanced
The article discusses Uber's journey in enhancing its Palette Meta Store, focusing on the challenges faced, the solutions implemented, and the resulting improvements in machine learning feature man...
Paarth Chothani, Nicholas Marcott, Dehua Lai, Xiyuan Feng, Chunhao Zhang, Victoria Wu
10 min read
--
Uber
Advanced
The article discusses the Cinnamon Auto-Tuner, a system designed to adaptively manage concurrency in production environments.
Vladimir Gavrilenko, Jakob Holdgaard Thomsen, Jesper Lindstrom Nielsen, Timothy Smyth
19 min read
--
Uber
Advanced
The article discusses CheckEnv, a tool developed by Uber for fast detection of remote procedure calls (RPCs) between different environments using graph technology.
Minglei Wang, Kamyar Arbabifard
11 min read
--
Uber
Intermediate
This article discusses Uber's experience migrating a large-scale invoice generation service from a legacy system to a new service called Invoicer.
Georgi Zhuhov, Irina Kurteva, Iskren Dimov, Nikolay Lazarov, Plamena Todorova, Yordan Petrov
11 min read
--
Uber
Intermediate
The article discusses the evolution of Data Lifecycle Management (DLM) at Uber, detailing the journey from initial implementations to the development of a unified system.
Sumanth Srinivasa Krishnaswamy, Matt Mathew, Sonali Goyal
13 min read
--
Uber
Intermediate
The article discusses Spark Analysers, a system developed by Uber to identify anti-patterns in Spark applications.
Vijayant Soni, Sashidhar Thallam, Sakshi Pande, Atul Mantri
10 min read
--
Uber
Advanced
The article discusses how Uber implemented an incremental ETL process using Apache Hudi to manage its transactional data lake.
Vinoth Govindarajan, Saketh Chintapalli, Yogesh Saswade, Aayush Bareja
16 min read
--
Uber
Advanced
The article discusses D3, an automated system developed by Uber to detect data drifts in datasets, which is crucial for maintaining data quality and ensuring the performance of machine learning mod...
Anshal Shukla, Vineeth Tatipathri, Nipun Vats, Dinesh Jagannathan, Kousik Nath
19 min read
--
Uber
Advanced
The article discusses how Uber reduced its logging costs significantly by integrating the Compressed Log Processor (CLP) into its logging architecture.
Jack (Yu) Luo, Devesh Agrawal
22 min read
--
Uber
Intermediate
This article discusses Uber's migration from MySQL to MyRocks, a storage engine that integrates with RocksDB, to address disk space bottlenecks and improve operational efficiency.
Shriniket Kale, Hao Xu, Shenglin Du
9 min read
--
Uber
Advanced
The article discusses how Uber integrates Presto® and Apache Kafka® to enhance its big data analytics capabilities.
Yang Yang, Yupeng Fu, Hitarth Trivedi
10 min read
--
Uber
Advanced
This article discusses the implementation of finer-grained encryption in Apache Parquet™, focusing on how it addresses data access restrictions, retention, and encryption at rest.
Xinli Shang, Mohammad Islam, Pavi Subenderan, Jianchun Xu
19 min read
--
Uber
Advanced
The article discusses Project RADAR, an intelligent fraud detection system developed by Uber that integrates machine learning and human expertise to identify and mitigate fraudulent activities in r...
Sergey Zelvenskiy, Garvit Harisinghani, Tiffany Yu, Edwin Ng, Robin Wei
14 min read
--
Uber
Advanced
This article discusses the development of Uber's Fulfillment Platform using Google Cloud Spanner, focusing on its architecture, scalability, and operational efficiency.
Ankit Srivastava, Fabin Jose, Jean He, Nandakumar Gopalakrishnan, [email protected], Ramachandran Iyer, Uday Kiran Medisetty
20 min read
--
Uber
Advanced
This article discusses Uber's implementation of a real-time exactly-once ad event processing system using open-source technologies such as Apache Flink, Kafka, and Pinot.
Jacob Tsafatinos, Yuriy Bondaruk, Yupeng Fu, James Kwon
12 min read
--
Uber
Advanced
The article discusses Uber's initiatives to enhance the efficiency of its Big Data platform, focusing on cost reduction through optimizations in file formats, HDFS erasure coding, YARN scheduling i...
Zheng Shao, Mohammad Islam
18 min read
--
Uber
Advanced
Uber’s Finance Computation Platform (FCP) is designed to handle the scale and complexity of financial transactions across its various services.
Shashank Agarwal
14 min read
--
Uber
Intermediate
The article discusses Uber's comprehensive re-architecture of its Fulfillment Platform, aimed at enhancing its Go/Get strategy.
Ashwin Neerabail, Ankit Srivastava, Kamran Massoudi, Madan Thangavelu, Uday Kiran Medisetty
19 min read
--
Uber
Intermediate
The article discusses Uber's 'Orders Near You' feature, which utilizes real-time geospatial data analytics to enhance user experience in the Uber Eats app.
Yupeng Fu, Cassandra Tomazic, Dharak Kharod
10 min read
--
Uber
Intermediate
The article discusses the development and implementation of Charon, a real-time analytics framework at Uber designed for automating merchant live monitoring.
Marco Vita, Ujwala Tulshigiri, Dharak Kharod
12 min read
--
Uber
Advanced
The article discusses Uber's journey towards establishing a better data culture by addressing critical data issues and implementing a holistic approach to data management.
Krishna Puttaswamy, Suresh Srinivas
19 min read
--
Uber
Advanced
The article discusses the application of machine learning in internal auditing, specifically focusing on the challenges and methodologies used at Uber to analyze sparsely labeled data.
Jesse He
11 min read
--
Uber
Advanced
The article discusses the evolution of Uber's Schemaless datastore into a distributed SQL database called Docstore, highlighting its features, architecture, and motivation behind the transition.
Ovais Tariq, Deba Chatterjee, Himank Chaudhary
9 min read
--
Uber
Advanced
Uber has developed a centralized, schema-agnostic log analytics platform that enhances logging efficiency and reliability.
Chao Wang, Xiaobing Li
20 min read
--
Uber
Advanced
The article discusses Uber's journey towards metric standardization through the development of uMetric, a unified internal metric platform.
Xiaodong Wang, Wenrui Meng, Will Yu, Yun Wu
13 min read
--
Uber
Intermediate
The article discusses Uber's development of uWorc, a no-code workflow orchestrator designed to simplify the creation of batch and streaming data pipelines.
Sandeep Karmakar, Sriharsha Chintalapani
11 min read
--
Uber
Advanced
The article discusses Uber's Databook, an in-house platform designed to manage and surface metadata related to various data entities.
Sunheng Taing, Atul Gupte
25 min read
--
Uber
Advanced
The article discusses Uber's experience operating Apache Pinot at scale, detailing its role in enabling real-time analytics across various use cases.
Yupeng Fu, Girish Baliga, Ting Chen, Chinmay Soman
23 min read
--
Uber
Intermediate
The article introduces Athenadriver, an open-source database driver for Amazon Athena designed for Go, which facilitates seamless integration between Uber's business intelligence tools and AWS Athe...
Henry Fuheng Wu, Raymond Won, Nick Cobb, Mingjie Lai, Matt Ranney
8 min read
--
Uber
Advanced
This article discusses the optimization of JVM memory and garbage collection (GC) for large-scale services at Uber, focusing on the challenges and solutions implemented to enhance performance and r...
Xinli Shang, Yi Zhang, Fengnan Li, Amruth Sampath, Girish Baliga
29 min read
--
Uber
Advanced
The article discusses how Uber engineered SQL support on Apache Pinot, enhancing real-time analytics capabilities for their Big Data stack.
Haibo Wang
16 min read
--
Uber
Intermediate
The article discusses Uber's open source initiatives in 2019, highlighting the company's contributions to the open source community, the establishment of the Open Source Program Office (OSPO), and ...
--
Uber
Advanced
The article discusses Uber's advancements in data infrastructure during 2019, focusing on how data science was leveraged to optimize performance and manage vast amounts of data.
Nikhil Joshi, Viv Keswani
6 min read
--
Uber
Advanced
The article discusses the evolution of the Michelangelo model representation at Uber to enhance flexibility and scalability in machine learning model serving.