How Uber Uses PySpark
18 engineering articles about PySpark from Uber's engineering team
Articles
Uber's migration from Spark 2.4 to Spark 3.3 involved upgrading over 2 million Spark applications, utilizing innovative automation tools like Iron Dome.
Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya
8 min read
Has Summary
--
The article discusses Uber's migration of large-scale interactive compute workloads from Peloton to Kubernetes, focusing on minimizing disruption while enhancing resource management and cloud readi...
Sayan Pal, Rishabh Mishra
12 min read
Has Summary
--
The article discusses how Uber utilizes Ray®, a general compute engine for Python®, to enhance the efficiency of its rides business through improved machine learning model performance and optimizat...
Kaichen Wei, Matt Walker, Peng Zhang
15 min read
Has Summary
--
The article discusses Genie, Uber's generative AI on-call copilot designed to enhance communication and efficiency in on-call operations.
Paarth Chothani, Eduards Sidorovics, Xiyuan Feng, Nicholas Marcott, Jonathan Li, Chun Zhu, Kailiang Fu, Meghana Somasundara
11 min read
Has Summary
--
The article discusses how Uber utilizes Apache Pinot for low-latency offline table analytics, highlighting its capabilities in handling various use cases, including real-time and offline data inges...
Ankit Sultana, Caner Balci
15 min read
Has Summary
--
The article discusses the implementation of Two-Tower Embeddings (TTE) at Uber, highlighting its role in enhancing the efficiency and scalability of recommendation systems.
Bo Ling, Melissa Barr, Dhruva Dixith Kurra, Chun Zhu, Nicholas Marcott
18 min read
Has Summary
--
The article discusses Uber's approach to automating offline inferences using machine learning and natural language processing on support interaction data.
The article discusses Project RADAR, an intelligent fraud detection system developed by Uber that integrates machine learning and human expertise to identify and mitigate fraudulent activities in r...
Sergey Zelvenskiy, Garvit Harisinghani, Tiffany Yu, Edwin Ng, Robin Wei
14 min read
Has Summary
--
The article discusses the evolution of the Data Science Workbench (DSW) at Uber, highlighting its growth, challenges, and innovations over the past three years.
Peng Du, Taikun Liu, Sophie Wang, Hong Wang, Hongdi Li, Jin Sun
15 min read
Has Summary
--
The article discusses Uber's development of uWorc, a no-code workflow orchestrator designed to simplify the creation of batch and streaming data pipelines.
Horovod v0.21 introduces significant enhancements aimed at optimizing network utilization for distributed deep learning training.
Kerri Brown
8 min read
Has Summary
--
The article discusses Uber's approach to monitoring data quality at scale using statistical modeling.
Ye Henry Li, Ritesh Agrawal, Santhosh Shanmugam, Andrea Pasqua
14 min read
Has Summary
--
The article discusses Uber's open source initiatives in 2019, highlighting the company's contributions to the open source community, the establishment of the Open Source Program Office (OSPO), and ...
Uber
7 min read
Has Summary
--
The article discusses the evolution of the Michelangelo model representation at Uber to enhance flexibility and scalability in machine learning model serving.
Anne Holler, Michael Mui
15 min read
Has Summary
--
The article discusses the latest updates to Horovod, a distributed deep learning framework, which now includes support for PySpark and Apache MXNet, along with features aimed at enhancing training ...
Carsten Jacobsen
7 min read
Has Summary
--
The article discusses the evolution and scaling of Uber's machine learning platform, Michelangelo, highlighting its development, deployment, and operational strategies.
Jeremy Hermann, Mike Del Balso
29 min read
Has Summary
--
The article introduces Michelangelo PyML, Uber's platform designed for rapid Python machine learning model development.
Apache, Apache Spark, Docker, gRPC, Java, JSON, Machine Learning, PySpark, PyTorch, scikit-learn, SQL, TensorFlow, Thrift, XGBoost
Kevin Stumpf, Stepan Bedratiuk, Olcay Cirit
15 min read
Has Summary
--
The article introduces Petastorm, an open-source data access library developed by Uber's Advanced Technologies Group (ATG) for facilitating deep learning model training and evaluation directly from...
Robbie Gruener, Owen Cheng, Yevgeni Litvin
16 min read
Includes Code
Has Summary
--