How Uber Uses PySpark

18 engineering articles about PySpark from Uber's engineering team

Other Uber Technologies

Apache (195), Java (112), Apache Spark (94), MySQL (78), SQL (77), JSON (74)

Articles

Uber

Advanced

Uber’s Strategy to Upgrading 2M+ Spark Jobs

Uber's migration from Spark 2.4 to Spark 3.3 involved upgrading over 2 million Spark applications, utilizing innovative automation tools like Iron Dome.

Apache, Apache Spark, Java, Kubernetes, MySQL, Oracle, PySpark, Python, Scala, SQL

Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya

8 min read

Has Summary

Uber

Intermediate

Migrating Large-Scale Interactive Compute Workloads to Kubernetes Without Disruption

The article discusses Uber's migration of large-scale interactive compute workloads from Peloton to Kubernetes, focusing on minimizing disruption while enhancing resource management and cloud readi...

Apache, Apache Spark, Cassandra, Docker, Google Cloud, Kubernetes, PySpark

Sayan Pal, Rishabh Mishra

12 min read

Has Summary

Uber

Advanced

How Uber Uses Ray® to Optimize the Rides Business

The article discusses how Uber utilizes Ray®, a general compute engine for Python®, to enhance the efficiency of its rides business through improved machine learning model performance and optimizat...

Apache, Apache Spark, AWS, Docker, Kubernetes, Pandas, PySpark, XGBoost

Kaichen Wei, Matt Walker, Peng Zhang

15 min read

Has Summary

Uber

Intermediate

Genie: Uber’s Gen AI On-Call Copilot

The article discusses Genie, Uber's generative AI on-call copilot designed to enhance communication and efficiency in on-call operations.

Apache, Apache Spark, Copilot, Embedding, Fine-tuning, PySpark

Paarth Chothani, Eduards Sidorovics, Xiyuan Feng, Nicholas Marcott, Jonathan Li, Chun Zhu, Kailiang Fu, Meghana Somasundara

11 min read

Has Summary

Uber

Intermediate

Pinot for Low-Latency Offline Table Analytics

The article discusses how Uber utilizes Apache Pinot for low-latency offline table analytics, highlighting its capabilities in handling various use cases, including real-time and offline data inges...

Apache, Apache Kafka, Apache Spark, gRPC, Java, MySQL, Oracle, PySpark, Scala

Ankit Sultana, Caner Balci

15 min read

Has Summary

Uber

Advanced

Innovative Recommendation Applications Using Two Tower Embeddings at Uber

The article discusses the implementation of Two-Tower Embeddings (TTE) at Uber, highlighting its role in enhancing the efficiency and scalability of recommendation systems.

Apache, Apache Spark, Artificial Intelligence, Deep Learning, Embedding, Machine Learning, PySpark

Bo Ling, Melissa Barr, Dhruva Dixith Kurra, Chun Zhu, Nicholas Marcott

18 min read

Has Summary

Uber

Advanced

Enabling Offline Inferences at Uber Scale

The article discusses Uber's approach to automating offline inferences using machine learning and natural language processing on support interaction data.

Apache, Apache Spark, Docker, PySpark, Streamlit, XGBoost

Neeraj Dhake, Aravind Ranganathan

12 min read

Has Summary

Uber

Advanced

Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop

The article discusses Project RADAR, an intelligent fraud detection system developed by Uber that integrates machine learning and human expertise to identify and mitigate fraudulent activities in r...

Apache, Apache Spark, PySpark, Scala, SQL

Sergey Zelvenskiy, Garvit Harisinghani, Tiffany Yu, Edwin Ng, Robin Wei

14 min read

Has Summary

Uber

Intermediate

The Evolution of Data Science Workbench

The article discusses the evolution of the Data Science Workbench (DSW) at Uber, highlighting its growth, challenges, and innovations over the past three years.

Apache, Apache Spark, MySQL, PySpark

Peng Du, Taikun Liu, Sophie Wang, Hong Wang, Hongdi Li, Jin Sun

15 min read

Has Summary

Uber

Intermediate

No Code Workflow Orchestrator for Building Batch & Streaming Pipelines at Scale

The article discusses Uber's development of uWorc, a no-code workflow orchestrator designed to simplify the creation of batch and streaming data pipelines.

Apache, AWS, Azure, Cassandra, JSON, MySQL, PySpark, SQL

Sandeep Karmakar, Sriharsha Chintalapani

11 min read

Has Summary

Uber

Intermediate

Horovod v0.21: Optimizing Network Utilization with Local Gradient Aggregation and Grouped Allreduce

Horovod v0.21 introduces significant enhancements aimed at optimizing network utilization for distributed deep learning training.

Apache, Apache Spark, AWS, Azure, Deep Learning, Keras, Machine Learning, PySpark, PyTorch, TensorFlow

Kerri Brown

8 min read

Has Summary

Uber

Intermediate

Monitoring Data Quality at Scale with Statistical Modeling

The article discusses Uber's approach to monitoring data quality at scale using statistical modeling.

PySpark

Ye Henry Li, Ritesh Agrawal, Santhosh Shanmugam, Andrea Pasqua

14 min read

Has Summary

Uber

Intermediate

Uber Open Source in 2019: Community Engagement and Contributions

The article discusses Uber's open source initiatives in 2019, highlighting the company's contributions to the open source community, the establishment of the Open Source Program Office (OSPO), and ...

Apache, Apache Spark, Deep Learning, Docker, Git, Keras, Kubernetes, Node.js, PySpark, PyTorch, SQL, TensorFlow

Uber

7 min read

Has Summary

Uber

Advanced

Evolving Michelangelo Model Representation for Flexibility at Scale

The article discusses the evolution of the Michelangelo model representation at Uber to enhance flexibility and scalability in machine learning model serving.

Apache, Apache Spark, Docker, Java, Machine Learning, PySpark, SQL, TensorFlow, Transformer, Transformers

Anne Holler, Michael Mui

15 min read

Has Summary

Uber

Intermediate

Horovod Adds Support for PySpark and Apache MXNet and Additional Features for Faster Training

The article discusses the latest updates to Horovod, a distributed deep learning framework, which now includes support for PySpark and Apache MXNet, along with features aimed at enhancing training ...

Apache, Apache Spark, AWS, Azure, BERT, Deep Learning, Embedding, Keras, PySpark, PyTorch, SQL, TensorFlow, Transformer

Carsten Jacobsen

7 min read

Has Summary

Uber

Advanced

Scaling Machine Learning at Uber with Michelangelo

The article discusses the evolution and scaling of Uber's machine learning platform, Michelangelo, highlighting its development, deployment, and operational strategies.

Apache, Apache Spark, Cassandra, Java, Machine Learning, PySpark, Scala, TensorFlow, Thrift, Transformer

Jeremy Hermann, Mike Del Balso

29 min read

Has Summary

Uber

Advanced

Michelangelo PyML: Introducing Uber’s Platform for Rapid Python ML Model Development

The article introduces Michelangelo PyML, Uber's platform designed for rapid Python machine learning model development.

Apache, Apache Spark, Docker, gRPC, Java, JSON, Machine Learning, PySpark, PyTorch, scikit-learn, SQL, TensorFlow, Thrift, XGBoost

Kevin Stumpf, Stepan Bedratiuk, Olcay Cirit

15 min read

Has Summary

Uber

Advanced

Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

The article introduces Petastorm, an open-source data access library developed by Uber's Advanced Technologies Group (ATG) for facilitating deep learning model training and evaluation directly from...

Apache, Apache Arrow, Apache Spark, Deep Learning, NumPy, PySpark, PyTorch, SQL

Robbie Gruener, Owen Cheng, Yevgeni Litvin

16 min read

Includes Code

Has Summary
