Overview
The article describes how Uber built the Uber Spark Compute Service (uSCS) to simplify running Apache Spark across its large infrastructure. It covers the challenges of operating Spark at scale and how uSCS addresses them, improving both the user experience and operational efficiency.
What You'll Learn
1. How to manage Spark applications at scale using uSCS
2. Why observability is crucial for Spark application performance
3. How to leverage Apache Livy for Spark application submission
Prerequisites & Requirements
- Understanding of Apache Spark and its architecture
- Familiarity with Apache Livy (optional)
Key Questions Answered
What challenges does Uber face when using Apache Spark at scale?
Uber encounters several challenges with Apache Spark, including a diverse set of data sources, multiple compute clusters, and dependency issues. Together these make it difficult for users to keep reliable access to data and compute resources: when a data source or cluster changes and an application's configuration is not updated to match, the application can fail and cause an outage.
How does uSCS improve the Spark development workflow at Uber?
uSCS simplifies the Spark development workflow by acting as a central coordinator for Spark applications. It manages environment settings, allows users to submit applications without needing to worry about cluster configurations, and automates the scheduling process, thereby reducing maintenance overhead.
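The coordinator role described above can be pictured as a lookup that turns a minimal user request into a complete submission. The sketch below is a hypothetical illustration, not uSCS's actual design: the `ROUTES` table, workload names, Livy URLs, and Spark settings are all invented for the example. The user names a logical workload and any overrides; the service picks the cluster and fills in default configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical routing table: logical workload -> Livy endpoint and default Spark conf.
# Cluster names, URLs, and values are illustrative only, not Uber's real settings.
ROUTES = {
    "batch-etl": {
        "livy_url": "http://livy.cluster-a.example:8998",
        "conf": {"spark.executor.memory": "8g", "spark.dynamicAllocation.enabled": "true"},
    },
    "ad-hoc": {
        "livy_url": "http://livy.cluster-b.example:8998",
        "conf": {"spark.executor.memory": "4g"},
    },
}


@dataclass
class UserRequest:
    """What the user supplies: a workload name, the application, and optional overrides."""
    workload: str
    file: str
    args: list = field(default_factory=list)
    conf: dict = field(default_factory=dict)


def resolve(request: UserRequest) -> tuple[str, dict]:
    """Choose a cluster endpoint and build a Livy batch body with merged configuration."""
    route = ROUTES[request.workload]
    # Later entries win, so explicit user overrides take precedence over service defaults.
    merged_conf = {**route["conf"], **request.conf}
    body = {"file": request.file, "args": request.args, "conf": merged_conf}
    return route["livy_url"], body
```

Because applications reference a logical workload rather than a cluster address, moving a workload to another cluster only requires changing its routing entry, not every user's configuration. That indirection is one plausible way a central service can make migrations transparent to users.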
What are the advantages of using uSCS for Spark applications?
The uSCS architecture offers advantages such as service configuration abstraction, enhanced observability of application performance, and automated migration processes. This leads to improved resource utilization and a more standardized experience for users across Uber's Spark applications.
Key Statistics & Figures
Number of Spark applications run daily: more than one hundred thousand
This scale highlights the extensive use of Spark in Uber's operations.
Technologies & Tools
Apache Spark (Backend): Core technology for processing large-scale data at Uber.
Apache Livy (Backend): Used to manage Spark application submissions and configurations.
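Apache Livy exposes a REST API in which a batch application is submitted with `POST /batches` and a JSON body whose fields include `file`, `className`, `args`, and `conf`. The sketch below builds such a request with Python's standard library but does not send it; the endpoint URL, jar path, and class name in the usage are placeholders, and the article does not describe how uSCS itself calls Livy.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_livy_batch_request(livy_url: str, file: str,
                             class_name: Optional[str] = None,
                             args: Optional[List[str]] = None,
                             conf: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (without sending) a POST /batches request for Apache Livy's REST API."""
    body = {"file": file}  # path to the application jar or Python file
    if class_name:
        body["className"] = class_name  # main class, needed for JVM applications
    if args:
        body["args"] = args  # command-line arguments passed to the application
    if conf:
        body["conf"] = conf  # Spark configuration properties
    return urllib.request.Request(
        url=f"{livy_url.rstrip('/')}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a request for a placeholder Livy server and job.
request = build_livy_batch_request(
    "http://livy.example:8998",
    "hdfs:///jobs/app.jar",
    class_name="com.example.Main",
    conf={"spark.executor.cores": "4"},
)
```

Passing the result to `urllib.request.urlopen` would submit it; Livy responds with a JSON batch object that includes an `id` and a `state`, which a client can then poll through `GET /batches/{batchId}/state`.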
Key Actionable Insights
1. Utilize uSCS to streamline the management of Spark applications across different environments. By leveraging uSCS, users can reduce the complexity of managing Spark applications, allowing them to focus on development rather than configuration management.
2. Implement observability tools to track application performance and failures. This enables teams to quickly identify and address issues, improving overall application reliability and user satisfaction.
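As a minimal sketch of failure tracking, the code below assumes each completed run is recorded as an `(app_name, final_state)` pair, using state names Livy reports for batches such as `success`, `dead`, and `killed`. Which states count as failures, and the `0.1` threshold, are assumptions to adjust; the function names are hypothetical.

```python
from collections import Counter

# Assumed set of final states treated as failures; adjust to the states your Livy version reports.
FAILED_STATES = {"dead", "killed", "error"}


def failure_report(runs):
    """Summarize (app_name, final_state) records into per-app run counts and failure rates."""
    totals = Counter()
    failures = Counter()
    for app, state in runs:
        totals[app] += 1
        if state in FAILED_STATES:
            failures[app] += 1
    return {
        app: {"runs": totals[app], "failures": failures[app],
              "failure_rate": failures[app] / totals[app]}
        for app in totals
    }


def flaky_apps(report, threshold=0.1):
    """Return names of apps whose failure rate exceeds the threshold, worst first."""
    flagged = [app for app, stats in report.items() if stats["failure_rate"] > threshold]
    return sorted(flagged, key=lambda app: report[app]["failure_rate"], reverse=True)
```

A real observability pipeline would also record durations, resource usage, and error messages; counting failed runs per application is simply the smallest useful signal for spotting unreliable jobs.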
Common Pitfalls
1. Failing to keep Spark configurations updated can lead to application failures. As data sources and compute environments evolve, users must ensure their configurations are current to avoid unexpected outages.
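One way to catch this pitfall before it causes an outage is to validate configurations against the set of clusters that are still live. The sketch assumes each application's configuration pins a `cluster` name and that a list of active clusters is available; `ACTIVE_CLUSTERS` and `find_stale_configs` are hypothetical names, not part of Spark, Livy, or uSCS.

```python
# Hypothetical set of live clusters; in practice this would come from a central registry.
ACTIVE_CLUSTERS = {"cluster-a", "cluster-b"}


def find_stale_configs(app_configs):
    """Return, sorted, the apps whose configuration names no cluster or a retired one."""
    return sorted(
        name for name, cfg in app_configs.items()
        if cfg.get("cluster") not in ACTIVE_CLUSTERS
    )
```

Running such a check on a schedule, or at submission time, turns a silent configuration drift into an explicit warning.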
Related Concepts
Apache Spark Architecture
Data Source Management
Application Observability
Resource Management in Distributed Systems