Overview
The article discusses Uber's development of uWorc, a no-code workflow orchestrator designed to simplify the creation of batch and streaming data pipelines. It highlights the challenges faced with existing frameworks and the need for a more intuitive user experience to empower data analysts and operations users.
What You'll Learn
1. How to create data workflows using a no-code interface
2. Why a unified workflow orchestrator is essential for real-time data insights
3. How to leverage prebuilt tasks for efficient data processing
Prerequisites & Requirements
- Basic understanding of data workflows and SQL (optional)
Key Questions Answered
What challenges did Uber face with existing data pipeline frameworks?
Uber experienced a productivity tax on pipeline creators due to a reliance on a small number of data engineers, leading to delays in creating data workflows. The demand for real-time insights further complicated matters, as existing frameworks were not designed for rapid development.
How does uWorc simplify the workflow creation process?
uWorc simplifies workflow creation by providing a drag-and-drop interface that allows users to build data pipelines without writing code. This approach reduces the time needed to create workflows from hours or days to just minutes, making it accessible for users without programming expertise.
What are the guiding principles behind uWorc's design?
The guiding principles for uWorc's design are to simplify the workflow authoring process, unify the experience across batch and real-time workflows, and consolidate multiple tools into a single platform. These principles aim to enhance user experience and efficiency.
What types of tasks can be performed with uWorc?
uWorc supports a variety of tasks including Hive, Spark, PySpark, and bash. It also allows users to run Jupyter notebooks, enabling data scientists to deploy their notebooks with pre- and post-processing steps integrated into their workflows.
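To make the idea of typed tasks chained into a workflow concrete, here is a minimal sketch in Python. It is not uWorc's API, which the source does not describe: the `Task` class, the `execution_order` helper, the task-type labels, and the task names are all hypothetical, chosen only to illustrate a notebook step wrapped by pre- and post-processing steps.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Task types named in the text; the string labels themselves are assumptions.
TASK_TYPES = {"hive", "spark", "pyspark", "bash", "notebook"}


@dataclass
class Task:
    """Hypothetical task node: a name, a type, and the tasks it depends on."""
    name: str
    task_type: str
    upstream: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject task types that are not in the illustrative set above.
        if self.task_type not in TASK_TYPES:
            raise ValueError(f"unsupported task type: {self.task_type}")


def execution_order(tasks):
    """Return task names in an order that respects upstream dependencies."""
    graph = {t.name: set(t.upstream) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


# A notebook step with one pre-processing and one post-processing step.
workflow = [
    Task("prepare_input", "hive"),
    Task("run_notebook", "notebook", upstream=["prepare_input"]),
    Task("publish_results", "bash", upstream=["run_notebook"]),
]

print(execution_order(workflow))
```

In a drag-and-drop tool, the boxes a user places would correspond to the `Task` entries and the arrows to the `upstream` lists; the orchestrator then derives a valid run order from those connections.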
Key Statistics & Figures
Number of data pipelines at Uber
15,000
Uber's data platform supports over 15,000 data pipelines, highlighting the scale of data movement within the organization.
Percentage of workflows falling into two categories
40%
Currently, almost 40% of workflows at Uber fall into two categories, data analysis and operations, indicating the primary use cases for uWorc.
Time reduction for workflow deployment
Less than five minutes
Workflows that previously took hours to complete and deploy can now be done in under five minutes using uWorc.
Current number of workflows in uWorc
10,000
Since its launch, uWorc has facilitated the creation of over 10,000 workflows across various use cases.
Technologies & Tools
Workflow Management
Apache Airflow
Used behind the scenes to execute workflows in uWorc.
Stream Processing
Apache Flink
Utilized for real-time data processing within uWorc.
Data Science
Jupyter Notebooks
Allows data scientists to deploy their notebooks with integrated processing steps.
Key Actionable Insights
1. Implementing a no-code workflow orchestrator like uWorc can significantly reduce the time required to create data pipelines. This is particularly beneficial for organizations with a diverse user base that includes non-technical users, allowing them to focus on data analysis rather than coding.
2. Utilizing prebuilt tasks within uWorc can streamline the data processing workflow. By leveraging these tasks, users can quickly set up complex workflows without needing extensive programming knowledge, thus increasing productivity.
3. Adopting a unified approach to batch and real-time workflows can enhance decision-making capabilities. This is crucial for businesses that rely on timely insights to drive operations and strategy, making it easier to respond to market changes.
Common Pitfalls
1. Over-reliance on technical teams can lead to bottlenecks in workflow creation. This often occurs when non-technical users depend on data engineers for pipeline creation, causing delays and hindering productivity.
2. Complexity in existing frameworks can deter users from effectively utilizing data tools. Many users may find programming interfaces overwhelming, leading to underutilization of powerful data processing capabilities.
Related Concepts
Data Pipeline Management
No-code Development Platforms
Real-time Data Processing