Open-sourcing Pinball

Pinterest Engineering
7 min read · intermediate

Overview

The article covers the open-sourcing of Pinball, a customizable workflow manager that Pinterest built to meet its data processing needs. It describes the shortcomings of existing solutions and details Pinball's architecture, features, and capabilities for managing complex workflows.

What You'll Learn

1. How to build a customizable workflow manager for data processing
2. Why existing workflow managers may not meet evolving data processing needs
3. How to define and deploy workflows using Pinball

Prerequisites & Requirements

  • Understanding of data processing workflows and job dependencies
  • Familiarity with command-line tools for workflow deployment (optional)

Key Questions Answered

What is Pinball and how does it function as a workflow manager?
Pinball is a customizable workflow manager developed by Pinterest to handle complex data processing tasks. It supports workflows composed of various jobs, from simple shell scripts to complex Hadoop workloads, and is designed to adapt to the evolving needs of data processing solutions.
How does Pinball manage workflow execution and job dependencies?
Pinball uses a master-worker architecture where a central master maintains the current system state, while stateless clients handle job execution and scheduling. Workflows are defined through configuration files or a UI builder, allowing for flexible job management and execution.
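To make the split between a stateful master and stateless clients concrete, here is a minimal Python sketch. It is an illustration of the general pattern only, not Pinball's actual API: the `Master` class, the `worker_step` function, and the job names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Master:
    """Holds all workflow state; workers keep none of their own (hypothetical class)."""
    deps: dict                                   # job name -> set of upstream job names
    done: set = field(default_factory=set)
    claimed: set = field(default_factory=set)

    def claim_runnable(self) -> Optional[str]:
        """Hand out one unclaimed job whose dependencies have all finished."""
        for job in sorted(self.deps):
            if job in self.done or job in self.claimed:
                continue
            if self.deps[job] <= self.done:
                self.claimed.add(job)
                return job
        return None

    def report_done(self, job: str) -> None:
        """Record a finished job so that downstream jobs become runnable."""
        self.claimed.discard(job)
        self.done.add(job)


def worker_step(master: Master, run_log: list) -> bool:
    """One stateless worker iteration: claim a job, run it, report back."""
    job = master.claim_runnable()
    if job is None:
        return False
    run_log.append(job)          # stand-in for actually executing the job
    master.report_done(job)
    return True


# Hypothetical three-job workflow: extract -> transform -> load
master = Master(deps={"extract": set(), "transform": {"extract"}, "load": {"transform"}})
log = []
while worker_step(master, log):
    pass
print(log)
```

Because the worker asks the master for work on every iteration and stores nothing between iterations, any number of workers could be started or stopped without losing track of the workflow's progress.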
What are the key features of Pinball's workflow management system?
Key features of Pinball include its customizable architecture, support for various job types, a pluggable parser for workflow definitions, and the ability to handle complex workflows with hundreds of jobs. It also includes features for job retries, cleanup commands, and email notifications upon workflow completion.
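One way to picture a pluggable parser is a registry that maps a format name to a function producing the same internal workflow structure. The sketch below assumes that design; `register_parser`, `load_workflow`, and the `arrows` text format are invented for illustration and are not part of Pinball.

```python
import json
from typing import Callable, Dict, List

Workflow = Dict[str, List[str]]                 # job name -> upstream job names
PARSERS: Dict[str, Callable[[str], Workflow]] = {}


def register_parser(fmt: str):
    """Decorator that plugs a parser in under a format name (hypothetical helper)."""
    def wrap(fn):
        PARSERS[fmt] = fn
        return fn
    return wrap


@register_parser("json")
def parse_json(text: str) -> Workflow:
    """Parse a JSON object of the form {"job": ["upstream", ...]}."""
    return {job: list(deps) for job, deps in json.loads(text).items()}


@register_parser("arrows")
def parse_arrows(text: str) -> Workflow:
    """Parse lines such as 'transform <- extract, cleanup' (made-up format)."""
    workflow: Workflow = {}
    for line in text.strip().splitlines():
        job, _, deps = line.partition("<-")
        workflow[job.strip()] = [d.strip() for d in deps.split(",") if d.strip()]
    return workflow


def load_workflow(fmt: str, text: str) -> Workflow:
    """Look up the parser for the requested format and apply it."""
    return PARSERS[fmt](text)


a = load_workflow("json", '{"extract": [], "transform": ["extract"]}')
b = load_workflow("arrows", "extract\ntransform <- extract")
print(a == b)
```

Both inputs yield the same job-to-dependencies mapping, so the rest of the system does not need to know which format a team chose.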
What challenges did Pinterest face with existing workflow managers?
Pinterest found that existing open-source workflow managers were either too specialized for specific job types or overly broad and difficult to extend. This led them to develop Pinball, which offers the flexibility needed to adapt to their diverse data processing requirements.

Key Statistics & Figures

Data processed daily
almost three petabytes
Pinball handles this volume of data across various workflows managed by Pinterest's engineering teams.
Jobs in the largest workflow
more than 500 jobs
This illustrates the complexity and scale at which Pinball operates, accommodating extensive data processing needs.

Technologies & Tools

Backend
Hadoop
Used for processing large datasets within workflows managed by Pinball.
Backend
Hive
Utilized in conjunction with Pinball for data processing tasks.
Backend
Spark
Employed for executing complex data processing jobs within Pinball workflows.

Key Actionable Insights

1. Consider developing a custom workflow manager like Pinball if existing solutions do not meet your needs.
   If your organization faces challenges with existing workflow tools, building a tailored solution can provide the flexibility and adaptability required for evolving data processing tasks.
2. Utilize Pinball's pluggable parser to define workflows in a format that suits your team's needs.
   This feature allows teams to express workflows in a way that aligns with their existing processes, improving usability and efficiency.
3. Leverage the ability to run multiple instances of the same workflow in parallel to optimize resource usage.
   This capability can significantly enhance throughput and efficiency in data processing tasks, especially when dealing with large datasets.
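
The idea of running several instances of one workflow side by side can be sketched with Python's standard `concurrent.futures` module. This is a generic illustration, not how Pinball schedules instances: `run_workflow`, the step names, and the instance ids are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(instance_id: str) -> str:
    """Stand-in for one complete run of a workflow (hypothetical function)."""
    steps = ["extract", "transform", "load"]
    # Each instance writes under its own id, so parallel runs do not collide.
    outputs = [f"{instance_id}/{step}" for step in steps]
    return outputs[-1]


# Hypothetical instance ids, e.g. one per input partition
instance_ids = ["run-1", "run-2", "run-3"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in the same order as the input ids
    results = list(pool.map(run_workflow, instance_ids))

print(results)
```

Keying every output by the instance id is the design choice that matters here: it is what lets instances of the same workflow run concurrently without overwriting each other's results.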

Common Pitfalls

1. Assuming existing workflow managers will meet all data processing needs without customization.
   Many organizations find that off-the-shelf solutions lack the flexibility required for their specific use cases, leading to inefficiencies and frustration.

Related Concepts

Data Processing Workflows
Job Dependencies In Workflow Management
Custom Workflow Solutions