Pinterest Flink Deployment Framework

Pinterest Engineering

6 min read • intermediate

Apache, AWS, Git, Jenkins, JSON, YAML

Overview

This article describes Pinterest's Flink Deployment Framework, which standardizes builds with Bazel and integrates with internal services (Hermez, the Job Submission Service, and YARN clusters) to streamline the deployment of Flink jobs. It covers the problems with the earlier deployment process and the solutions adopted to improve reliability and efficiency.

What You'll Learn

1. How to standardize Flink job builds using Bazel
2. Why job deduplication is crucial for Flink applications
3. How to create and configure Hermez deployment files for Flink jobs

Prerequisites & Requirements

Understanding of Apache Flink and stream processing concepts
Familiarity with Bazel and YAML configuration files (optional)

Key Questions Answered

What are the key components of Pinterest's Flink Deployment Framework?

The Flink Deployment Framework at Pinterest is built on Bazel, Hermez, the Job Submission Service, and YARN clusters. It automates the building, deployment, and management of Flink jobs, ensuring reliability and scalability while providing features like job deduplication and configuration hotfixes.

How does the job deduplication feature work in the Flink Deployment Framework?

Job deduplication ensures only one instance of a Flink job runs at a time. If a job with the same name is submitted while another is running, the system triggers a savepoint for the existing job before stopping it, preventing double writes to Kafka and maintaining data integrity.
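The deduplication flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeCluster`, its methods, and the savepoint path format are hypothetical stand-ins, not Pinterest's Job Submission Service or any Flink or YARN API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for a YARN cluster (hypothetical, for illustration only)."""
    running: dict = field(default_factory=dict)  # job name -> application id
    events: list = field(default_factory=list)   # ordered log of actions taken
    _next_id: int = 1

    def trigger_savepoint(self, name: str) -> str:
        # The path format is an assumption made for this sketch.
        path = f"savepoints/{name}/{self.running[name]}"
        self.events.append(("savepoint", name))
        return path

    def stop(self, name: str) -> None:
        self.events.append(("stop", name))
        del self.running[name]

    def start(self, name: str) -> str:
        app_id = f"app-{self._next_id}"
        self._next_id += 1
        self.running[name] = app_id
        self.events.append(("start", name))
        return app_id


def submit_with_dedup(cluster: FakeCluster, name: str) -> str:
    """Start `name`, first savepointing and stopping any running job with that name."""
    if name in cluster.running:
        cluster.trigger_savepoint(name)  # preserve state before the old job stops
        cluster.stop(name)               # guarantees at most one running instance
    return cluster.start(name)
```

Submitting the same name twice leaves a single running instance, and the event log shows the savepoint being taken before the stop, which is the ordering that prevents two copies from writing to Kafka at once.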

What steps are involved in launching a Flink job using Hermez?

To launch a Flink job, users create a Hermez YAML file that specifies YARN parameters and resources. Hermez converts this YAML into JSON and submits it to the Job Submission Service, which ensures that the required JARs and job state are available before executing the job on a YARN cluster.
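As a rough sketch of that launch path, the Python below serializes a deployment spec to JSON and checks that the JAR and any job state it references are available before launch. The field names (`job_name`, `jar`, `yarn`, `restore_from`) and both functions are assumptions made for illustration; the actual Hermez schema and Job Submission Service interface are internal to Pinterest and not shown in the source.

```python
import json

# Stand-in for the contents of a parsed Hermez YAML file.
# Every key here is hypothetical; the real schema is not public.
deployment = {
    "job_name": "demo-job",
    "jar": "demo-job.jar",
    "yarn": {"queue": "default", "task_managers": 4, "memory_mb": 4096},
    "restore_from": None,  # optional savepoint or checkpoint to resume from
}


def to_submission_payload(spec: dict) -> str:
    """Serialize the parsed deployment spec as JSON for the submission service."""
    return json.dumps(spec, sort_keys=True)


def validate_submission(payload: str, available_jars: set, available_states: set) -> list:
    """Return a list of problems; an empty list means the job may be launched."""
    spec = json.loads(payload)
    problems = []
    if spec["jar"] not in available_jars:
        problems.append(f"missing jar: {spec['jar']}")
    state = spec.get("restore_from")
    if state is not None and state not in available_states:
        problems.append(f"missing job state: {state}")
    return problems
```

Keeping validation separate from serialization mirrors the split described above: one component only converts the file, while the other decides whether the job is ready to run.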

What improvements are planned for the Flink Deployment Framework?

Future improvements include reducing deployment latency by optimizing the job launch process and implementing automatic job failover across multiple AWS Availability Zones to enhance application availability and reliability.

Technologies & Tools

Stream Processing

Apache Flink

Used as the unified stream processing engine for real-time data applications.

Build System

Bazel

Utilized for standardizing the build process of Flink jobs.

Resource Management

YARN

Used for managing the execution of Flink jobs in clusters.

Deployment

Hermez

Internal continuous deployment platform for launching Flink jobs.

Key Actionable Insights

1. Implement a standardized build process for Flink jobs using Bazel to streamline deployments and reduce manual errors.
   Standardizing the build lets teams focus on development rather than configuration, leading to faster iterations and more reliable deployments.

2. Use job deduplication to prevent multiple instances of the same Flink job from running simultaneously, which can lead to data inconsistencies.
   This practice is essential for maintaining data integrity, especially in systems that rely on Kafka for data streaming.

3. Use Hermez to manage deployment configurations, so that settings can be adjusted quickly without rebuilding Flink job binaries.
   This flexibility can be critical during incidents, when immediate changes are needed to resolve production issues.
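One way to picture adjusting configuration without rebuilding binaries is as an overlay: the built artifact and its base settings stay fixed, and a small set of overrides is merged on top at deploy time. The sketch below is an assumption about how such an overlay could work, not a description of Hermez itself, and the config keys are hypothetical.

```python
import copy


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a new config with `overrides` merged recursively into `base`."""
    merged = copy.deepcopy(base)  # never mutate the base config
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config and hotfix: raise memory, leave the JAR untouched.
base_config = {"jar": "demo-job.jar", "yarn": {"memory_mb": 4096, "task_managers": 4}}
hotfix = {"yarn": {"memory_mb": 8192}}
patched = apply_overrides(base_config, hotfix)
```

Because the merge returns a copy, the base configuration remains available for rollback once the incident is over.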

Common Pitfalls

1. Failing to implement job deduplication can lead to multiple instances of a Flink job running simultaneously, causing data integrity issues.
   Without deduplication, users may accidentally deploy the same job multiple times, resulting in double writes to Kafka and affecting downstream processing.

Related Concepts

Stream Processing

Continuous Deployment

Job Orchestration

Data Integrity In Streaming Applications