Dynein: Building an Open-source Distributed Delayed Job Queueing System

Learn about the background, challenges, and future of Airbnb’s distributed scheduling and queueing system.

Overview

The article discusses Dynein, an open-source distributed delayed job queueing system developed by Airbnb to enhance the scalability and reliability of job scheduling. It covers the challenges faced with previous systems, the design and implementation of Dynein, and its integration with AWS SQS and DynamoDB.

What You'll Learn

1. How to build a distributed delayed job queueing system using AWS services
2. Why AWS SQS suits job queueing in a microservices architecture
3. When to implement a custom job scheduler instead of an existing solution like Quartz

Prerequisites & Requirements

  • Understanding of job queueing systems and microservices architecture
  • Familiarity with AWS services like SQS and DynamoDB (optional)

Key Questions Answered

What are the key features of Dynein as a job scheduling system?
Dynein offers reliability with at-least-once job delivery, scalability to support future growth, job isolation for different applications, and timing accuracy with a p95 scheduling deviation lower than 10 seconds. It also supports efficient queuing with features like dead letter queues and individual message acknowledgment.
How does Dynein handle immediate and delayed jobs?
Immediate jobs are relayed directly to AWS SQS, while delayed jobs are first queued in an inbound SQS queue, which supports scheduling and monitoring. The system dispatches each delayed job when it comes due and stays stable during traffic spikes.
What challenges did Airbnb face with the Resque job queuing system?
Airbnb encountered issues with Resque's at-most-once delivery guarantee, significant scaling bottlenecks due to reliance on a single Redis instance, and difficulties in managing job isolation across applications. These limitations prompted the development of Dynein.
Why did Airbnb switch from Quartz to a DynamoDB-based scheduler?
Airbnb switched to a DynamoDB-based scheduler to improve scalability and reduce costs. The new scheduler efficiently handles job dispatching with a simpler query model, allowing for dynamic scaling and avoiding the complexities of managing multiple MySQL instances.
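
The routing of immediate and delayed jobs described above can be sketched in memory. This is a minimal illustration, not Dynein's implementation: the class and method names (`DelayedJobRouter`, `submit`, `drain_inbound`, `dispatch_due`) are hypothetical, plain deques stand in for SQS queues, and a heap ordered by due time stands in for the time-ordered scheduler store.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledJob:
    run_at: float                            # epoch seconds when the job is due
    job_id: str = field(compare=False)
    queue_name: str = field(compare=False)


class DelayedJobRouter:
    """Hypothetical in-memory stand-in for the inbound queue and scheduler."""

    def __init__(self):
        self.inbound = deque()     # delayed jobs land here first
        self.destinations = {}     # queue name -> deque of dispatched job ids
        self.schedule = []         # min-heap of ScheduledJob, ordered by run_at

    def submit(self, job_id, queue_name, run_at, now):
        # Immediate jobs skip the scheduler and go straight to their queue.
        if run_at <= now:
            self._dispatch(job_id, queue_name)
        else:
            self.inbound.append(ScheduledJob(run_at, job_id, queue_name))

    def drain_inbound(self):
        # Move buffered delayed jobs into the time-ordered schedule.
        while self.inbound:
            heapq.heappush(self.schedule, self.inbound.popleft())

    def dispatch_due(self, now):
        # Pop every job whose due time has passed, earliest first.
        dispatched = []
        while self.schedule and self.schedule[0].run_at <= now:
            job = heapq.heappop(self.schedule)
            self._dispatch(job.job_id, job.queue_name)
            dispatched.append(job.job_id)
        return dispatched

    def _dispatch(self, job_id, queue_name):
        self.destinations.setdefault(queue_name, deque()).append(job_id)


router = DelayedJobRouter()
router.submit("a", "emails", run_at=100.0, now=100.0)  # immediate
router.submit("b", "emails", run_at=160.0, now=100.0)  # delayed
router.submit("c", "emails", run_at=130.0, now=100.0)  # delayed
router.drain_inbound()
print(router.dispatch_due(now=140.0))  # ['c']
```

Only "is anything due yet?" needs to be answered on each poll, which is the kind of simple query model that a key-value store with a sortable time key can serve.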

Key Statistics & Figures

p95 scheduling deviation
lower than 10 seconds
This target bounds how far a job's actual dispatch time may drift from its scheduled time, which matters for applications that require timely processing.
QPS achieved with DynamoDB-based scheduler
1,000 QPS
This performance level is achieved with significantly reduced costs compared to previous systems, demonstrating the efficiency of the new architecture.
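
To make the p95 figure concrete, the sketch below computes a p95 scheduling deviation from paired timestamps. It assumes deviation means the absolute difference between scheduled and actual dispatch time and uses the nearest-rank percentile definition; the function name `p95_deviation` is hypothetical, and the article does not state how Airbnb computes the metric.

```python
def p95_deviation(scheduled, dispatched):
    """Nearest-rank 95th percentile of |dispatched - scheduled|, in seconds."""
    if len(scheduled) != len(dispatched):
        raise ValueError("scheduled and dispatched must have the same length")
    if not scheduled:
        raise ValueError("at least one job is required")
    deviations = sorted(abs(d - s) for s, d in zip(scheduled, dispatched))
    # Nearest rank: ceil(0.95 * n), done in integers to avoid float rounding.
    rank = (95 * len(deviations) + 99) // 100
    return deviations[rank - 1]


# Twenty jobs dispatched 1, 2, ..., 20 seconds after their scheduled time.
scheduled = [1000.0 + 60 * i for i in range(20)]
dispatched = [s + (i + 1) for i, s in enumerate(scheduled)]
print(p95_deviation(scheduled, dispatched))  # 19.0
```

Under this definition, a result below 10 would mean that at least 95% of jobs were dispatched within 10 seconds of their scheduled time.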

Technologies & Tools


Backend
AWS SQS
Used for managing job queues in Dynein, providing at-least-once delivery and high throughput.
Database
DynamoDB
Serves as the backend for the job scheduler, allowing for efficient querying and scaling.
Backend
Quartz
Initially used for job scheduling before transitioning to a custom solution.
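
The at-least-once delivery, individual acknowledgment, and dead letter queue behaviour attributed to SQS can be illustrated with a simplified in-memory model. This is a sketch under stated assumptions, not the SQS API: `AtLeastOnceQueue` and its methods are hypothetical, `expire` stands in for a visibility timeout elapsing, and `max_receives` loosely mirrors a redrive limit.

```python
from collections import deque


class AtLeastOnceQueue:
    """Hypothetical model: a message stays in the system until it is acked."""

    def __init__(self, max_receives=3):
        self.visible = deque()     # (id, body, receive count) ready for delivery
        self.in_flight = {}        # id -> (body, receive count) awaiting ack
        self.dead_letters = []     # bodies that exhausted their receives
        self.max_receives = max_receives
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self.visible.append((self._next_id, body, 0))
        return self._next_id

    def receive(self):
        if not self.visible:
            return None
        msg_id, body, count = self.visible.popleft()
        self.in_flight[msg_id] = (body, count + 1)
        return msg_id, body

    def ack(self, msg_id):
        # Individual acknowledgment: only this message is removed.
        self.in_flight.pop(msg_id, None)

    def expire(self, msg_id):
        # Simulated visibility timeout: an unacked message is redelivered,
        # or moved to the dead letter list once it has used all its receives.
        body, count = self.in_flight.pop(msg_id)
        if count >= self.max_receives:
            self.dead_letters.append(body)
        else:
            self.visible.append((msg_id, body, count))


queue = AtLeastOnceQueue(max_receives=3)
queue.send("resize-image")
for _ in range(3):
    msg_id, body = queue.receive()
    queue.expire(msg_id)           # the worker never acks, so the job comes back
print(queue.dead_letters)          # ['resize-image']
```

Because an unacknowledged job is redelivered, a worker may see the same job more than once, so job handlers in an at-least-once system are usually written to be idempotent.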

Key Actionable Insights

1. Implementing a distributed job queueing system like Dynein can significantly enhance application scalability and reliability.
   By offloading resource-intensive tasks to a background queue, applications can handle more requests concurrently, reducing performance bottlenecks.
2. Utilizing AWS SQS for job queuing simplifies the scaling process and reduces operational overhead.
   SQS allows for easy provisioning of new queues, making it suitable for microservices architectures where each service can manage its own queue.
3. Consider building a custom job scheduler if existing solutions like Quartz do not meet your scalability needs.
   A tailored scheduler can optimize performance and reduce costs, especially when dealing with high transaction volumes and dynamic workloads.

Common Pitfalls

1. Relying on a single instance for job queuing can lead to significant scaling limitations.
   This often results in performance bottlenecks and can hinder the ability to manage increased workloads effectively.
2. Using a complex job scheduler like Quartz without understanding its configuration can lead to operational overhead.
   Many teams struggle with the extensive API surface and configuration options, which can complicate job processing and lead to inefficiencies.

Related Concepts

Microservices Architecture
Job Scheduling Best Practices
Distributed Systems Design