Rethinking the Foundry job orchestration back end: From CRUD to event-sourcing

Robert Fink
14 min readadvanced
--
View Original

Overview

This article discusses the transition of Palantir Foundry's job orchestration system from a CRUD-based approach to an event-sourced architecture. It highlights the limitations of CRUD in handling complex job dependencies and illustrates the benefits of event sourcing and memory images in improving performance and consistency.

What You'll Learn

1

How to implement event sourcing in a job orchestration system

2

Why acyclicity and determinism are crucial in job orchestration

3

How to maintain performance while scaling job orchestration systems

Prerequisites & Requirements

  • Understanding of event sourcing and job orchestration concepts
  • Familiarity with distributed systems and concurrency control(optional)

Key Questions Answered

What are the limitations of CRUD in job orchestration systems?
CRUD-based orchestration services struggle with performance and consistency when managing large job graphs. As the number of datasets and transformations increases, the need for multiple database lookups and acyclicity checks can lead to significant slowdowns, making it difficult to maintain interactive performance.
How does event sourcing improve job orchestration?
Event sourcing allows for the storage of mutation events in a linear sequence, enabling faster reads through in-memory graphs. This approach improves performance by reducing the need to rehydrate the entire graph from a database, thus maintaining consistency and allowing for easier management of job dependencies.
What is the role of memory images in the event-sourced model?
Memory images serve as in-memory representations of the job specification graph, allowing for quick access and traversal. They are updated based on mutation events, which ensures that the orchestration service can efficiently check constraints and maintain performance without relying on slower database lookups.
What strategies can be used to handle race conditions in event-sourced systems?
To handle race conditions, optimistic concurrency control can be employed, where constraints are checked against a known version of the graph before emitting mutation events. This ensures that changes are only applied if the graph remains unchanged, thus avoiding conflicts and maintaining consistency.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Key Actionable Insights

1
Transitioning from a CRUD to an event-sourced architecture can significantly enhance the performance of job orchestration systems.
This is particularly beneficial when dealing with large datasets and complex job dependencies, as it reduces the overhead of database lookups and improves the speed of job execution.
2
Implementing memory images can provide a substantial performance boost by allowing for faster access to job graphs.
This technique is especially useful in environments where job specifications are frequently modified, as it minimizes the need to constantly read from slower persistent storage.
3
Ensuring acyclicity and determinism in job graphs is vital for the reliability of orchestration plans.
By maintaining these properties, developers can guarantee that there is a unique execution order for job transformations, simplifying the orchestration logic and enhancing user understanding.

Common Pitfalls

1
Failing to maintain acyclicity in job graphs can lead to unpredictable orchestration plans.
This issue arises when job dependencies are not properly managed, resulting in cycles that complicate execution order and can cause runtime errors.
2
Over-reliance on database lookups in CRUD implementations can severely impact performance.
As the number of datasets grows, the need for multiple lookups can create bottlenecks, making it essential to consider alternative architectures like event sourcing.

Related Concepts

Event Sourcing
Job Orchestration
Distributed Systems
Concurrency Control