Shepherd: How Stripe adapted Chronon to scale ML feature development

This blog post discusses the technical details of how we built Shepherd and how we are expanding the capabilities of Chronon to meet Stripe’s scale.

Ben Mears
11 min read, intermediate

Overview

The article discusses how Stripe adapted Airbnb's Chronon platform to create Shepherd, a next-generation ML feature engineering platform that enhances the development and deployment of ML models at scale. It highlights the challenges of feature engineering in a high-volume environment and details the technical adaptations made to Chronon to meet Stripe's specific requirements.

What You'll Learn

1. How to adapt an existing ML feature engineering platform for large-scale applications
2. Why maintaining low latency and high feature freshness is crucial in ML model deployment
3. How to implement a dual KV store for cost-efficient data management
4. When to use streaming platforms like Flink for low-latency feature updates

Prerequisites & Requirements

  • Understanding of ML feature engineering concepts
  • Familiarity with Python and SQL
  • Experience with data processing frameworks like Spark (optional)

Key Questions Answered

How did Stripe adapt Chronon for its ML feature engineering needs?
Stripe adapted Chronon by ensuring it could handle offline, online, and streaming components at scale, modifying its KV store for cost-efficient data management, and implementing support for Spark SQL expressions in Flink. This adaptation allowed Stripe to meet its strict latency and feature freshness requirements.
What are the latency and feature freshness requirements for ML models at Stripe?
Latency refers to the time required to retrieve features during model inference, which directly affects how quickly payments are processed. Feature freshness measures how quickly feature values are updated to reflect new data, which is crucial for adapting to evolving fraud patterns. Stripe needs both low latency and high feature freshness across billions of transactions.
What was the outcome of using Shepherd for fraud detection at Stripe?
The Shepherd-enabled fraud detection model, which includes over 200 features, has outperformed previous models, blocking tens of millions of dollars in additional fraud annually. This demonstrates the effectiveness of the new feature engineering platform in real-world applications.
What challenges did Stripe face in feature engineering at scale?
Stripe faced challenges in identifying and deploying new features from hundreds of terabytes of raw data, requiring a platform that could efficiently handle the lifecycle of feature development while meeting strict latency and freshness requirements.
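
The latency and freshness definitions above are usually tracked as tail percentiles, which is how the 150ms p99 freshness figure below is expressed. A minimal sketch, assuming each feature write records when the underlying event happened and when the updated value became readable for serving; `FeatureUpdate`, its fields, and the nearest-rank percentile method are illustrative choices, not Shepherd's or Chronon's API.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FeatureUpdate:
    """One feature write; timestamps are hypothetical epoch milliseconds."""
    event_time_ms: int  # when the underlying event (e.g. a payment) happened
    servable_ms: int    # when the updated feature value became readable at inference


def freshness_lag_ms(update: FeatureUpdate) -> int:
    """How long after the event the new feature value could be served."""
    return update.servable_ms - update.event_time_ms


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, for p in (0, 100]."""
    if not values or not 0 < p <= 100:
        raise ValueError("need at least one value and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank, always >= 1
    return ordered[rank - 1]


# Example: freshness lags for a handful of updates.
updates = [
    FeatureUpdate(0, 40),
    FeatureUpdate(100, 180),
    FeatureUpdate(200, 320),
    FeatureUpdate(300, 450),
]
lags = [freshness_lag_ms(u) for u in updates]
print(lags, percentile(lags, 50), percentile(lags, 99))  # [40, 80, 120, 150] 80 150
```

Lookup latency can be summarized the same way by passing measured feature-fetch durations to `percentile`.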

Key Statistics & Figures

p99 feature freshness: 150ms
Achieved with Flink powering feature updates, ensuring timely data for ML model predictions.

Number of features in the new SEPA fraud model: over 200
This model was developed using Shepherd and has significantly improved fraud detection capabilities.

Additional fraud blocked annually: tens of millions of dollars
The new Shepherd-enabled model has outperformed previous models in fraud detection.

Technologies & Tools

Chronon (ML feature engineering platform): Serves as the foundation for Shepherd, enabling efficient feature development.
Flink (streaming platform): Used for low-latency stateful processing of feature updates.
Spark SQL (data processing framework): Used to define features and keep online and offline computations consistent.
Airflow (data orchestration): Customized to schedule and run offline jobs integrated with Chronon.
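The Spark SQL entry above refers to defining a feature once so that the offline (backfill) and online (serving) computations agree. The sketch below illustrates that idea in plain Python under stated assumptions: `FeatureDef`, `offline_backfill`, `OnlineAggregator`, and `replay` are hypothetical names, the `where` and `value` callables stand in for Spark SQL expressions, and the aggregation is a simple windowed sum. It is not Chronon's actual Python API.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, List, Tuple

Event = Dict[str, Any]  # hypothetical event row with a "ts" field in milliseconds


@dataclass(frozen=True)
class FeatureDef:
    """Hypothetical single feature definition shared by both paths."""
    key: str                         # entity column, e.g. a card identifier
    where: Callable[[Event], bool]   # stands in for a Spark SQL WHERE clause
    value: Callable[[Event], float]  # stands in for a Spark SQL SELECT expression
    window_ms: int                   # sum over events in [ts - window_ms, ts)


def offline_backfill(fd: FeatureDef, events: List[Event],
                     queries: List[Tuple[Any, int]]) -> List[float]:
    """Batch path: point-in-time windowed sum for each (key, ts) query."""
    results = []
    for key, ts in queries:
        total = sum((fd.value(e) for e in events
                     if e[fd.key] == key and ts - fd.window_ms <= e["ts"] < ts and fd.where(e)), 0.0)
        results.append(total)
    return results


class OnlineAggregator:
    """Streaming path: maintains the same windowed sum incrementally per key."""

    def __init__(self, fd: FeatureDef) -> None:
        self.fd = fd
        self.state: Dict[Any, Deque[Tuple[int, float]]] = defaultdict(deque)

    def ingest(self, e: Event) -> None:
        # Apply the same filter and projection as the offline path.
        if self.fd.where(e):
            self.state[e[self.fd.key]].append((e["ts"], self.fd.value(e)))

    def lookup(self, key: Any, ts: int) -> float:
        window = self.state[key]
        # Evict values older than the window; assumes events arrive in ts order
        # and lookups for a key arrive in non-decreasing ts order.
        while window and window[0][0] < ts - self.fd.window_ms:
            window.popleft()
        return sum((v for t, v in window if t < ts), 0.0)


def replay(fd: FeatureDef, events: List[Event],
           queries: List[Tuple[Any, int]]) -> List[float]:
    """Feed ts-ordered events to the online path, answering queries in time order."""
    agg = OnlineAggregator(fd)
    answers: Dict[Tuple[Any, int], float] = {}
    i = 0
    for key, ts in sorted(queries, key=lambda q: q[1]):
        while i < len(events) and events[i]["ts"] < ts:
            agg.ingest(events[i])
            i += 1
        answers[(key, ts)] = agg.lookup(key, ts)
    return [answers[q] for q in queries]
```

For time-ordered input, `replay` returns the same values as `offline_backfill` for the same queries, which is the property a single shared definition is meant to guarantee.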

Key Actionable Insights

1. Implement a dual KV store to optimize data management for ML features.
By splitting the KV store into a lower-cost bulk upload store and a higher-cost distributed store, you can balance cost and performance, ensuring that your ML models can access data quickly without incurring excessive storage costs.
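
A minimal sketch of how reads might combine the two stores, assuming the bulk store holds batch-computed running sums up to a snapshot cutoff and the distributed store holds streaming deltas written after it. `BulkUploadStore`, `StreamingStore`, `read_feature`, the running-sum feature, and the merge rule are hypothetical illustrations; the summary does not describe Stripe's actual schema or read path.

```python
from typing import Dict, List, Tuple


class BulkUploadStore:
    """Hypothetical lower-cost store, replaced wholesale by each batch upload."""

    def __init__(self) -> None:
        self.snapshot: Dict[str, float] = {}
        self.snapshot_end_ms = 0  # batch values cover events with ts < snapshot_end_ms

    def bulk_load(self, values: Dict[str, float], snapshot_end_ms: int) -> None:
        # Swap in a complete new snapshot instead of issuing per-key writes.
        self.snapshot = dict(values)
        self.snapshot_end_ms = snapshot_end_ms


class StreamingStore:
    """Hypothetical higher-cost store that accepts frequent per-key writes."""

    def __init__(self) -> None:
        self.deltas: Dict[str, List[Tuple[int, float]]] = {}

    def write(self, key: str, ts_ms: int, delta: float) -> None:
        self.deltas.setdefault(key, []).append((ts_ms, delta))

    def compact(self, before_ms: int) -> None:
        # Drop deltas already folded into the latest batch snapshot.
        for key, items in list(self.deltas.items()):
            self.deltas[key] = [(t, d) for t, d in items if t >= before_ms]


def read_feature(bulk: BulkUploadStore, stream: StreamingStore, key: str) -> float:
    """Serve a running sum: batch value plus streaming deltas newer than the snapshot."""
    base = bulk.snapshot.get(key, 0.0)
    recent = sum(d for t, d in stream.deltas.get(key, []) if t >= bulk.snapshot_end_ms)
    return base + recent
```

Keeping full recomputes in the cheaper bulk-loaded store and only post-snapshot writes in the more expensive one is what lets this kind of split trade storage cost against freshness.
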
2. Use Flink for low-latency streaming updates in your ML pipelines.
Flink's stateful processing capabilities allow for efficient handling of feature updates, which is essential for applications that require real-time data processing, such as fraud detection.
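
Flink scales stateful processing by partitioning a stream by key so that each parallel operator instance owns the state for its keys. The sketch below mimics that pattern in plain Python rather than PyFlink; `partition_for`, `KeyedCounterWorker`, `run`, and the count-and-total feature are hypothetical, and real Flink adds checkpointed state, event time, and fault tolerance that are not modeled here.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, parallelism: int) -> int:
    """Stable hash partitioning so every event for a key reaches the same worker."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


class KeyedCounterWorker:
    """One parallel instance of a hypothetical stateful operator.

    State is local to the worker and scoped per key, mirroring the idea of
    keyed state in Flink (plain Python, not the PyFlink API).
    """

    def __init__(self) -> None:
        self.state: Dict[str, Tuple[int, float]] = {}  # key -> (count, total amount)

    def process(self, key: str, amount: float) -> Tuple[str, int, float]:
        count, total = self.state.get(key, (0, 0.0))
        count, total = count + 1, total + amount
        self.state[key] = (count, total)
        # Emit the updated feature value immediately rather than waiting for a batch.
        return key, count, total


def run(events: Iterable[Tuple[str, float]], parallelism: int) -> List[Tuple[str, int, float]]:
    """Route each event to its key's worker and collect the emitted updates."""
    workers = [KeyedCounterWorker() for _ in range(parallelism)]
    updates = []
    for key, amount in events:
        worker = workers[partition_for(key, parallelism)]
        updates.append(worker.process(key, amount))
    return updates
```

Because every event for a key is routed to the same worker, `run(events, 1)` and `run(events, 4)` emit identical updates; parallelism changes throughput, not results.
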
3. Regularly benchmark your feature engineering platform to ensure scalability.
As your datasets grow, verify that your feature pipelines can handle increased load without performance degradation. Catching scaling problems early keeps your ML operations efficient.
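
One simple way to act on this advice is to time a pipeline step at increasing input sizes and estimate how its runtime grows. A generic sketch, assuming a single-process Python step; `benchmark`, `growth_exponent`, and the synthetic integer input are placeholders, and the summary does not say how Stripe benchmarks Shepherd.

```python
import math
import time
from typing import Callable, Dict, List, Sequence


def benchmark(step: Callable[[List[int]], object], sizes: Sequence[int],
              repeats: int = 3) -> Dict[int, float]:
    """Best-of-`repeats` wall-clock seconds for `step` at each input size."""
    timings: Dict[int, float] = {}
    for n in sizes:
        data = list(range(n, 0, -1))  # synthetic placeholder for n feature rows
        best = math.inf
        for _ in range(repeats):
            start = time.perf_counter()
            step(data)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


def growth_exponent(n1: int, t1: float, n2: int, t2: float) -> float:
    """Estimate k in time ~ n**k between two measurements (k near 1 is linear)."""
    if n1 <= 0 or n2 <= 0 or n1 == n2 or t1 <= 0 or t2 <= 0:
        raise ValueError("need distinct positive sizes and positive timings")
    return math.log(t2 / t1) / math.log(n2 / n1)


if __name__ == "__main__":
    results = benchmark(sorted, [10_000, 100_000, 1_000_000])
    sizes = sorted(results)
    for a, b in zip(sizes, sizes[1:]):
        print(f"{a} -> {b}: exponent ~ {growth_exponent(a, results[a], b, results[b]):.2f}")
```

An exponent that stays near 1 as sizes grow suggests linear scaling; a rising exponent flags a step that will degrade as data volume increases.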

Common Pitfalls

1. Failing to balance latency and feature freshness can lead to poor model performance.
If a system prioritizes low latency over freshness, it may not react quickly enough to changing data patterns, leading to outdated predictions. Conversely, focusing solely on freshness can slow down overall processing times.
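
One way to see this tension is to model a writer that buffers feature updates and flushes them to the store at a fixed interval. In this hypothetical model (an assumption, not from the source), shorter intervals improve freshness but multiply store writes, and that extra write load is one way a freshness-first design can slow the rest of the system. `simulate_flushes` and its parameters are illustrative.

```python
from typing import List, Tuple


def simulate_flushes(event_times_ms: List[int], flush_interval_ms: int,
                     write_cost_ms: int) -> Tuple[float, int, int]:
    """Return (mean freshness lag ms, max freshness lag ms, number of store writes).

    Hypothetical model: each update is buffered until the next multiple of
    flush_interval_ms, then written in one batch that takes write_cost_ms.
    """
    if flush_interval_ms <= 0 or not event_times_ms:
        raise ValueError("need a positive interval and at least one event")
    lags: List[int] = []
    flush_points = set()
    for t in event_times_ms:
        flush_at = -(-t // flush_interval_ms) * flush_interval_ms  # ceil to the next boundary
        flush_points.add(flush_at)
        lags.append(flush_at - t + write_cost_ms)
    return sum(lags) / len(lags), max(lags), len(flush_points)


events = list(range(0, 100, 10))  # one event every 10 ms
for interval in (10, 50):
    print(interval, simulate_flushes(events, interval, write_cost_ms=5))
# 10 (5.0, 5, 10)
# 50 (25.0, 45, 3)
```

With events every 10 ms and a 5 ms write cost, a 50 ms interval gives a mean lag of 25 ms across 3 writes, while a 10 ms interval cuts the lag to 5 ms at the price of 10 writes.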

Related Concepts

ML Feature Engineering
Data Processing Frameworks
Real-time Data Processing
Fraud Detection Models