How we built it: Real-time analytics for Stripe Billing

Among global business leaders surveyed, 84% agree that adapting pricing quickly will be a key competitive advantage. Our new real-time analytics system for Stripe Billing helps them spot customer trends just as they emerge.

Reed Trevelyan
8 min read · intermediate

Overview

This article describes how Stripe built a real-time streaming analytics system for Stripe Billing that gives customers access to subscription metrics with minimal latency. It covers the move from traditional batch processing to an architecture that supports real-time data updates and customizable metric definitions.

What You'll Learn

  1. How to implement a real-time streaming analytics system using Apache Flink
  2. Why low-latency analytics are crucial for subscription-based businesses
  3. How to manage historical data while implementing real-time updates

Prerequisites & Requirements

  • Understanding of data processing architectures and subscription metrics
  • Familiarity with Apache Flink and Apache Pinot (optional)

Key Questions Answered

How does Stripe achieve real-time analytics for subscription metrics?
Stripe achieves real-time analytics by replacing traditional batch processing with an event-driven pipeline using Apache Flink. This allows subscription updates to be reflected in the Dashboard with latency as low as 15 minutes, enabling businesses to respond quickly to changing customer behaviors.
What challenges did Stripe face in implementing real-time analytics?
Stripe faced challenges in generating the initial state for long-standing customers and maintaining data consistency while allowing customizable metric definitions. They addressed these by using a combination of Apache Flink for streaming updates and a batch process for historical data alignment.
What improvements were made to the query engine for real-time analytics?
The query engine was upgraded to Apache Pinot's new v2 engine, which supports windowed aggregation queries. This allows for real-time analysis of subscription data without the need for preaggregation, significantly improving query responsiveness and flexibility.
How does Stripe handle changes to metric definitions in real-time?
When a customer changes a metric definition, Stripe initiates a batch process to align historical data while continuing to process new events using the old definition. This ensures data consistency and responsiveness in the Dashboard during updates.
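The first and third answers can be pictured together with a small sketch: a stream of subscription events is folded into per-day changes, and the running total is computed only at query time, in the spirit of a windowed aggregation such as SUM(delta) OVER (ORDER BY day). This is a plain-Python illustration, not Flink or Pinot code; the event shape, the field names, and the choice of monthly recurring revenue (MRR) as the metric are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class SubscriptionEvent:
    """Hypothetical event shape; not Stripe's actual schema."""
    subscription_id: str
    day: str             # ISO date, e.g. "2024-01-03"
    monthly_amount: int  # new recurring amount in cents; 0 means canceled


def mrr_deltas(events):
    """Fold events into net MRR change per day (the 'streaming' side)."""
    current = {}               # subscription_id -> last known monthly amount
    deltas = defaultdict(int)  # day -> net change in MRR on that day
    for event in events:       # events are assumed to arrive in order
        previous = current.get(event.subscription_id, 0)
        deltas[event.day] += event.monthly_amount - previous
        current[event.subscription_id] = event.monthly_amount
    return dict(deltas)


def running_mrr(deltas):
    """Cumulative sum over days, computed at query time (the 'query' side)."""
    days = sorted(deltas)
    totals = accumulate(deltas[day] for day in days)
    return dict(zip(days, totals))


events = [
    SubscriptionEvent("sub_1", "2024-01-01", 1000),
    SubscriptionEvent("sub_2", "2024-01-01", 2000),
    SubscriptionEvent("sub_1", "2024-01-02", 1500),  # upgrade
    SubscriptionEvent("sub_2", "2024-01-03", 0),     # cancellation
]
deltas = mrr_deltas(events)
print(running_mrr(deltas))
# {'2024-01-01': 3000, '2024-01-02': 3500, '2024-01-03': 1500}
```

Because only the per-day changes are stored, the running total never has to be preaggregated; it is derived from the stored rows each time it is queried.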

Key Statistics & Figures

Latency for subscription updates
15 minutes
Subscription updates are reflected in the Stripe Dashboard with latency as low as this.
Query latency in production
less than 300 milliseconds
Dashboard queries return within this time in production after the move to the new query engine.

Key Actionable Insights

  1. Implementing an event-driven architecture can drastically reduce data processing latency.
     By transitioning from batch processing to an event-driven pipeline, businesses can achieve real-time analytics, which is essential for adapting to fast-changing market dynamics.
  2. Utilizing tools like Apache Flink and Apache Pinot can enhance data aggregation capabilities.
     These tools allow for efficient handling of large data sets and real-time querying, which can significantly improve user experience in data visualization applications.
  3. Maintaining data consistency during metric definition changes is crucial for accurate reporting.
     Implementing a workflow that balances historical recalculation with real-time updates ensures that users always have access to reliable data, which is vital for decision-making.
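The workflow in the third insight can be sketched as follows: while a batch job recomputes history under a new definition, incoming events keep being counted under the old one, and the served value switches only once the recalculation has caught up. This is a minimal in-memory illustration under stated assumptions; `MetricStore`, its methods, and the trial-based definitions are hypothetical names invented for the example, not Stripe's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable

# A metric definition maps one event (a plain dict here) to its contribution.
Definition = Callable[[dict], int]


@dataclass
class MetricStore:
    """Hypothetical store that serves one active definition during a backfill."""
    definition: Definition
    history: list = field(default_factory=list)  # every raw event seen so far
    total: int = 0                               # value served to the dashboard

    def ingest(self, event: dict) -> None:
        # New events keep using the active (old) definition during a backfill.
        self.history.append(event)
        self.total += self.definition(event)

    def backfill(self, new_definition: Definition, upto: int) -> int:
        # Batch step: recompute the first `upto` events under the new definition.
        return sum(new_definition(e) for e in self.history[:upto])

    def cut_over(self, new_definition: Definition, backfilled: int, upto: int) -> None:
        # Catch up on events that arrived during the backfill, then swap.
        tail = sum(new_definition(e) for e in self.history[upto:])
        self.definition = new_definition
        self.total = backfilled + tail


def include_trials(event: dict) -> int:
    return event["amount"]


def exclude_trials(event: dict) -> int:
    return 0 if event["trial"] else event["amount"]


store = MetricStore(definition=include_trials)
store.ingest({"amount": 1000, "trial": False})
store.ingest({"amount": 500, "trial": True})

snapshot_size = len(store.history)             # the backfill covers these events
backfilled = store.backfill(exclude_trials, snapshot_size)
store.ingest({"amount": 700, "trial": True})   # arrives mid-backfill
total_during_backfill = store.total            # 2200, still the old definition
store.cut_over(exclude_trials, backfilled, snapshot_size)
print(total_during_backfill, store.total)      # 2200 1000
```

The dashboard value stays internally consistent at every point: it reflects either the old definition in full or the new definition in full, never a mix of the two.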

Common Pitfalls

  1. Failing to account for historical data when implementing real-time updates can lead to inconsistencies.
     Without a proper workflow to manage historical recalculation alongside real-time processing, businesses may present inaccurate metrics to users, undermining trust in the analytics.
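A small example shows how this pitfall produces wrong numbers. A stream that starts today has never seen long-standing subscriptions that have had no recent events, so state built from the stream alone undercounts them. The sketch below seeds the state from a batch snapshot and then applies only the events newer than the snapshot cutoff. The function name, the integer timestamps, and the field names are assumptions made for illustration.

```python
def bootstrap_state(snapshot, cutoff, events):
    """Seed state from a batch snapshot, then apply only events after the cutoff."""
    state = dict(snapshot)  # subscription_id -> monthly amount as of the cutoff
    for event in sorted(events, key=lambda e: e["ts"]):
        if event["ts"] > cutoff:  # older events are already reflected in the snapshot
            state[event["subscription_id"]] = event["monthly_amount"]
    return state


# Hypothetical batch snapshot taken at ts=100.
snapshot = {"sub_old": 5000, "sub_1": 1000}
events = [
    {"subscription_id": "sub_1", "ts": 90, "monthly_amount": 800},    # before cutoff
    {"subscription_id": "sub_1", "ts": 120, "monthly_amount": 1500},  # upgrade
    {"subscription_id": "sub_2", "ts": 130, "monthly_amount": 2000},  # new subscription
]

with_history = bootstrap_state(snapshot, 100, events)
stream_only = bootstrap_state({}, 100, events)

print(sum(with_history.values()))  # 8500
print(sum(stream_only.values()))   # 3500, because sub_old is missing entirely
```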

Related Concepts

Real-time Data Processing
Event-driven Architecture
Subscription-based Business Models
Data Consistency in Analytics