Overview
This article describes how Pinterest uses the Kafka Streams API to build a predictive budgeting system that reduces ad overdelivery. It covers the challenges of working with real-time spend data and predicted spend, and outlines the architecture and implementation of the solution.
What You'll Learn
1. How to implement a predictive budgeting system using the Kafka Streams API
2. Why real-time spend data is critical for ad delivery systems
3. How to optimize window store design for better performance in streaming applications
Prerequisites & Requirements
- Understanding of streaming data processing concepts
- Familiarity with Kafka Streams API
Key Questions Answered
What is overdelivery in advertising and how does it occur?
Overdelivery occurs when ads are shown for advertisers who have already run out of budget, which takes opportunities away from advertisers who still have budget to spend. It is caused by the slow reaction time of the spend system: impressions are not accounted for in real time, so ads keep being served after the budget has been used up.
How does the predictive budgeting system reduce overdelivery?
The predictive budgeting system calculates inflight spend, which includes costs of ad insertions that haven't been charged yet. By ensuring that the sum of actual spend and inflight spend does not exceed the daily budget, the system prevents overdelivery.
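The budget rule described above can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: the `AdvertiserSpend` class, its field names, and the `can_serve` function are hypothetical, and amounts are assumed to be in a single currency unit.

```python
from dataclasses import dataclass


@dataclass
class AdvertiserSpend:
    """Hypothetical per-advertiser spend state (names are assumptions)."""
    daily_budget: float
    actual_spend: float    # spend that has already been charged
    inflight_spend: float  # predicted cost of insertions not yet charged


def can_serve(spend: AdvertiserSpend) -> bool:
    """Allow serving only while actual + inflight spend stays within budget."""
    return spend.actual_spend + spend.inflight_spend <= spend.daily_budget


# 90 charged + 5 in flight is within a budget of 100, so serving continues.
print(can_serve(AdvertiserSpend(100.0, 90.0, 5.0)))
# 90 charged + 15 in flight would exceed 100, so serving stops early.
print(can_serve(AdvertiserSpend(100.0, 90.0, 15.0)))
```

Without the inflight term, the second advertiser would still look in budget (90 of 100) and would keep receiving insertions that later push it over.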
What are the main requirements for building a spend prediction system?
The spend prediction system must handle different ad types, process tens of thousands of events per second, fan out updates to over 1,000 consumer machines, maintain an end-to-end delay of less than 10 seconds, ensure 100% uptime, and be lightweight for maintainability.
Why was Kafka Streams chosen over other streaming services?
Kafka Streams was chosen for its millisecond delay guarantee, its lightweight nature, and its lack of heavy external dependencies. Together these keep maintenance costs lower than with alternatives such as Spark and Flink.
Key Statistics & Figures
Overdelivery rate
1 percent
This was calculated based on the example where 10 out of 1,000 impressions were overdelivered.
Performance improvement
18x
This improvement was achieved by switching from hopping windows to tumbling windows in the window store design.
End-to-end delay
less than 10 seconds
This is a requirement for the spend prediction system to ensure timely ad delivery.
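The overdelivery rate quoted above is simple arithmetic on the article's example (10 overdelivered impressions out of 1,000). The helper name `overdelivery_rate` is an assumption used only for illustration.

```python
def overdelivery_rate(overdelivered: int, total: int) -> float:
    """Fraction of impressions that were served beyond the budget."""
    if total <= 0:
        raise ValueError("total impressions must be positive")
    return overdelivered / total


rate = overdelivery_rate(10, 1000)
print(f"{rate:.0%}")  # prints "1%"
```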
Technologies & Tools
Stream Processing
Kafka Streams
Used to build the predictive budgeting system for real-time ad spend tracking.
Key Actionable Insights
1. Implementing a predictive budgeting system can significantly enhance ad delivery efficiency. By accurately predicting spend, companies can optimize ad placements and ensure that budgets are respected, leading to better advertiser satisfaction and revenue.
2. Optimizing window store design using tumbling windows can drastically improve performance. Switching from hopping to tumbling windows reduced redundant computations, resulting in an 18x performance improvement, which is crucial for systems handling high event volumes.
3. Utilizing delta encoding and lookup table encoding can effectively reduce message sizes in streaming applications. This compression strategy helps manage high fanout effects on consumers, making the system more efficient and responsive.
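The summary names delta encoding and lookup table encoding but does not describe the message format, so the following is only a sketch of one plausible reading. It assumes each update is a mapping from an ad identifier to its inflight spend (in integer micro-units), that delta encoding means sending only entries that changed since the previous update, and that lookup table encoding means replacing long string keys with indices into a shared table. All function names and identifiers are hypothetical.

```python
def delta_encode(previous: dict, current: dict) -> dict:
    """Keep only entries that are new or changed since `previous`.

    Deletions are not represented in this sketch.
    """
    return {k: v for k, v in current.items() if previous.get(k) != v}


def apply_delta(previous: dict, delta: dict) -> dict:
    """Rebuild the full state on the consumer side."""
    merged = dict(previous)
    merged.update(delta)
    return merged


def lookup_encode(message: dict, table: list) -> dict:
    """Replace each key with its index in `table`, appending unseen keys."""
    index = {key: i for i, key in enumerate(table)}
    encoded = {}
    for key, value in message.items():
        if key not in index:
            index[key] = len(table)
            table.append(key)
        encoded[index[key]] = value
    return encoded


def lookup_decode(encoded: dict, table: list) -> dict:
    """Map indices back to the original keys."""
    return {table[i]: v for i, v in encoded.items()}


previous = {"adgroup-1": 5, "adgroup-2": 7}
current = {"adgroup-1": 5, "adgroup-2": 9, "adgroup-3": 1}
table = ["adgroup-1", "adgroup-2"]  # shared by producer and consumers

delta = delta_encode(previous, current)   # only adgroup-2 and adgroup-3
wire = lookup_encode(delta, table)        # small integer keys on the wire
restored = apply_delta(previous, lookup_decode(wire, table))
print(restored == current)
```

With a fanout to many consumers, every byte saved per message is multiplied by the number of receivers, which is why shrinking each update matters more than it would for a single consumer.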
Common Pitfalls
1. Using hopping windows instead of tumbling windows can lead to performance issues. Hopping windows create multiple overlapping windows, causing redundant computations and slowing down the system. Transitioning to tumbling windows eliminates this overlap and improves throughput.
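To make the overlap concrete, the sketch below counts how many windows a single event falls into. It models epoch-aligned windows of the form [start, start + size), and treats a tumbling window as the special case where the advance equals the size. The window size (3 minutes) and advance (10 seconds) are illustrative assumptions; this summary does not state the parameters Pinterest used, and the function name is hypothetical.

```python
def window_starts(ts_ms: int, size_ms: int, advance_ms: int) -> list:
    """Start times of every window [start, start + size) that contains ts_ms."""
    # Latest window start at or before the event, aligned to the advance.
    start = ts_ms - (ts_ms % advance_ms)
    starts = []
    # Walk backwards while the window still covers the event timestamp.
    while start > ts_ms - size_ms and start >= 0:
        starts.append(start)
        start -= advance_ms
    return sorted(starts)


event_ts = 1_000_000   # arbitrary event timestamp in milliseconds
size = 180_000         # assumed 3-minute window

hopping = window_starts(event_ts, size, advance_ms=10_000)
tumbling = window_starts(event_ts, size, advance_ms=size)

print(len(hopping))   # number of hopping windows updated per event
print(len(tumbling))  # number of tumbling windows updated per event
```

Under these assumed parameters each event updates size / advance overlapping hopping windows but only one tumbling window, so the per-event write cost to the window store drops by that same factor.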
Related Concepts
Streaming Data Processing
Predictive Analytics In Advertising
Real-time Data Systems