Overview
This article describes how Pinterest implemented Ray Batch Inference and how it compares with the previous solutions, Apache Spark and Torch Dataloader. Reported improvements include a 4.5x increase in throughput after migrating from Torch Dataloader and a 30x reduction in cost for the search quality team after migrating from Spark, which illustrates how efficiently Ray handles offline batch inference for machine learning models.
What You'll Learn
1. How to implement batch inference using Ray Data
2. Why Ray is preferred for offline batch inference over Apache Spark
3. How to utilize carryover columns in batch inference pipelines
4. When to apply multi-model inference in machine learning workflows
Prerequisites & Requirements
- Understanding of machine learning concepts and batch processing
- Familiarity with Ray and its data processing libraries (optional)
Key Questions Answered
What are the benefits of using Ray for batch inference?
Ray's streaming execution runs data loading, inference, and result writing in parallel rather than one stage at a time. Pinterest reports a 4.5x increase in throughput over Torch Dataloader and a 30x reduction in cost compared with Apache Spark.
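The idea of overlapping the three stages can be illustrated with a minimal, standard-library sketch. This is not Ray's implementation: it uses threads and bounded queues on one machine, whereas Ray Data schedules stages across a cluster. The names `stream_pipeline`, `run_stage`, and `max_in_flight` are hypothetical and chosen only for this example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def stream_pipeline(batches, infer, write, max_in_flight=2):
    """Overlap loading, inference, and writing using bounded queues."""
    loaded = queue.Queue(maxsize=max_in_flight)     # bounded: applies backpressure
    predicted = queue.Queue(maxsize=max_in_flight)  # bounded: applies backpressure
    written = queue.Queue()  # unbounded: drained after the threads finish

    def load():
        for batch in batches:
            loaded.put(batch)
        loaded.put(_DONE)

    threads = [
        threading.Thread(target=load),
        threading.Thread(target=run_stage, args=(infer, loaded, predicted)),
        threading.Thread(target=run_stage, args=(write, predicted, written)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    results = []
    while True:
        item = written.get()
        if item is _DONE:
            return results
        results.append(item)


rows_written = stream_pipeline(
    [[1, 2], [3, 4], [5, 6]],
    infer=lambda batch: [x * 10 for x in batch],  # stand-in for a model
    write=len,  # stand-in for a sink; returns the number of rows written
)
```

Because the queues between stages are bounded, a slow stage throttles the stages before it instead of letting batches pile up in memory, which is the property that makes streaming execution attractive for large datasets.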
How does Ray handle carryover columns in batch inference?
Ray Batch Inference allows for the inclusion of carryover columns, which are necessary for downstream jobs but not processed during inference. This is efficiently managed using zero-copy operations with pyarrow tables, minimizing data overhead.
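A minimal sketch of the carryover idea, assuming a batch is a plain dictionary of column lists. This is a standard-library stand-in for a pyarrow table, not Pinterest's code: the function `infer_with_carryover`, the toy model, and the column names are all hypothetical. The point it shows is that carryover columns bypass the model and are attached to the output by reference rather than copied.

```python
def infer_with_carryover(batch, feature_cols, carryover_cols, model):
    """Run model on the feature columns and pass carryover columns through."""
    features = {name: batch[name] for name in feature_cols}
    output = {"score": model(features)}
    for name in carryover_cols:
        # Attach the existing column object; no per-row copy is made.
        output[name] = batch[name]
    return output


def toy_model(features):
    # Hypothetical model: sums the two feature values of each row.
    return [a + b for a, b in zip(features["f1"], features["f2"])]


batch = {
    "f1": [1, 2],
    "f2": [10, 20],
    "pin_id": ["a", "b"],  # hypothetical carryover column for downstream jobs
}
result = infer_with_carryover(batch, ["f1", "f2"], ["pin_id"], toy_model)
```

Here `result["pin_id"]` is the same list object as `batch["pin_id"]`, which mirrors in miniature what zero-copy column handling achieves on columnar tables.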
What improvements were observed after migrating from Spark to Ray?
The search quality team at Pinterest experienced a 30x decrease in annual costs after migrating from Spark to Ray. This was due to better GPU utilization and the ability to combine multiple inference jobs into a single Ray job, reducing data reading costs.
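The saving from combining jobs can be sketched as follows: each batch is read once and every model scores it before the next batch is read. This is an illustrative standard-library example, not Pinterest's pipeline; `multi_model_inference`, `read_batches`, and the two toy models are hypothetical names.

```python
def multi_model_inference(batches, models):
    """Yield one output per batch with a prediction column per model."""
    for batch in batches:  # each batch is consumed exactly once
        yield {name: model(batch) for name, model in models.items()}


reads = []  # records every batch read, to show the data is read only once


def read_batches():
    # Stand-in for an expensive read from storage.
    for batch in ([1, 2], [3, 4]):
        reads.append(batch)
        yield batch


models = {
    "model_a": lambda batch: [x * 2 for x in batch],    # hypothetical model
    "model_b": lambda batch: [x + 100 for x in batch],  # hypothetical model
}
outputs = list(multi_model_inference(read_batches(), models))
```

Running the two models as separate jobs would read the input twice; in this sketch `reads` ends up with two entries for two batches regardless of how many models are registered.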
What is the role of accumulators in Ray Batch Inference?
Accumulators in Ray Batch Inference are used to efficiently calculate evaluation metrics like AUC-ROC and cross-entropy loss. They merge values across different inference runners, improving performance and reducing memory overhead.
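One way to picture a mergeable accumulator is a small object that keeps only sufficient statistics, so that runners exchange a few numbers instead of every prediction. The sketch below is an assumption about the general technique, not Pinterest's implementation: the class names are hypothetical, the loss accumulator keeps a running sum and count, and the AUC accumulator approximates AUC-ROC from fixed-width score histograms.

```python
import math


class LossAccumulator:
    """Running sum and count for mean binary cross-entropy."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def update(self, labels, probs):
        for y, p in zip(labels, probs):
            self.total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            self.count += 1

    def merge(self, other):
        self.total += other.total
        self.count += other.count

    def result(self):
        return self.total / self.count


class AucAccumulator:
    """Approximate AUC-ROC from per-bin counts of positive and negative scores."""

    def __init__(self, bins=100):
        self.bins = bins
        self.pos = [0] * bins
        self.neg = [0] * bins

    def update(self, labels, probs):
        for y, p in zip(labels, probs):
            i = min(int(p * self.bins), self.bins - 1)
            (self.pos if y else self.neg)[i] += 1

    def merge(self, other):
        self.pos = [a + b for a, b in zip(self.pos, other.pos)]
        self.neg = [a + b for a, b in zip(self.neg, other.neg)]

    def result(self):
        # Count (positive, negative) pairs where the positive falls in a
        # higher bin; pairs sharing a bin count as half.
        wins, neg_below = 0.0, 0
        for p, n in zip(self.pos, self.neg):
            wins += p * (neg_below + 0.5 * n)
            neg_below += n
        return wins / (sum(self.pos) * sum(self.neg))


# Two hypothetical inference runners, each holding its own shard.
shards = [([1, 0], [0.9, 0.2]), ([1, 0], [0.6, 0.7])]
loss, auc = LossAccumulator(), AucAccumulator()
for labels, probs in shards:
    local_loss, local_auc = LossAccumulator(), AucAccumulator()
    local_loss.update(labels, probs)
    local_auc.update(labels, probs)
    loss.merge(local_loss)  # only the sum and count cross runners
    auc.merge(local_auc)    # only the bin counts cross runners
```

The memory held per runner is constant in the number of rows (two numbers for the loss, two fixed-size histograms for AUC), at the price of a binning approximation in the AUC value.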
Key Statistics & Figures
- Throughput improvement: 4.5x. Achieved by migrating from Torch Dataloader to Ray Batch Inference.
- Cost reduction: 30x. Realized by the search quality team after transitioning from Spark to Ray.
Technologies & Tools
- Ray (backend): Used for batch inference and data processing in machine learning workflows.
- Apache Spark (backend): The previous batch inference solution that Ray replaced.
- Torch Dataloader (backend): The previous data loading solution that Ray Batch Inference improved upon.
- vLLM (backend): Inference optimization engine for large language models, used in conjunction with Ray.
Key Actionable Insights
1. Implement Ray Batch Inference to improve the efficiency of your ML workflows. Pinterest's transition from Spark shows that Ray can deliver significant throughput improvements and cost savings.
2. Utilize carryover columns to maintain essential data in your inference outputs. This preserves important metadata for downstream processing without incurring additional computational cost.
3. Consider multi-model inference to streamline your evaluation processes. Running multiple models in a single job can reduce overhead and improve resource utilization, which makes it a valuable strategy for complex ML applications.
Common Pitfalls
1. Failing to optimize data loading can bottleneck your inference jobs.
Many teams overlook the importance of efficient data loading, which can severely limit throughput. By utilizing Ray's capabilities for parallel data loading, you can avoid this common issue.
2. Neglecting carryover columns may lead to loss of important metadata.
If carryover columns are not properly managed, essential information for downstream processes can be lost, impacting the overall effectiveness of your ML workflows.
Related Concepts
Batch Processing Techniques
Ray Data Library
Machine Learning Model Evaluation
Large Language Models (LLMs)