Ray Batch Inference at Pinterest (Part 3)

Pinterest Engineering

Apache, Apache Spark, AWS, Hugging Face, Large Language Models, LLaMA, PyTorch, Ray Tune, TensorFlow

Overview

This article discusses the implementation of Ray Batch Inference at Pinterest and its advantages over earlier solutions such as Apache Spark and Torch Dataloader. Reported results include a 4.5x throughput increase over a Torch Dataloader-based pipeline and a 30x reduction in annual cost for the search quality team after moving from Spark, showing how efficiently Ray handles offline batch inference for machine learning models.

What You'll Learn

1. How to implement batch inference using Ray Data
2. Why Ray is preferred over Apache Spark for offline batch inference
3. How to use carryover columns in batch inference pipelines
4. When to apply multi-model inference in machine learning workflows

Prerequisites & Requirements

Understanding of machine learning concepts and batch processing
Familiarity with Ray and its data processing libraries (optional)

Key Questions Answered

What are the benefits of using Ray for batch inference?

Ray's main benefit for batch inference is streaming execution: data loading, inference, and result writing run in parallel instead of one after another. In Pinterest's migrations this yielded a 4.5x throughput increase over Torch Dataloader and, for the search quality team, a 30x cost reduction compared with Apache Spark.
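Ray Data schedules these stages across a cluster. The toy sketch below does not use Ray at all; it relies only on Python's standard library (threads and bounded queues) to show why overlapping load, inference, and write keeps every stage busy. The `load`, `infer`, and `write` callables are hypothetical placeholders, and `run_streaming` is an illustrative name, not a Ray API.

```python
import queue
import threading
from typing import Any, Callable, Iterable

_DONE = object()  # sentinel that marks the end of the stream


def _stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to every item from inbox and forward results until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_streaming(batches: Iterable[Any], load: Callable, infer: Callable,
                  write: Callable, maxsize: int = 4) -> None:
    """Run load -> infer -> write as concurrent stages connected by bounded queues.

    Bounded queues give backpressure: a fast stage blocks instead of buffering
    the whole dataset. There is no error handling here; an exception in a
    stage thread would stall the pipeline.
    """
    q_raw: queue.Queue = queue.Queue(maxsize)
    q_loaded: queue.Queue = queue.Queue(maxsize)
    q_scored: queue.Queue = queue.Queue(maxsize)

    def feed() -> None:
        for b in batches:
            q_raw.put(b)
        q_raw.put(_DONE)

    threads = [
        threading.Thread(target=feed),
        threading.Thread(target=_stage, args=(load, q_raw, q_loaded)),
        threading.Thread(target=_stage, args=(infer, q_loaded, q_scored)),
    ]
    for t in threads:
        t.start()

    # The calling thread acts as the writer stage, so writing one batch can
    # overlap with inference on the next batch and loading of the one after.
    while True:
        item = q_scored.get()
        if item is _DONE:
            break
        write(item)
    for t in threads:
        t.join()


if __name__ == "__main__":
    written = []
    run_streaming(
        batches=range(3),
        load=lambda i: [i * 10 + k for k in range(3)],  # hypothetical "read shard i"
        infer=lambda rows: [r * 2 for r in rows],        # hypothetical stand-in model
        write=written.append,
    )
    print(written)  # [[0, 2, 4], [20, 22, 24], [40, 42, 44]]
```

Because each stage has a single worker reading from a FIFO queue, output order matches input order; a real engine that runs many workers per stage may not preserve order.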

How does Ray handle carryover columns in batch inference?

Ray Batch Inference supports carryover columns: columns that downstream jobs need but that the model does not process during inference. They are handled with zero-copy operations on pyarrow tables, so passing them through to the output adds little data overhead.

What improvements were observed after migrating from Spark to Ray?

The search quality team at Pinterest experienced a 30x decrease in annual costs after migrating from Spark to Ray. This was due to better GPU utilization and the ability to combine multiple inference jobs into a single Ray job, reducing data reading costs.
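The saving from combining jobs comes from reading the input once and scoring it with every model, instead of running one job (and one full read) per model. A plain-Python sketch of that idea, with no Ray involved; `read_batches` and the two lambda models are hypothetical stand-ins, and the `scans` counter only exists to make the single read visible.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]
Model = Callable[[Row], float]


def read_batches(rows: List[Row], batch_size: int, stats: Dict[str, int]) -> Iterator[List[Row]]:
    """Hypothetical reader standing in for an expensive scan of the input table."""
    stats["scans"] += 1
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def score_all_models(batches: Iterator[List[Row]], models: Dict[str, Model]) -> List[Row]:
    """Score every batch with every model, so the input is read once for all models."""
    results: List[Row] = []
    for batch in batches:
        for row in batch:
            scored = dict(row)  # keep the input columns alongside the scores
            for name, model in models.items():
                scored[f"score_{name}"] = model(row)
            results.append(scored)
    return results


if __name__ == "__main__":
    rows = [{"x": float(i)} for i in range(6)]
    stats = {"scans": 0}
    models = {
        "baseline": lambda r: r["x"] * 0.5,   # hypothetical model A
        "candidate": lambda r: r["x"] * 2.0,  # hypothetical model B
    }
    out = score_all_models(read_batches(rows, batch_size=4, stats=stats), models)
    print(stats["scans"], len(out), out[-1])
    # 1 6 {'x': 5.0, 'score_baseline': 2.5, 'score_candidate': 10.0}
```

Running the two models as separate jobs would scan the input twice; here the scan count stays at one regardless of how many models are added.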

What is the role of accumulators in Ray Batch Inference?

Accumulators in Ray Batch Inference are used to efficiently calculate evaluation metrics like AUC-ROC and cross-entropy loss. They merge values across different inference runners, improving performance and reducing memory overhead.
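The article does not describe how these accumulators are built. As a hedged sketch, assuming a histogram-based approximation of AUC-ROC (a common way to make AUC mergeable, not necessarily Pinterest's method), each runner keeps fixed-size partial state for cross-entropy loss and AUC, and a merge step simply sums the states. Memory per runner stays at O(number of buckets) instead of growing with the number of predictions. `MetricAccumulator` and `NUM_BUCKETS` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable, List

NUM_BUCKETS = 1000  # assumption: resolution of the approximate AUC


@dataclass
class MetricAccumulator:
    """Partial sums kept by one inference runner; merge() combines runners."""
    loss_sum: float = 0.0
    count: int = 0
    pos_hist: List[int] = field(default_factory=lambda: [0] * NUM_BUCKETS)
    neg_hist: List[int] = field(default_factory=lambda: [0] * NUM_BUCKETS)

    def add(self, preds: Iterable[float], labels: Iterable[int], eps: float = 1e-7) -> None:
        for p, y in zip(preds, labels):
            p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
            self.loss_sum += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            self.count += 1
            b = min(int(p * NUM_BUCKETS), NUM_BUCKETS - 1)  # score bucket
            if y == 1:
                self.pos_hist[b] += 1
            else:
                self.neg_hist[b] += 1

    def merge(self, other: "MetricAccumulator") -> "MetricAccumulator":
        merged = MetricAccumulator(self.loss_sum + other.loss_sum, self.count + other.count)
        merged.pos_hist = [a + b for a, b in zip(self.pos_hist, other.pos_hist)]
        merged.neg_hist = [a + b for a, b in zip(self.neg_hist, other.neg_hist)]
        return merged

    def cross_entropy(self) -> float:
        return self.loss_sum / self.count

    def auc(self) -> float:
        # Probability that a random positive outranks a random negative;
        # pairs that fall in the same bucket count as ties (0.5).
        total_pos, total_neg = sum(self.pos_hist), sum(self.neg_hist)
        neg_below, wins = 0, 0.0
        for b in range(NUM_BUCKETS):
            wins += self.pos_hist[b] * (neg_below + 0.5 * self.neg_hist[b])
            neg_below += self.neg_hist[b]
        return wins / (total_pos * total_neg)


if __name__ == "__main__":
    runner_a, runner_b = MetricAccumulator(), MetricAccumulator()
    runner_a.add([0.9, 0.2], [1, 0])
    runner_b.add([0.7, 0.4], [1, 0])
    total = runner_a.merge(runner_b)
    print(total.count, total.auc())  # 4 1.0
```

Because merging is just element-wise addition, runners can be combined in any order, which is what lets the final metric be computed without gathering every prediction in one place.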

Key Statistics & Figures

Throughput improvement: 4.5x

Achieved by migrating from Torch Dataloader to Ray Batch Inference.

Cost reduction: 30x

Realized by the search quality team after transitioning from Spark to Ray.

Technologies & Tools


Ray (Backend): Used for batch inference and data processing in machine learning workflows.

Apache Spark (Backend): The previous batch inference solution that Ray replaced.

Torch Dataloader (Backend): The earlier data-loading approach that Ray Batch Inference outperformed with a 4.5x throughput gain.

vLLM (Backend): An inference engine for large language models, used together with Ray.

Key Actionable Insights

1. Implement Ray Batch Inference to improve the efficiency of your ML workflows.
By overlapping data loading, inference, and result writing, Ray can deliver significant throughput improvements and cost savings, as demonstrated by Pinterest's transition from Spark.

2. Use carryover columns to keep essential data in your inference outputs.
This preserves the metadata that downstream processing needs without extra computation, since those columns bypass the model.

3. Consider multi-model inference to streamline your evaluation processes.
Running multiple models in a single job reduces overhead, such as reading the same input data repeatedly, and improves resource utilization, making it a valuable strategy for complex ML applications.

Common Pitfalls

1. Failing to optimize data loading can bottleneck your inference jobs.
Many teams overlook efficient data loading, which can severely limit throughput. Ray's streaming execution, which loads data in parallel with inference, helps avoid this common issue.

2. Neglecting carryover columns may lead to loss of important metadata.
If carryover columns are not properly managed, information that downstream processes need can be lost, reducing the effectiveness of your ML workflows.

Related Concepts

Batch Processing Techniques

Ray Data Library

Machine Learning Model Evaluation

Large Language Models (LLMs)