RAPIDS Adds GPU Polars Streaming, a Unified GNN API, and Zero-Code ML Speedups

RAPIDS, a suite of NVIDIA CUDA-X libraries for Python data science, released version 25.06, introducing exciting new features. These include a Polars GPU…

Brian Tepera
6 min read, intermediate level

Overview

RAPIDS version 25.06 introduces significant enhancements including a Polars GPU streaming engine, a unified API for graph neural networks (GNNs), and zero-code-change acceleration for support vector machines. These updates are designed to improve data processing workflows and machine learning capabilities on NVIDIA GPUs.

What You'll Learn

1. How to leverage the Polars GPU streaming engine for large datasets
2. Why the Unified API simplifies GNN workflows across different GPU setups
3. How to implement zero-code-change support vector machines in existing workflows
4. When to use the RAPIDS Memory Manager for improved performance on NVIDIA Blackwell GPUs

Prerequisites & Requirements

  • Understanding of GPU programming and data science concepts
  • Familiarity with NVIDIA RAPIDS libraries and Python (optional)

Key Questions Answered

What new features does RAPIDS version 25.06 introduce?
RAPIDS version 25.06 introduces a Polars GPU streaming engine, a unified API for graph neural networks, and zero-code-change acceleration for support vector machines. These features enhance data processing capabilities and simplify machine learning workflows on NVIDIA GPUs.
How can the Polars GPU streaming engine handle large datasets?
The Polars GPU streaming engine uses a streaming executor that partitions the data and processes the partitions in parallel, so it can handle datasets larger than the available GPU memory (VRAM). This enables efficient analytics on data ranging from hundreds of gigabytes to terabytes.
What improvements were made to cuML for zero-code-change functionality?
The cuML library has expanded its zero-code-change functionality to include support vector machines, allowing existing scikit-learn workflows to be accelerated without any modifications. This enhances the performance of algorithms like Support Vector Classification and Support Vector Regression on GPUs.
What is the significance of the Unified API for GNNs?
The Unified API for GNNs allows users to run the same training scripts across single-GPU, multi-GPU, and multi-node setups without modification. This streamlines the development process and enhances usability for PyTorch users.
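
One common way a single training script can serve single-GPU, multi-GPU, and multi-node runs is to read the worker layout from environment variables set by the launcher. The sketch below assumes a launcher such as PyTorch's `torchrun`, which exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`; the article does not name the launcher, and the helpers `distributed_context` and `shard_for_rank` are hypothetical names used for illustration, not part of the RAPIDS API.

```python
import os


def distributed_context(env=None):
    """Read the worker layout exported by a launcher, with single-process defaults."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global worker index
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of workers
    }


def shard_for_rank(items, rank, world_size):
    """Give each worker a disjoint, round-robin slice of the training seeds."""
    return items[rank::world_size]


seed_nodes = list(range(10))

# Single GPU: no launcher variables are set, so one worker owns every seed.
single = distributed_context({})

# Hypothetical 2-node x 2-GPU job, seen from the worker with global rank 3.
multi = distributed_context({"RANK": "3", "LOCAL_RANK": "1", "WORLD_SIZE": "4"})

print(shard_for_rank(seed_nodes, single["rank"], single["world_size"]))
print(shard_for_rank(seed_nodes, multi["rank"], multi["world_size"]))
```

Because the defaults describe a one-worker job, the same file runs unchanged with or without a launcher; only the environment differs between setups.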

Technologies & Tools


Data Science Libraries
NVIDIA RAPIDS
Used for GPU-accelerated data processing and machine learning.
Data Manipulation Library
Polars
Provides a GPU streaming engine for efficient data processing.
Machine Learning Library
cuML
Offers zero-code-change acceleration for machine learning algorithms.
Distributed Computing
Dask
Facilitates multi-GPU execution and data processing workflows.

Key Actionable Insights

1. Use the Polars GPU streaming engine to manage large datasets. It supports data processing workflows that scale across multiple GPUs, which can significantly speed up analytics operations. This is particularly useful for data scientists working with large time series datasets or complex analytics tasks that exceed VRAM limits.
2. Adopt the Unified API for GNNs to simplify your machine learning workflows. The API lets you move between GPU configurations without changing your codebase. This benefits teams that prototype on a single GPU and later scale to multi-GPU or multi-node environments, saving time and reducing complexity.
3. Use the zero-code-change enhancements in cuML to accelerate your existing machine learning models. This can bring significant performance improvements without code refactoring. It is especially advantageous for organizations already using scikit-learn, which can speed up their workflows with minimal effort.
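
To make the zero-code-change idea concrete, the sketch below keeps an ordinary scikit-learn SVM script untouched and changes only how it is launched. It assumes cuML is installed and provides the `cuml.accel` module runner (`python -m cuml.accel script.py`); the file name `train_svm.py` and the helper `launch_command` are hypothetical. The commands are built and printed but not executed, so the sketch itself needs neither a GPU nor scikit-learn.

```python
import sys
import tempfile
from pathlib import Path

# An ordinary scikit-learn training script; nothing in it mentions cuML.
TRAIN_SCRIPT = '''\
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
'''


def launch_command(script_path, accelerate):
    """Build the command line for a plain CPU run or a run through cuml.accel."""
    if accelerate:
        # Assumption: cuML is installed and exposes the `cuml.accel` runner.
        return [sys.executable, "-m", "cuml.accel", str(script_path)]
    return [sys.executable, str(script_path)]


# Write the unchanged script to a temporary directory.
script_path = Path(tempfile.mkdtemp()) / "train_svm.py"
script_path.write_text(TRAIN_SCRIPT)

cpu_cmd = launch_command(script_path, accelerate=False)
gpu_cmd = launch_command(script_path, accelerate=True)

# Both commands point at the same file; only the launcher differs.
print(" ".join(cpu_cmd))
print(" ".join(gpu_cmd))
```

On a machine with cuML and a supported GPU, either command could be passed to `subprocess.run`; the training script stays identical in both cases.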

Common Pitfalls

1. Assuming that the new streaming executor supports every operation can lead to unexpected fallbacks to the in-memory executor. The streaming executor is still under development, and unsupported operations fall back to the in-memory executor, which may not be suitable for datasets larger than VRAM.
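
The fallback behavior described above can be illustrated with a toy, pure-Python model. It is a conceptual sketch only, not the Polars or cuDF implementation: the article does not list which operations the real streaming executor supports, so the sets `STREAMING_OPS` and `IN_MEMORY_OPS` are hypothetical. Sums and counts can be combined from per-partition partial results, while this toy median needs every value of a group at once and therefore takes the in-memory path.

```python
import statistics
from collections import defaultdict

# Hypothetical split: ops whose partial results can be merged across partitions.
STREAMING_OPS = {"sum", "count"}
# Hypothetical ops that this toy can only run with all rows available at once.
IN_MEMORY_OPS = {"median"}


def iter_partitions(rows, size):
    """Yield fixed-size slices; a real engine would read these from storage."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def group_aggregate(rows, op, partition_size=2):
    """Return (result, executor_used) for a group-by aggregation."""
    if op in STREAMING_OPS:
        totals = defaultdict(int)
        # Only the running totals are kept between partitions.
        for part in iter_partitions(rows, partition_size):
            for key, value in part:
                totals[key] += value if op == "sum" else 1
        return dict(totals), "streaming"
    if op in IN_MEMORY_OPS:
        # Fallback: gather every value per group before aggregating.
        groups = defaultdict(list)
        for key, value in rows:
            groups[key].append(value)
        return {k: statistics.median(v) for k, v in groups.items()}, "in-memory"
    raise ValueError(f"unsupported operation: {op}")


rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

for op in ("sum", "median"):
    result, executor = group_aggregate(rows, op)
    print(op, executor, result)
```

Returning the executor name alongside the result makes a silent fallback visible, which is the kind of check worth adding before running a query on data that does not fit in GPU memory.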

Related Concepts

GPU Programming
Data Science Workflows
Machine Learning Acceleration