NVIDIA RAPIDS 25.08 Adds New Profiler for cuML, Updates to the Polars GPU Engine, Additional Algorithm Support,

The 25.08 release of RAPIDS continues to push the boundaries toward making accelerated data science more accessible and scalable with the addition of several…

Brian Tepera

Overview

The NVIDIA RAPIDS 25.08 release introduces significant enhancements for accelerated data science, including new profiling tools for cuML, updates to the Polars GPU engine, and additional algorithm support. These features aim to improve performance, scalability, and ease of use for machine learning workflows.

What You'll Learn

1. How to use the new profiling tools in cuML to diagnose performance issues
2. Why the streaming executor in the Polars GPU engine enhances data processing capabilities
3. When to utilize the new algorithms in cuML for machine learning tasks

Key Questions Answered

What new features are introduced in the RAPIDS 25.08 release?
The RAPIDS 25.08 release introduces new profiling tools for cuML, updates to the Polars GPU engine for handling larger datasets, and new algorithm support including Spectral Embedding, LinearSVC, LinearSVR, and KernelRidge. Additionally, CUDA 11 support has been dropped.
How do the new profiling tools in cuML improve machine learning workflows?
The new profiling tools in cuML allow users to identify which operations are executed on the GPU versus the CPU, helping to pinpoint performance bottlenecks. The function-level and line-level profilers provide detailed insights into execution times, enhancing debugging and optimization efforts.
What improvements does the Polars GPU engine provide for large datasets?
The Polars GPU engine now supports a streaming executor that allows processing of datasets larger than GPU memory by leveraging data partitioning. This significantly improves performance and scalability, especially for large workloads that exceed VRAM.
What algorithms have been added to cuML in the 25.08 release?
The 25.08 release of cuML adds Spectral Embedding for dimensionality reduction, along with LinearSVC, LinearSVR, and KernelRidge, all of which can be used with zero code changes.
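
To make "zero code changes" concrete, here is a minimal sketch of an ordinary scikit-learn script that uses three of the estimator types named above. Nothing in it refers to cuML or to a GPU. The function name `fit_new_estimators`, the synthetic data, and the dataset sizes are illustrative assumptions, not taken from the release notes. Spectral Embedding is left out because the source does not say how it is exposed.

```python
"""Plain scikit-learn code; it contains no cuML or GPU-specific calls."""


def fit_new_estimators(n_samples=500, n_features=20, seed=0):
    """Fit LinearSVC, LinearSVR and KernelRidge; return their training scores."""
    # Imports live inside the function so this module can still be imported
    # on a machine that does not have scikit-learn installed.
    from sklearn.datasets import make_classification, make_regression
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.svm import LinearSVC, LinearSVR

    # Synthetic data (an assumption for illustration only).
    X_clf, y_clf = make_classification(
        n_samples=n_samples, n_features=n_features, random_state=seed
    )
    X_reg, y_reg = make_regression(
        n_samples=n_samples, n_features=n_features, noise=0.1, random_state=seed
    )

    models = {
        "LinearSVC": (LinearSVC(), X_clf, y_clf),
        "LinearSVR": (LinearSVR(), X_reg, y_reg),
        "KernelRidge": (KernelRidge(kernel="rbf"), X_reg, y_reg),
    }

    scores = {}
    for name, (model, X, y) in models.items():
        model.fit(X, y)
        # Accuracy for the classifier, R^2 for the two regressors.
        scores[name] = float(model.score(X, y))
    return scores
```

If this were saved as, say, `train.py` (a hypothetical file name) with a call to `fit_new_estimators()` at the bottom, it could be launched unchanged through the accelerator with `python -m cuml.accel train.py`. Whether each estimator actually runs on the GPU depends on the installed cuML version and the parameters used.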

Key Statistics & Figures

Performance improvement of streaming executor: nearly 5x faster
The streaming executor outperforms the in-memory engine when processing datasets larger than 300GB.
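
The idea behind processing data larger than device memory can be sketched in plain Python: split the input into partitions that each fit a fixed budget, aggregate every partition on its own, then combine the partial results. The sketch below is a conceptual illustration only, not the Polars or cuDF implementation. The names `partitions`, `partial_sums`, and `streaming_group_mean` are hypothetical, and `max_rows` stands in for a memory budget.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]  # (group key, value)


def partitions(rows: Iterable[Row], max_rows: int) -> Iterator[List[Row]]:
    """Yield chunks of at most max_rows rows (max_rows mimics a memory budget)."""
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield chunk
            chunk = []
    if chunk:  # final, possibly smaller, partition
        yield chunk


def partial_sums(chunk: List[Row]) -> Dict[str, Tuple[float, int]]:
    """Reduce one partition to per-key (sum, count) pairs."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in chunk:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: (total, count) for key, (total, count) in acc.items()}


def streaming_group_mean(rows: Iterable[Row], max_rows: int) -> Dict[str, float]:
    """Group-by mean computed one partition at a time."""
    totals = defaultdict(lambda: [0.0, 0])
    for chunk in partitions(rows, max_rows):
        # Only one partition is materialized at any moment.
        for key, (total, count) in partial_sums(chunk).items():
            totals[key][0] += total
            totals[key][1] += count
    return {key: total / count for key, (total, count) in totals.items()}


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]
print(streaming_group_mean(rows, max_rows=2))  # {'a': 3.0, 'b': 3.0}
```

Sums and counts are carried between partitions rather than means, because partial means cannot be combined correctly without their counts. In Polars itself, the GPU engine is selected when collecting a lazy query, for example with `collect(engine=pl.GPUEngine(executor="streaming"))`; treat that spelling as an assumption to check against the Polars and cuDF documentation for the installed version.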

Technologies & Tools


Data Science Framework: NVIDIA RAPIDS
Used for accelerated data processing and machine learning.
Machine Learning Library: cuML
Provides GPU-accelerated machine learning algorithms.
Data Processing Library: Polars
A DataFrame library for data manipulation and analysis; its GPU engine runs queries on NVIDIA GPUs.

Key Actionable Insights

1. Leverage the new profiling tools in cuML to identify performance bottlenecks in your machine learning workflows.
   By using the function-level and line-level profilers, you can gain insights into which operations are GPU-accelerated and which fall back to CPU, allowing for targeted optimizations.
2. Utilize the streaming executor in the Polars GPU engine to handle large datasets efficiently.
   This feature allows you to process data that exceeds GPU memory, significantly improving performance for large-scale data processing tasks.
3. Explore the newly supported algorithms in cuML to enhance your machine learning models.
   The addition of algorithms like Spectral Embedding and KernelRidge provides more options for model selection without requiring code changes, streamlining your workflow.

Common Pitfalls

1. Failing to utilize the profiling tools can lead to unnoticed performance issues in machine learning workflows.
   Without profiling, developers may struggle to identify which parts of their code are not optimized for GPU execution, leading to inefficient resource usage.
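
To show the kind of information a function-level profile gives, here is a toy profiler in plain Python that tallies call counts, elapsed time, and a GPU or CPU label per function. It is a sketch of the concept, not cuML's profiler. The names `profiled`, `STATS`, and `report` are hypothetical, the `fit` and `score` functions are stand-ins, and the device labels are supplied by hand, whereas the real tool determines them itself.

```python
import time
from collections import defaultdict
from functools import wraps

# function name -> call count, total seconds, and where it ran
STATS = defaultdict(lambda: {"calls": 0, "seconds": 0.0, "device": None})


def profiled(device):
    """Decorator factory: record timing for a function and tag it 'gpu' or 'cpu'."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the call even if the function raised.
                entry = STATS[func.__name__]
                entry["calls"] += 1
                entry["seconds"] += time.perf_counter() - start
                entry["device"] = device
        return wrapper
    return decorator


@profiled("gpu")
def fit(data):
    return sum(data)  # stand-in for an accelerated operation


@profiled("cpu")
def score(data):
    return max(data)  # stand-in for an operation that fell back to the CPU


def report():
    """Return one line per function, slowest first."""
    ordered = sorted(STATS.items(), key=lambda kv: kv[1]["seconds"], reverse=True)
    return [
        f"{name:<10} {e['device']:<4} calls={e['calls']} time={e['seconds']:.6f}s"
        for name, e in ordered
    ]


fit(range(1000))
fit(range(1000))
score(range(1000))
print("\n".join(report()))
```

A report like this makes CPU fallbacks visible next to their cost, which is the signal the pitfall above warns about missing. For the real tool, the cuML documentation describes `--profile` and `--line-profile` flags on `python -m cuml.accel`; verify the exact spelling against the installed version.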

Related Concepts

GPU Acceleration In Machine Learning
Data Processing With Polars
Profiling And Optimization Techniques