RAPIDS Release 0.13 is Live and Packed with New Features

We have made huge progress on the work for release 0.13. Nearly all the code base has been ported to use the libcudf++ API. That means we are on a firm…

Nefi Alarcon

Overview

RAPIDS Release 0.13 introduces significant updates across its core libraries, enhancing GPU-accelerated data science tools. Key features include a major refactor of cuDF, new functionalities in machine learning and graph analytics, and improvements in data visualization and SQL capabilities.

What You'll Learn

1. How to utilize the new groupby aggregations in cuDF for data analysis
2. Why the cuML library's multi-node, multi-GPU support is essential for scaling machine learning tasks
3. How to implement batch cubic spline interpolation using cuSpatial
4. When to use the new features in BlazingSQL for distributed queries
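Batch cubic spline interpolation fits a separate spline to each of many curves, for example one trajectory per object. The sketch below illustrates the idea in pure Python rather than with cuSpatial's GPU API. Each curve, keyed by a hypothetical id, gets its own natural cubic spline. The spline's second derivatives come from a tridiagonal system solved with the Thomas algorithm. The natural boundary conditions (zero second derivative at both ends) and all function names are assumptions made for illustration. They are not cuSpatial's interface.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def natural_second_derivs(t: List[float], y: List[float]) -> List[float]:
    """Second derivatives M[i] at each knot of a natural cubic spline (assumed BCs)."""
    n = len(t)
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    # Tridiagonal system: rows 0 and n-1 encode M[0] = M[n-1] = 0.
    a = [0.0] * n  # sub-diagonal
    b = [1.0] * n  # diagonal
    c = [0.0] * n  # super-diagonal
    d = [0.0] * n  # right-hand side
    for i in range(1, n - 1):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * n
    m[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]
    return m


def evaluate(t: List[float], y: List[float], m: List[float], x: float) -> float:
    """Evaluate the spline at x (outside the knots, the end segments are extended)."""
    i = min(max(bisect_right(t, x) - 1, 0), len(t) - 2)
    h = t[i + 1] - t[i]
    left, right = t[i + 1] - x, x - t[i]
    return ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6.0 * h)
            + (y[i] / h - m[i] * h / 6.0) * left
            + (y[i + 1] / h - m[i + 1] * h / 6.0) * right)


def batch_spline(curves: Dict[str, Tuple[List[float], List[float]]],
                 queries: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Fit one spline per (hypothetical) curve id and evaluate it at that curve's queries."""
    out: Dict[str, List[float]] = {}
    for cid, (t, y) in curves.items():
        m = natural_second_derivs(t, y)
        out[cid] = [evaluate(t, y, m, x) for x in queries.get(cid, [])]
    return out


curves = {"a": ([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 8.0, 27.0]),
          "b": ([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])}
print(batch_spline(curves, {"a": [0.5, 2.5], "b": [0.5]}))
```

Each curve needs at least two strictly increasing knots. The per-curve systems do not depend on one another, so in principle they can all be solved in parallel.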

Key Questions Answered

What are the major updates in RAPIDS Release 0.13?
RAPIDS Release 0.13 includes a near-complete port of the code base to the libcudf++ API, new groupby aggregations in cuDF, multi-node support in cuML, and enhancements in cuGraph, cuxfilter, cuSpatial, and cuSignal. These updates aim to improve performance and usability for GPU-accelerated data science.
How does the cuDF library improve performance in data manipulation?
The cuDF library introduces optimizations in the `concatenate` function that can yield up to 2000x speedups. Additionally, it expands groupby aggregations and adds new join methods, enhancing data manipulation capabilities.
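The new groupby aggregations, such as median and std, compute one statistic per key. The pure-Python sketch below shows what those two aggregations return, so the semantics can be checked without a GPU.

In cuDF itself you would expect a pandas-style call along the lines of `df.groupby("key").agg(["median", "std"])`. That call is an assumption here, so check the cuDF 0.13 API docs before relying on it. The helper name below is hypothetical. The std it computes uses the sample definition (ddof=1), which is the pandas default.

```python
from collections import defaultdict
from statistics import median, stdev
from typing import Dict, Iterable, List, Tuple


def groupby_median_std(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {key: (median, sample std)} for (key, value) rows (hypothetical helper)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    result: Dict[str, Tuple[float, float]] = {}
    for key, values in groups.items():
        # Sample std (ddof=1) is undefined for a single value, so report NaN.
        spread = stdev(values) if len(values) > 1 else float("nan")
        result[key] = (median(values), spread)
    return result


rows = [("a", 1.0), ("a", 3.0), ("a", 5.0), ("b", 2.0), ("b", 4.0), ("c", 7.0)]
print(groupby_median_std(rows))
```

Here group `a` has median 3.0 and sample std 2.0. Group `c` has only one row, so its std is NaN.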
What new features does BlazingSQL offer in this release?
BlazingSQL 0.13 adds new SQL functions such as ROUND() and CASE with strings, along with AVG() support for distributed queries. It also begins work on a feature called 'Bigger than GPU', aimed at SQL queries whose data exceeds GPU memory.
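Distributed AVG() is less trivial than it looks. Averaging each node's local average gives the wrong answer when partitions hold different numbers of rows.

A common approach is for each worker to return a partial (sum, count) pair that a coordinator merges before dividing once. The sketch below shows that approach in pure Python and is not taken from BlazingSQL's source. The function names are hypothetical. NULLs are modeled as `None` and skipped, as SQL's AVG() does.

```python
from typing import Iterable, Optional, Sequence, Tuple

Partial = Tuple[float, int]  # (partial sum, non-NULL row count)


def partial_avg(partition: Sequence[Optional[float]]) -> Partial:
    """Worker step: skip NULLs (None) and return (sum, count), not an average."""
    values = [v for v in partition if v is not None]
    return (sum(values), len(values))


def merge_avg(partials: Iterable[Partial]) -> Optional[float]:
    """Coordinator step: add partial sums and counts, then divide once."""
    total, count = 0.0, 0
    for part_sum, part_count in partials:
        total += part_sum
        count += part_count
    return total / count if count else None  # AVG over zero rows is NULL


partitions = [[1.0, 2.0, 3.0], [10.0, None]]
print(merge_avg(partial_avg(p) for p in partitions))
```

Here the merged result is 4.0. Averaging the two local averages (2.0 and 10.0) would instead give 6.0.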
What is the significance of the cuML library's multi-node, multi-GPU support?
The multi-node, multi-GPU support in cuML allows for the execution of linear models across multiple GPUs, significantly enhancing the scalability of machine learning tasks. This is crucial for handling larger datasets efficiently.
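One standard way to fit a linear model on data partitioned across GPUs works in three steps:

- Each partition computes its local XᵀX and Xᵀy.
- The partial results are summed across workers.
- The small normal-equation system is solved once.

The sketch below shows that data-parallel pattern in pure Python, with plain lists standing in for GPU partitions. It illustrates the general idea only and is not cuML's multi-node implementation. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Partial = Tuple[Matrix, List[float]]


def local_normal_eq(X: Sequence[Sequence[float]], y: Sequence[float]) -> Partial:
    """Per-partition step: X^T X and X^T y, with a leading 1 for the intercept."""
    rows = [[1.0, *map(float, r)] for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return xtx, xty


def reduce_partials(parts: Sequence[Partial]) -> Partial:
    """Sum per-partition results; this is the only cross-worker exchange."""
    p = len(parts[0][1])
    xtx = [[sum(g[i][j] for g, _ in parts) for j in range(p)] for i in range(p)]
    xty = [sum(v[i] for _, v in parts) for i in range(p)]
    return xtx, xty


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_partitioned(partitions: Sequence[Tuple[Sequence[Sequence[float]], Sequence[float]]]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] fitted over all partitions."""
    return solve(*reduce_partials([local_normal_eq(X, y) for X, y in partitions]))


# y = 1 + 2*x1 - 3*x2, split across two stand-in "GPU" partitions.
parts = [
    ([[0, 0], [1, 0], [0, 1]], [1, 3, -2]),
    ([[2, 1], [1, 3]], [2, -6]),
]
print(fit_partitioned(parts))
```

Only the small p-by-p matrix and the p-vector cross the network, whatever the row count. Forming XᵀX squares the condition number, so production solvers often prefer QR- or SVD-based methods. The normal equations are used here only because they make the reduction step easy to see.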

Key Statistics & Figures

Speedup from concatenate optimizations in cuDF
up to 2000x
This performance improvement allows for significantly faster data manipulation in large datasets.

Technologies & Tools

Data Manipulation
cuDF
Used for GPU-accelerated dataframes and data analysis.
Machine Learning
cuML
Provides GPU-accelerated machine learning algorithms.
SQL Processing
BlazingSQL
Enables SQL queries on large datasets with GPU acceleration.
Graph Analytics
cuGraph
Facilitates graph-based data analysis and algorithms.
Data Visualization
cuxfilter
Enhances data visualization capabilities with GPU acceleration.
Geospatial Analysis
cuSpatial
Provides tools for geospatial data processing.
Signal Processing
cuSignal
Offers GPU-accelerated signal processing functionality.
Communication
UCX-Py
Facilitates communication in multi-node, multi-GPU setups.

Key Actionable Insights

1. Leverage the new groupby aggregations in cuDF to streamline data analysis.
   These aggregations, including median and std, broaden the per-group statistics you can compute on the GPU, making it easier to derive insights from large datasets.
2. Use the multi-node, multi-GPU features in cuML to scale your machine learning models.
   This capability matters for datasets that need more memory or compute than a single GPU provides, so models can still be trained efficiently.
3. Explore the new SQL functions in BlazingSQL to extend your data querying capabilities.
   Functions such as ROUND(), CASE with strings, and distributed AVG() can simplify complex queries and improve performance, particularly when working with distributed datasets.

Common Pitfalls

1. Failing to report bugs after the cuDF refactor could hinder future improvements.
   As the cuDF library undergoes significant changes, users are encouraged to report any issues they encounter to help the development team address them promptly.

Related Concepts

GPU-Accelerated Data Science
Machine Learning with cuML
Graph Analytics with cuGraph
Data Visualization Techniques