New Asynchronous Programming Model Library Now Available with NVIDIA HPC SDK v22.11

Find out what’s in the NVIDIA HPC SDK v22.11 release, including a preview of an innovative library for standardizing and asynchronously scheduling C++ work.

Jay Gould
3 min read, intermediate

Overview

NVIDIA has released the HPC Software Development Kit (SDK) v22.11, which includes a preview of the new stdexec library for asynchronous programming in C++. The release aims to improve developer productivity and application portability, and it adds multi-node functionality to libraries such as cuSOLVER and cuFFT so that high-performance computing applications can scale across many GPUs.

What You'll Learn

1. How to use the stdexec library for asynchronous programming in C++
2. Why the stdexec library improves resource utilization and performance
3. How to scale applications using multi-node functionality in cuSOLVER and cuFFT

Prerequisites & Requirements

  • Understanding of C++ programming concepts
  • Access to NVIDIA HPC SDK (optional)

Key Questions Answered

What is the stdexec library and how does it enhance C++ programming?
The stdexec library is designed to standardize asynchronous programming in C++. It lets developers write high-level algorithmic code that does not depend on whether it runs on a CPU or a GPU, which improves productivity and allows better resource utilization and performance.

How does the new multi-node functionality in cuSOLVER and cuFFT improve application performance?
The multi-node functionality in cuSOLVER and cuFFT enables applications to scale to thousands of GPUs with minimal code changes. The integration has been demonstrated in GROMACS, where it allows the use of multiple Particle-Mesh Ewald (PME) ranks and so improves scalability.

What performance improvements were observed in GROMACS with the new SDK?
With the multi-node functionality integrated, GROMACS was shown scaling from 2 to 32 nodes. The runs were performed on the NVIDIA Selene cluster with 4 A100-SXM4 GPUs per node.
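The first answer above describes code that states what to compute separately from where and when it runs. The following is a toy model of that idea in plain C++17, offered only as a sketch: the `toy` namespace and everything in it are hypothetical and are not the stdexec API, although the names loosely echo the `just`, `then`, and `sync_wait` vocabulary that stdexec uses. The model runs synchronously on the calling thread and has no scheduler; it only shows that a pipeline can be built first and started later.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy model only: these names are hypothetical and are NOT the stdexec API.
namespace toy {

// In this model a "sender" is a callable that produces a value when started.
template <class T>
using sender = std::function<T()>;

// Describe work that yields a ready value. Nothing runs yet.
template <class T>
sender<T> just(T value) {
    return [value]() { return value; };
}

// Attach a continuation to an existing sender. Still nothing runs.
template <class T, class F>
auto then(sender<T> prev, F fn) -> sender<decltype(fn(std::declval<T>()))> {
    return [prev, fn]() { return fn(prev()); };
}

// Start the whole chain on the calling thread and return its result.
template <class T>
T sync_wait(const sender<T>& s) {
    assert(s && "sender must not be empty");
    return s();
}

}  // namespace toy

// Build a two-step pipeline (20, then +1, then *2) and only then run it.
inline int run_pipeline() {
    auto work = toy::then(
        toy::then(toy::just(20), [](int x) { return x + 1; }),
        [](int x) { return x * 2; });
    return toy::sync_wait(work);
}
```

Because the description of the work is a value, a real library is free to decide later which execution resource runs it. That separation is the part this sketch tries to convey; it says nothing about how stdexec implements it.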

Key Statistics & Figures

Scalability improvement in GROMACS
From 2 to 32 nodes
This improvement was demonstrated using the NVIDIA Selene cluster with 4 A100-SXM4 GPUs per node.
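One common way to quantify a scaling result like this is parallel efficiency: the measured speedup divided by the ideal speedup implied by the increase in node count. The helper below is a minimal sketch of that arithmetic. The function name and parameters are illustrative assumptions, and no measured GROMACS throughput is reproduced here, so any values passed in are placeholders.

```cpp
#include <cassert>

// Parallel efficiency of a scaled run relative to a baseline run.
// perf_* are throughput figures in any consistent unit (for example ns/day);
// nodes_* are the node counts of the two runs.
// A result of 1.0 means ideal scaling; lower values mean overhead.
inline double scaling_efficiency(double perf_base, int nodes_base,
                                 double perf_scaled, int nodes_scaled) {
    assert(perf_base > 0.0 && nodes_base > 0 && nodes_scaled > 0);
    const double speedup = perf_scaled / perf_base;
    const double ideal = static_cast<double>(nodes_scaled) / nodes_base;
    return speedup / ideal;
}
```

For example, going from 2 to 32 nodes gives an ideal speedup of 16, so a hypothetical run that is 8 times faster than the 2-node baseline would have an efficiency of 0.5.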

Technologies & Tools

Software
NVIDIA HPC SDK
A comprehensive suite for high-performance computing development.
Library
cuSOLVER
Provides multi-node functionality for scalable applications.
Library
cuFFT
Enables multi-node FFT computations for enhanced performance.

Key Actionable Insights

1. Leverage the stdexec library to enhance your C++ applications with asynchronous programming capabilities.
Using stdexec allows for better resource management and performance optimization in high-performance computing applications, making it a valuable addition to your development toolkit.
2. Integrate cuSOLVER and cuFFT multi-node functionality into your applications to achieve significant performance gains.
This integration simplifies scaling applications to thousands of GPUs, which is crucial for handling large-scale computations efficiently.
3. Use the performance metrics reported by GROMACS to schedule and optimize your HPC workloads.
The ns/day metric (nanoseconds of simulated time per day of wall-clock time) helps you plan computational tasks and make full use of the resources available.
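The ns/day figure mentioned in the last insight can be turned into a simple planning estimate. The sketch below shows the unit arithmetic only; the function names and parameters are hypothetical, and the inputs are whatever step count, timestep, and wall-clock time your own run reports.

```cpp
#include <cassert>

// Simulated nanoseconds per wall-clock day, from the number of MD steps,
// the timestep in femtoseconds, and the elapsed wall-clock time in seconds.
inline double ns_per_day(long long steps, double timestep_fs,
                         double wall_seconds) {
    assert(steps >= 0 && timestep_fs > 0.0 && wall_seconds > 0.0);
    const double simulated_ns = steps * timestep_fs / 1.0e6;  // 1 ns = 1e6 fs
    return simulated_ns * 86400.0 / wall_seconds;             // 86400 s per day
}

// Wall-clock days needed to simulate target_ns at a given ns/day rate.
inline double days_needed(double target_ns, double rate_ns_per_day) {
    assert(rate_ns_per_day > 0.0);
    return target_ns / rate_ns_per_day;
}
```

As an illustration with made-up numbers, a run sustaining 250 ns/day would need 4 days of wall-clock time to reach 1000 ns of simulated time, which is the kind of estimate that helps when sizing a job allocation.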

Common Pitfalls

1. Failing to leverage the capabilities of the stdexec library can lead to suboptimal resource utilization.
Many developers may stick to traditional parallel algorithms without exploring asynchronous programming, which can hinder performance improvements in HPC applications.

Related Concepts

Asynchronous Programming in C++
High-Performance Computing
Multi-Node Scaling Techniques
Parallel Programming Standards