Developing Accelerated Code with Standard Language Parallelism

Learn how standard language parallelism can be used for programming accelerated computing applications on NVIDIA GPUs with ISO C++, ISO Fortran, or Python.

Jeff Larkin
11 min read, intermediate

Overview

The article discusses the advantages of using standard language parallelism for accelerated computing on NVIDIA platforms, emphasizing the productivity and portability of programming with ISO C++, ISO Fortran, and Python. It highlights various approaches to GPU programming and showcases examples of applications that have successfully implemented these techniques.

What You'll Learn

1. How to implement parallel algorithms in ISO C++ for improved performance
2. Why using standard language parallelism can enhance code portability across platforms
3. When to choose ISO Fortran for high-performance computing applications
4. How to leverage cuNumeric to scale Python applications across GPUs

Prerequisites & Requirements

  • Familiarity with parallel programming concepts
  • Access to NVIDIA SDKs and libraries (optional)

Key Questions Answered

What are the benefits of using standard language parallelism in GPU programming?
Using standard language parallelism offers full ISO compliance, resulting in more portable code that is easier to read and maintain. It allows developers to write parallel code that can run on multiple platforms without modification, enhancing productivity and reducing errors.
How does the performance of ISO C++ compare to OpenMP in practical applications?
In the Lulesh application, the ISO C++ implementation running on an NVIDIA A100 GPU was 13.5X faster than the original OpenMP code. The result suggests that standard language parallelism can deliver strong performance in high-performance computing codes, not only cleaner source.
What role does cuNumeric play in Python applications for GPU acceleration?
cuNumeric allows Python applications, particularly those using NumPy, to scale their computations across GPUs and clusters seamlessly. By replacing NumPy with cuNumeric, applications can leverage GPU acceleration without significant code changes, enhancing performance dramatically.
What improvements have been made in Fortran for parallel programming?
Fortran has introduced features for parallel programming since Fortran 2008, with enhancements in Fortran 2018 and ongoing developments for Fortran 202X. These improvements facilitate modernizing applications to be parallel-first, allowing for better performance on multicore CPUs and GPUs.
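
As a rough illustration of the ISO C++ approach described in the answers above, the sketch below writes a SAXPY-style update as a standard parallel algorithm. The function name `saxpy` and its signature are hypothetical and do not come from the article. The code uses only C++17 standard library facilities. Whether it actually runs on a GPU depends on the compiler and flags (for example, a compiler that offloads standard parallel algorithms, such as nvc++ with `-stdpar`). With other toolchains it may run on CPU threads or serially.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Hypothetical helper (not from the article): computes y[i] = a * x[i] + y[i]
// with a standard parallel algorithm instead of a hand-written loop.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // par_unseq permits the implementation to run the element-wise operation
    // in parallel and vectorized; it does not guarantee either.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),   // first input range
                   y.begin(),            // second input range
                   y.begin(),            // output (in place over y)
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

Calling `saxpy(2.0f, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `y` holding `{12, 24, 36}`. The source contains no vendor-specific directives or SDK calls, which is the portability argument made above.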

Key Statistics & Figures

Performance improvement of ISO C++ code over OpenMP
13.5X
This performance was observed when running the Lulesh application on an NVIDIA A100 GPU compared to the original OpenMP implementation.
Performance improvement of STLBM using GPUs
12X
This improvement was achieved by using ISO C++ without external SDK dependencies, showcasing the effectiveness of standard language parallelism.

Technologies & Tools

Backend
CUDA C++
Used for high-performance computing on NVIDIA GPUs.
Backend
Fortran
Utilized for scientific computing and high-performance applications.
Library
cuNumeric
A library modeled after NumPy that enables GPU acceleration for Python applications.

Key Actionable Insights

1
Adopting ISO C++ parallel algorithms can significantly reduce code complexity and improve maintainability.
By refactoring existing code to use C++ parallel algorithms, developers can create cleaner, more efficient code that is easier to understand and maintain, leading to fewer errors and better performance.
2
Utilizing cuNumeric can help Python developers enhance the performance of their applications without extensive rewrites.
By switching from NumPy to cuNumeric, developers can achieve substantial performance gains on NVIDIA GPUs, making it a valuable tool for those looking to leverage GPU acceleration in Python.
3
Implementing standard language parallelism allows for code that is inherently portable across different hardware platforms.
This approach minimizes the need for platform-specific code, enabling developers to focus on the logic of their applications rather than the underlying hardware, which can lead to faster development cycles.
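
To make the first insight concrete, here is a minimal before-and-after sketch of refactoring a hand-written reduction loop into a C++17 parallel algorithm. The function names `dot_loop` and `dot_par` are hypothetical and chosen only for illustration. The example assumes a toolchain with C++17 execution policy support. One caveat to keep in mind: a parallel reduction may add terms in a different order than the serial loop, so floating-point results can differ in the last bits.

```cpp
#include <cassert>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Before (hypothetical example): an explicit serial loop.
double dot_loop(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// After: the same computation stated as a standard algorithm. The defaults
// multiply paired elements and add the products, starting from 0.0.
double dot_par(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    return std::transform_reduce(std::execution::par_unseq,
                                 a.begin(), a.end(), b.begin(), 0.0);
}
```

The refactored version states intent (a transform followed by a reduction) rather than loop mechanics, and it leaves the choice of how to parallelize to the implementation.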

Common Pitfalls

1
Failing to adopt standard language parallelism can lead to code that is difficult to maintain and less portable.
Without using standard approaches, developers may end up with platform-specific code that requires significant effort to port to new hardware, increasing development time and complexity.

Related Concepts

Parallel Programming
GPU Acceleration
High-performance Computing
ISO C++
ISO Fortran