Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK

Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. For more up-to-date information, please read Using Fortran…

Guray Ozen
13 min read, advanced

Overview

The article discusses how Fortran developers can accelerate their programs using the NVIDIA HPC SDK, specifically focusing on the DO CONCURRENT construct. It highlights the benefits of using ISO Standard Fortran for GPU acceleration without needing additional extensions or libraries.

What You'll Learn

  1. How to use DO CONCURRENT for parallelism in Fortran programs
  2. Why using ISO Standard Fortran can enhance portability across compilers
  3. When to apply the -stdpar option in NVFORTRAN for GPU acceleration
  4. How to optimize nested loops using DO CONCURRENT for better performance

Prerequisites & Requirements

  • Understanding of Fortran programming and parallelism concepts
  • NVIDIA HPC SDK installed on the system

Key Questions Answered

How does the NVIDIA HPC SDK accelerate Fortran DO CONCURRENT?
When a program is compiled with the -stdpar option, the NVFORTRAN compiler in the NVIDIA HPC SDK automatically parallelizes DO CONCURRENT loops for execution on NVIDIA GPUs. Developers write standard Fortran without additional directives, and the compiler generates the parallel code.
What are the limitations of using DO CONCURRENT with NVFORTRAN?
Two limitations are noted: procedure calls are not permitted inside a DO CONCURRENT loop, and the Fortran 2018 standard provides no support for reductions in DO CONCURRENT. Developers must therefore write loops that are safe for parallel execution without relying on either feature.
What performance improvements can be expected when using DO CONCURRENT on NVIDIA GPUs?
The article reports an almost 13x speedup for a Jacobi solver running on an NVIDIA A100 GPU compared with the same algorithm running on a CPU with 40 cores. The gain applies to parallelizable work of this kind; results for other codes will vary.
What command-line options are available for compiling Fortran programs with NVFORTRAN?
Developers can use the -stdpar option to enable GPU acceleration for DO CONCURRENT loops. Other options include -stdpar=multicore for targeting multi-core CPUs and -stdpar=gpu,multicore for creating programs that can run on either a CPU or GPU depending on availability.
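
As a minimal sketch of what such a loop can look like, the program below applies DO CONCURRENT to a SAXPY-style update. The file name `saxpy.f90`, the array size, and the program and variable names are illustrative assumptions, not taken from the article; the build lines in the comments use only the options named above and should be checked against your installed SDK version.

```fortran
! Hypothetical file name: saxpy.f90
! Possible build lines, using only the options named in this summary:
!   nvfortran -stdpar -Minfo saxpy.f90           ! target an NVIDIA GPU, report what was parallelized
!   nvfortran -stdpar=multicore saxpy.f90        ! target multi-core CPUs
!   nvfortran -stdpar=gpu,multicore saxpy.f90    ! CPU or GPU, depending on availability
program saxpy_do_concurrent
  implicit none
  integer, parameter :: n = 1000000   ! illustrative problem size
  real, parameter :: a = 2.0
  real, allocatable :: x(:), y(:)
  integer :: i

  allocate(x(n), y(n))
  x = 1.0
  y = 3.0

  ! Each iteration reads and writes only element i, so the iterations
  ! are independent and may execute in any order or in parallel.
  do concurrent (i = 1:n)
    y(i) = a * x(i) + y(i)
  end do

  ! Self-check: 2.0 * 1.0 + 3.0 = 5.0 for every element.
  if (any(abs(y - 5.0) > 1.0e-6)) error stop "unexpected SAXPY result"
  print '(a)', "saxpy ok"
end program saxpy_do_concurrent
```

Because the loop is standard Fortran, the same source should also build with other conforming compilers, which would simply run it on the CPU.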

Key Statistics & Figures

Performance improvement
Almost 13x
This performance gain is observed when comparing a Jacobi solver running on an NVIDIA A100 GPU to the same algorithm executed on a CPU with 40 cores.
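
For orientation, a Jacobi sweep with nested loops collapsed into a single DO CONCURRENT header might look like the sketch below. This is not the article's benchmark code: the grid size, tolerance, boundary condition, and all names are assumptions chosen for illustration. The convergence error is computed with the `maxval` intrinsic outside the parallel loops, which is one way to avoid needing a reduction inside DO CONCURRENT.

```fortran
program jacobi_sketch
  implicit none
  integer, parameter :: n = 64, m = 64     ! illustrative grid size
  integer, parameter :: max_iter = 1000    ! illustrative iteration cap
  real, parameter :: tol = 1.0e-4          ! illustrative tolerance
  real :: a(n, m), anew(n, m)
  real :: err
  integer :: i, j, iter

  a = 0.0
  a(:, 1) = 1.0        ! fixed boundary value along one edge
  anew = a
  err = huge(err)
  iter = 0

  do while (err > tol .and. iter < max_iter)
    ! Both indices appear in one DO CONCURRENT header, so the whole
    ! interior of the grid forms a single parallel iteration space.
    ! The loop reads only a and writes only anew, so no iteration
    ! depends on another.
    do concurrent (j = 2:m-1, i = 2:n-1)
      anew(i, j) = 0.25 * (a(i-1, j) + a(i+1, j) + a(i, j-1) + a(i, j+1))
    end do

    ! Maximum change, computed with an intrinsic outside the loops
    ! rather than as a reduction inside DO CONCURRENT.
    err = maxval(abs(anew - a))

    ! Copy the interior back for the next sweep.
    do concurrent (j = 2:m-1, i = 2:n-1)
      a(i, j) = anew(i, j)
    end do

    iter = iter + 1
  end do

  ! Self-check: averaging values in [0, 1] must stay in [0, 1].
  if (any(a < 0.0) .or. any(a > 1.0 + 1.0e-6)) error stop "values out of range"
  print '(a, i0)', "iterations run: ", iter
end program jacobi_sketch
```

Listing `j` first and `i` second in the header mirrors the usual column-major loop nest; whether this ordering matters for performance under -stdpar is something to confirm with the compiler's -Minfo output rather than assume.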

Technologies & Tools

Software
NVIDIA HPC SDK
A suite of compilers, libraries, and tools for GPU-accelerating HPC applications.
Compiler
NVFORTRAN
The Fortran compiler included in the NVIDIA HPC SDK that supports GPU acceleration.

Key Actionable Insights

  1. Use the DO CONCURRENT construct in your Fortran programs to express parallelism.
     This allows the NVFORTRAN compiler to optimize the execution of loops, leveraging the power of NVIDIA GPUs without needing to modify your existing code significantly.
  2. Take advantage of the NVIDIA HPC SDK's capabilities to compile and run your Fortran applications on various architectures.
     By using standard Fortran, you ensure that your applications remain portable and can benefit from performance enhancements across different systems.
  3. Monitor the compiler's output using the -Minfo flag to understand how your code is being optimized.
     This feedback can help you refine your code for better performance and identify potential issues with parallelization.

Common Pitfalls

  1. Assuming that all loops can be safely parallelized without checking for data dependencies.
     This can lead to race conditions and incorrect results. Developers must ensure that the iterations of the loop do not have dependencies that could affect the correctness of the output.
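
The difference between a dependent and an independent loop can be illustrated with a small sketch. The arrays and names below are hypothetical examples, not taken from the article. The running-sum loop carries a dependency from one iteration to the next and is kept as an ordinary DO loop; the element-wise loop has no such dependency and is a reasonable candidate for DO CONCURRENT.

```fortran
program dependency_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: x(n), prefix(n), doubled(n)
  integer :: i

  x = 1

  ! NOT a candidate for DO CONCURRENT: iteration i reads prefix(i-1),
  ! which is written by iteration i-1. Running the iterations in
  ! arbitrary order would give wrong results, so this stays a
  ! sequential DO loop.
  prefix(1) = x(1)
  do i = 2, n
    prefix(i) = prefix(i-1) + x(i)
  end do

  ! Candidate for DO CONCURRENT: iteration i reads x(i) and writes
  ! doubled(i) only, so no iteration depends on another.
  do concurrent (i = 1:n)
    doubled(i) = 2 * x(i)
  end do

  ! Self-checks: the running sum of n ones ends at n, and every
  ! doubled element equals 2.
  if (prefix(n) /= n) error stop "prefix sum mismatch"
  if (any(doubled /= 2)) error stop "doubling mismatch"
  print '(a)', "dependency sketch ok"
end program dependency_sketch
```

The compiler is not required to detect a loop-carried dependency inside DO CONCURRENT; by using the construct the programmer is asserting that none exists, so this check is the developer's responsibility.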

Related Concepts

GPU Acceleration In Fortran
Parallel Programming Techniques
Performance Optimization Strategies