Measuring the GPU Occupancy of Multi-stream Workloads

NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multi-processor (SM)…

Rob Van der Wijngaart

Overview

This article discusses how to measure GPU occupancy for multi-stream workloads using NVIDIA's Nsight Systems tool. It highlights the importance of concurrency in maximizing GPU resource utilization and provides insights into calculating GPU metrics such as SM Active and overall GPU utilization.

What You'll Learn

  1. How to use Nsight Systems to analyze GPU occupancy
  2. Why measuring SM Active is crucial for optimizing GPU performance
  3. How to extract GPU metrics using SQL from Nsight Systems reports

Prerequisites & Requirements

  • Understanding of GPU architecture and CUDA programming
  • Familiarity with Nsight Systems and SQLite (optional)

Key Questions Answered

How can I determine the GPU occupancy of multi-stream workloads?
You can determine GPU occupancy by using the NVIDIA Nsight Systems tool to analyze the execution of kernels across multiple streams. By examining the timelines and utilizing the GPU Metrics feature, you can assess how effectively the GPU resources are being utilized during the workload execution.
What is SM Active and why is it important?
SM Active is a sampled metric that reports the percentage of Streaming Multiprocessors (SMs) in use, meaning they have at least one warp in flight, during each sampling interval of a workload. It matters because it shows how well the GPU is being utilized and whether the workload is saturating the GPU's computational resources.
How do I extract GPU metrics from Nsight Systems reports?
You can extract GPU metrics from Nsight Systems reports by using SQL queries on the SQLite database generated during profiling. This allows you to access detailed performance data, including SM Active percentages, which can be crucial for performance analysis.
What are the different types of GPU utilization metrics?
The article discusses three types of GPU utilization metrics: gross GPU utilization, which averages SM activity over the entire workload; net GPU utilization, which focuses on periods of active kernel execution; and effective GPU utilization time, which estimates the total time the GPU would have been fully utilized based on sample data.
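
The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code. It assumes the SM Active samples are already available as a list of percentages taken at a fixed sample period, and it treats any sample above zero as a period of active kernel execution. The function name `utilization_summary` and its parameters are hypothetical.

```python
def utilization_summary(sm_active_pct, sample_period_s):
    """Summarize SM Active samples (percentages, 0 to 100).

    Returns (gross_pct, net_pct, effective_time_s):
      gross_pct        - mean SM Active over every sample in the capture
      net_pct          - mean SM Active over non-zero samples only
                         (assumption: non-zero means a kernel was running)
      effective_time_s - time the GPU would have needed at 100% SM Active
                         to do the same amount of sampled work
    """
    if not sm_active_pct:
        raise ValueError("at least one sample is required")

    total = sum(sm_active_pct)
    busy_samples = sum(1 for value in sm_active_pct if value > 0)

    gross_pct = total / len(sm_active_pct)
    net_pct = total / busy_samples if busy_samples else 0.0
    effective_time_s = (total / 100.0) * sample_period_s
    return gross_pct, net_pct, effective_time_s


if __name__ == "__main__":
    # Eight illustrative samples at 10 kHz (100 microseconds per sample).
    samples = [0, 50, 100, 50, 0, 0, 100, 100]
    print(utilization_summary(samples, 1e-4))
```

Because the gross figure includes idle samples and the net figure does not, the gap between the two gives a rough sense of how much of the capture was spent with no kernel running.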

Key Statistics & Figures

Number of SMs in NVIDIA GPUs
80, 108, and 132
These counts correspond to the flagship data center GPUs of the NVIDIA Volta, Ampere, and Hopper architectures, respectively.
Default sampling frequency for SM Active
10 kHz
This frequency is used when profiling GPU metrics in Nsight Systems.
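
As a small worked example of what this sampling frequency implies, the sketch below converts a frequency into a sample period and estimates how many samples a capture of a given length would contain. The helper names are hypothetical and the arithmetic assumes evenly spaced samples.

```python
def sample_period_us(freq_hz):
    """Length of one sampling interval in microseconds."""
    if freq_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    return 1e6 / freq_hz


def expected_samples(duration_s, freq_hz):
    """Approximate number of samples collected over duration_s seconds."""
    return round(duration_s * freq_hz)


if __name__ == "__main__":
    print(sample_period_us(10_000))       # period of one sample at 10 kHz
    print(expected_samples(2.0, 10_000))  # samples in a 2-second capture
```

At 10 kHz each sample covers 100 microseconds, so a 2-second capture yields about 20,000 samples.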

Technologies & Tools

Performance Analysis Tool
NVIDIA Nsight Systems
Used to analyze GPU occupancy and performance metrics for workloads.
Programming Model
CUDA
Utilized for developing applications that run on NVIDIA GPUs.
Database
SQLite
Used to store and query performance data extracted from Nsight Systems reports.
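
To illustrate the SQLite step, here is a minimal sketch that reads SM Active samples with Python's built-in `sqlite3` module. The table `sm_active_samples` and its columns `timestamp_ns` and `sm_active_pct` are hypothetical placeholders: the real table layout of an exported Nsight Systems report depends on the tool version, so inspect your own file's schema and adjust the query before relying on it. The demo database is built in memory so the example runs without a real report.

```python
import sqlite3

# Hypothetical table and column names; replace with the ones found in
# your own exported report after inspecting its schema.
SM_ACTIVE_QUERY = (
    "SELECT timestamp_ns, sm_active_pct "
    "FROM sm_active_samples "
    "ORDER BY timestamp_ns"
)


def load_sm_active(conn, query=SM_ACTIVE_QUERY):
    """Return a list of (timestamp_ns, sm_active_pct) tuples."""
    return [(int(ts), float(value)) for ts, value in conn.execute(query)]


def make_demo_db():
    """Build a tiny in-memory database that mimics the assumed layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sm_active_samples "
        "(timestamp_ns INTEGER, sm_active_pct REAL)"
    )
    conn.executemany(
        "INSERT INTO sm_active_samples VALUES (?, ?)",
        [(200_000, 100.0), (0, 0.0), (100_000, 50.0)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for row in load_sm_active(make_demo_db()):
        print(row)
```

For a real report, `sqlite3.connect("path/to/export.sqlite")` would replace `make_demo_db()`, with the path and query adapted to your export.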

Key Actionable Insights

1
Utilize the Nsight Systems tool to analyze your multi-stream workloads for better GPU occupancy.
By visualizing the execution timelines of different streams, you can identify overlaps and optimize the workload distribution across the GPU's SMs, leading to improved performance.
2
Leverage SQL queries to extract detailed GPU metrics from Nsight Systems reports.
This allows for a deeper analysis of GPU performance, enabling you to make data-driven decisions to enhance your application's efficiency.
3
Monitor the SM Active metric to gauge GPU utilization effectively.
Understanding this metric helps in identifying bottlenecks in your workload, allowing for adjustments that can lead to better resource utilization.

Common Pitfalls

1
Failing to account for periods of inactivity in GPU utilization metrics.
This can lead to misleading interpretations of performance: an SM utilization figure averaged only over busy periods can look high even when the GPU sits idle for much of the workload.
2
Overlooking the importance of kernel concurrency in maximizing GPU performance.
Without sufficient concurrency, the GPU may not be fully utilized, resulting in underperformance despite high SM activity.
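
One rough way to check for the concurrency pitfall is to measure how much of the timeline actually has kernels from different streams running at once. The sketch below is an assumption-laden illustration: it takes kernel start and end times as plain `(start, end)` tuples, which you would have to obtain from your own profile, and uses a sweep over sorted events. The function name `concurrent_time` is hypothetical and is not part of any NVIDIA tool.

```python
def concurrent_time(intervals, min_active=2):
    """Total time during which at least min_active intervals overlap.

    intervals: iterable of (start, end) pairs in any consistent time unit.
    """
    events = []
    for start, end in intervals:
        if end <= start:
            continue  # ignore empty or inverted intervals
        events.append((start, 1))   # a kernel begins
        events.append((end, -1))    # a kernel finishes

    # At equal timestamps, finishes (-1) sort before starts (+1), so
    # back-to-back kernels are not counted as overlapping.
    events.sort()

    active = 0
    total = 0
    previous = None
    for timestamp, delta in events:
        if previous is not None and active >= min_active:
            total += timestamp - previous
        active += delta
        previous = timestamp
    return total


if __name__ == "__main__":
    # Illustrative kernel intervals, e.g. in microseconds.
    kernels = [(0, 10), (5, 15), (20, 30)]
    print(concurrent_time(kernels))                # time with 2+ kernels
    print(concurrent_time(kernels, min_active=1))  # time with any kernel
```

Comparing the two results shows what fraction of the busy time had overlapping kernels; a small fraction suggests the streams are mostly executing one after another.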

Related Concepts

CUDA Programming
GPU Architecture
Performance Optimization Techniques