Optimizing VK/VKR and DX12/DXR Applications Using Nsight Graphics: GPU Trace Advanced Mode Metrics

Many GPU performance analysis tools are based on a capture and replay mechanism, where a frame is first captured (either in-memory or to disk)…

Louis Bavoil

Overview

This article shows how to optimize Vulkan (VK) and DirectX 12 (DX12) applications, including their ray-tracing variants (VKR and DXR), using Nsight Graphics. It focuses on the Advanced Mode metrics that GPU Trace introduced in version 2020.2, and on how those metrics let developers analyze GPU performance as the application runs, without the constraints of replay-based profiling.

What You'll Learn

  1. How to capture GPU Trace data using Advanced Mode Metrics
  2. Why using the P3 method is essential for optimizing GPU workloads
  3. How to analyze GPU performance metrics to identify bottlenecks
  4. When to implement mipmapping to improve texture fetch efficiency

Prerequisites & Requirements

  • Understanding of Vulkan and DirectX 12 graphics APIs
  • Familiarity with Nsight Graphics (optional)

Key Questions Answered

What are the new Advanced Mode metrics in Nsight Graphics 2020.2?
The Advanced Mode metrics include SM Warp-Issue-Stall Reasons, SM Warp-Launch-Stall Reasons, L1TEX Hit Rate, and L2 Traffic Breakdown by Source Unit. These metrics provide insights into GPU performance, helping developers identify issues related to warp latencies and memory traffic.
How can the P3 method be applied to optimize GPU workloads?
The Peak-Performance-Percentage (P3) method has three steps: check the GPU Active% metric, examine the top throughput metrics per GPU unit, and use them to identify the limiting factor. Following these steps in order lets developers systematically address whatever is limiting GPU performance, such as memory latency or low occupancy.
What steps are involved in taking GPU Trace captures?
To take GPU Trace captures, launch Nsight Graphics, create a project, connect to the application, select Advanced Mode Metrics, and then trigger a capture while ensuring the application is in fullscreen mode. This process allows for multiple captures to analyze performance changes.
What optimization was implemented to improve the RT Reflections workload?
The optimization involved implementing dynamic MIP level calculations in the hit shaders instead of hardcoding MIP=0. This change improved the L2 Read Hit Rate from 50% to 83%, resulting in a 12% performance gain in the RT Reflections workload.
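
As a rough illustration of the P3 steps described above, the following Python sketch applies the checks in the same order: GPU Active% first, then the top per-unit throughput metric. The function name `p3_triage`, the unit names in the usage example, and all three threshold values are assumptions made for this sketch. Nsight Graphics reports the metrics; how to interpret them is left to the developer.

```python
from typing import Dict


def p3_triage(gpu_active_pct: float,
              unit_throughput_pct: Dict[str, float],
              active_threshold: float = 95.0,   # assumed placeholder
              high_threshold: float = 80.0,     # assumed placeholder
              low_threshold: float = 60.0) -> str:  # assumed placeholder
    """Return a coarse diagnosis for one GPU workload (illustrative only)."""
    if not unit_throughput_pct:
        raise ValueError("need at least one per-unit throughput value")

    # Step 1: a low GPU Active% suggests the GPU is waiting on the CPU.
    if gpu_active_pct < active_threshold:
        return "GPU partly idle: check CPU-side and API overhead first"

    # Step 2: find the unit with the highest throughput percentage.
    top_unit, top_pct = max(unit_throughput_pct.items(), key=lambda kv: kv[1])

    # Step 3: classify the workload from the top unit's throughput.
    if top_pct >= high_threshold:
        return f"{top_unit} is throughput-limited at {top_pct:.0f}%: reduce the work sent to it"
    if top_pct < low_threshold:
        return f"top unit {top_unit} is only at {top_pct:.0f}%: likely latency- or occupancy-limited"
    return f"top unit {top_unit} is at {top_pct:.0f}%: mixed throughput and latency limits"


# Hypothetical numbers for one workload, not taken from the article.
print(p3_triage(99.0, {"SM": 35.0, "L1TEX": 42.0, "L2": 55.0}))
```

The thresholds are keyword arguments so they can be tuned per project; the ordering of the checks is the point of the sketch, not the specific cutoffs.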

Key Statistics & Figures

L2 Read Hit Rate from L1TEX: 83%
Improved from 50% after implementing dynamic MIP LOD.

L1TEX Sector Hit Rate: 80%
Slightly improved from 75% due to mipmapping.

RT Reflections workload time: 4.64 ms
Reduced from 5.18 ms after optimizations.
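
The workload times above can be turned into a percentage in two ways, which is worth checking when comparing against the quoted 12% gain. The short Python sketch below computes both from the 5.18 ms and 4.64 ms figures; the helper names are made up for this example.

```python
def time_reduction_pct(before_ms: float, after_ms: float) -> float:
    """Share of the original time that was removed."""
    return (before_ms - after_ms) / before_ms * 100.0


def speedup_pct(before_ms: float, after_ms: float) -> float:
    """Gain in throughput (work per unit time) relative to before."""
    return (before_ms / after_ms - 1.0) * 100.0


before_ms, after_ms = 5.18, 4.64  # RT Reflections workload times from above

print(f"time reduction: {time_reduction_pct(before_ms, after_ms):.1f}%")  # time reduction: 10.4%
print(f"speedup: {speedup_pct(before_ms, after_ms):.1f}%")                # speedup: 11.6%
```

The quoted 12% gain matches the speedup figure (about 11.6%) rather than the time reduction (about 10.4%).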

Technologies & Tools

Tool: Nsight Graphics
Used for capturing and analyzing GPU performance metrics.

API: Vulkan
Supported by Nsight Graphics for performance analysis.

API: DirectX 12
Supported by Nsight Graphics for performance analysis.

Key Actionable Insights

  1. Utilize Advanced Mode Metrics in Nsight Graphics to gain deeper insights into GPU performance.
     By capturing additional metrics, developers can identify specific performance bottlenecks and make informed decisions on optimizations.
  2. Implement mipmapping to reduce VRAM access and improve texture fetch efficiency.
     Mipmapping allows for better locality of texture accesses, which can significantly enhance performance, especially in workloads that are VRAM latency-limited.
  3. Regularly analyze GPU Active% to ensure the GPU is not being starved by CPU-side limitations.
     If GPU Active% drops, it may indicate that the CPU is not feeding the GPU efficiently, prompting a need to investigate CPU performance or API call overhead.
  4. Apply the P3 method systematically to diagnose and resolve performance issues in GPU workloads.
     This structured approach helps developers pinpoint the exact cause of performance drops and implement targeted optimizations.
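
To make the mipmapping insight (item 2) more concrete, here is a minimal Python sketch of choosing a MIP level from a texture footprint instead of always sampling MIP 0. It uses the common approximation that the level of detail is the base-2 logarithm of the footprint width measured in MIP-0 texels. This is an assumption of the sketch, not the method used in the article, which only states that the MIP level is computed dynamically in the hit shaders. How a hit shader estimates the footprint is outside the scope of this example, and the function names are hypothetical.

```python
import math


def full_mip_count(width: int, height: int) -> int:
    """Number of levels in a full MIP chain: floor(log2(max_dim)) + 1."""
    if width < 1 or height < 1:
        raise ValueError("texture dimensions must be positive")
    return max(width, height).bit_length()


def mip_lod(footprint_texels: float, mip_count: int) -> float:
    """LOD for a footprint given as a width in MIP-0 texels (illustrative)."""
    if mip_count < 1:
        raise ValueError("mip_count must be at least 1")
    # A footprint of one texel or less is magnification: stay on MIP 0.
    lod = math.log2(max(footprint_texels, 1.0))
    # Clamp to the coarsest level that exists in the chain.
    return min(lod, float(mip_count - 1))


levels = full_mip_count(2048, 2048)
print(levels)                 # 12
print(mip_lod(1.0, levels))   # 0.0
print(mip_lod(8.0, levels))   # 3.0
```

With MIP hardcoded to 0, a distant surface whose footprint spans 8 texels would touch texels spread across the full-resolution image; sampling level 3 instead reads neighboring texels from a much smaller image, which is the locality effect the insight describes.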

Common Pitfalls

  1. Failing to pause the game time and freeze rendering during GPU Trace captures can lead to inconsistent data.
     Without freezing the rendering, frame-to-frame differences can obscure the performance metrics, making it difficult to identify true bottlenecks.
  2. Not utilizing performance markers in the application can limit the effectiveness of GPU Trace.
     Performance markers help in annotating GPU workloads, allowing for more granular analysis and insights during profiling.

Related Concepts

GPU Performance Optimization Techniques
Understanding Vulkan and DirectX 12 APIs
Advanced Profiling Tools and Methodologies