Latest Updates to NVIDIA CUDA-X AI Libraries

Learn what’s new in the latest releases of NVIDIA’s CUDA-X AI libraries and NGC. For more information on NVIDIA’s developer tools, join live webinars, training…

Overview

This article summarizes the latest updates to NVIDIA's CUDA-X AI libraries, covering the NVIDIA Collective Communications Library (NCCL), Triton Inference Server, Deep Learning Profiler, and DALI. The releases focus on performance optimizations and new features for deep learning applications on NVIDIA GPUs.

What You'll Learn

1. How to use the NVIDIA Collective Communications Library for optimized multi-GPU communication

2. Why dynamic batching in NVIDIA Triton Inference Server improves request management

3. How to visualize GPU utilization using the Deep Learning Profiler

4. How to speed up data loading for deep learning models using NVIDIA's DALI

Key Questions Answered

What are the key features of the NVIDIA Collective Communications Library 2.6?
The NVIDIA Collective Communications Library 2.6 includes features such as up to 2x peak bandwidth with in-network AllReduce operations, InfiniBand adaptive routing to alleviate congested ports, and enhanced topology support for AMD, ARM, PCI Gen4, and IB HDR.
What improvements does the NVIDIA Triton Inference Server 20.03 offer?
The NVIDIA Triton Inference Server 20.03 introduces per-request prioritization, experimental Python client support for the gRPC inferencing API, and the ability to run large ONNX models whose weights are stored across separate files, all of which extend its model serving capabilities.
How can the Deep Learning Profiler assist in optimizing GPU utilization?
The Deep Learning Profiler provides visualization of GPU utilization and Tensor Core operations, along with expert system recommendations and support for user-defined NVTX markers, helping developers optimize their deep learning applications.
What optimizations are included in DALI 0.20 for deep learning applications?
DALI 0.20 includes optimizations for speech processing and augmentation operators, such as spectrogram and mel filterbank, which can significantly accelerate Automatic Speech Recognition (ASR) models like Jasper and RNN-T.
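
To make the mel filterbank operator mentioned above more concrete, here is a minimal pure-Python sketch of what such a filterbank computes: triangular filters spaced evenly on the mel scale, applied to a power spectrum. It is not DALI's implementation or API. The function names, the HTK-style mel formula, and the parameter values are assumptions chosen for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale (an assumption of this sketch)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build n_filters triangular filters over n_fft // 2 + 1 frequency bins."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points evenly spaced in mel: filter edges and centers.
    points_hz = [
        mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)
    ]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = points_hz[m - 1], points_hz[m], points_hz[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank, power_spectrum):
    """Weighted sum of the spectrum under each filter: one energy per mel band."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


# Hypothetical usage: 4 mel bands over a 64-point FFT at 16 kHz.
bank = mel_filterbank(n_filters=4, n_fft=64, sample_rate=16000)
band_energies = apply_filterbank(bank, [1.0] * (64 // 2 + 1))
```

A GPU library performs the same weighted sums in parallel over many frames at once, which is where the acceleration for ASR preprocessing would come from.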

Key Statistics & Figures

Peak bandwidth improvement
Up to 2x
Achieved through in-network AllReduce operations in the NVIDIA Collective Communications Library.
Performance increase for CNNs
Up to 10%
Realized through the new layout optimization option for Automatic Mixed Precision in MXNet.
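
For readers unfamiliar with the AllReduce collective behind the first figure, the sketch below simulates a sum AllReduce on the CPU using the classic ring algorithm (reduce-scatter followed by all-gather). It is an illustration of what the collective computes, not NCCL's API, and it is not the in-network variant, which, as the name suggests, moves the reduction into the network fabric. The function name and the list-of-lists representation of per-rank buffers are assumptions of this sketch.

```python
def ring_allreduce(rank_buffers):
    """Simulate a sum AllReduce over a ring; returns new per-rank buffers."""
    n = len(rank_buffers)
    length = len(rank_buffers[0])
    if length % n != 0:
        raise ValueError("buffer length must be divisible by the number of ranks")
    chunk = length // n
    bufs = [list(b) for b in rank_buffers]  # copy so inputs stay untouched

    def span(c):
        """Index range covered by chunk c."""
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n
    # to rank r + 1, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [
            (r, (r - s) % n, [bufs[r][i] for i in span((r - s) % n)])
            for r in range(n)
        ]  # snapshot all messages first: sends in a step happen concurrently
        for r, c, data in sends:
            dst = (r + 1) % n
            for i, v in zip(span(c), data):
                bufs[dst][i] += v

    # Now rank r holds the fully reduced chunk (r + 1) mod n.
    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) mod n
    # to rank r + 1, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = [
            (r, (r + 1 - s) % n, [bufs[r][i] for i in span((r + 1 - s) % n)])
            for r in range(n)
        ]
        for r, c, data in sends:
            dst = (r + 1) % n
            for i, v in zip(span(c), data):
                bufs[dst][i] = v
    return bufs


# Hypothetical usage: three simulated ranks, each holding a 6-element buffer.
result = ring_allreduce([
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
])
```

In the ring scheme each rank sends and receives 2 x (n - 1) chunks, so every value crosses several links before the result is complete. Performing the reduction inside the network reduces that traffic, which is consistent with the bandwidth gain quoted above.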

Technologies & Tools

Backend
NVIDIA Collective Communications Library
Used for optimized multi-GPU and multi-node collective communication.
Backend
NVIDIA Triton Inference Server
Open source inference serving software for deep learning models.
Tool
Deep Learning Profiler
Profiling application to visualize GPU utilization and Tensor Core operations.
Library
DALI
Library for GPU-accelerated image and video decoding and augmentation in deep learning applications.

Key Actionable Insights

1. Using the NVIDIA Collective Communications Library can significantly improve multi-GPU performance in deep learning tasks.
By leveraging features like in-network AllReduce and adaptive routing, developers can optimize communication between GPUs, which is crucial for training large models efficiently.
2. Implementing dynamic batching in NVIDIA Triton Inference Server can lead to better resource utilization and faster inference times.
This feature allows for prioritization and management of requests, which is particularly beneficial in production environments where response times are critical.
3. Using the Deep Learning Profiler can provide insights into GPU performance bottlenecks.
Because the profiler integrates with TensorBoard, developers can visualize and analyze GPU utilization, leading to more informed optimization decisions.
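
The interaction between dynamic batching and per-request priority described in the second insight can be illustrated with a small toy scheduler. This is a sketch of the general idea only, not Triton's implementation or configuration format. The class name, its methods, and the convention that a lower number means higher priority are assumptions made for this example.

```python
import heapq
import itertools


class PriorityBatcher:
    """Toy request queue: lower priority number first, FIFO within a level."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def submit(self, request_id, priority=2):
        """Queue a request; priority 1 is treated as most urgent here."""
        heapq.heappush(self._heap, (priority, next(self._arrival), request_id))

    def next_batch(self):
        """Pop up to max_batch_size requests, most urgent first."""
        batch = []
        while self._heap and len(batch) < self.max_batch_size:
            _, _, request_id = heapq.heappop(self._heap)
            batch.append(request_id)
        return batch


# Hypothetical usage: two urgent requests arrive among three normal ones.
batcher = PriorityBatcher(max_batch_size=3)
for request_id, priority in [("a", 2), ("b", 1), ("c", 2), ("d", 1), ("e", 2)]:
    batcher.submit(request_id, priority)

first = batcher.next_batch()   # ["b", "d", "a"]
second = batcher.next_batch()  # ["c", "e"]
```

Grouping requests into batches keeps the GPU busy, while the priority ordering keeps latency-sensitive requests from waiting behind bulk traffic. A production server would also bound how long a request may wait before a partial batch is dispatched.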