NVIDIA Releases Updates and New Features in CUDA-X AI Software

Siddharth Sharma

Learn what's new in CUDA-X AI, a deep learning software stack that researchers and developers use to build GPU-accelerated applications.

NVIDIA

4 min read • intermediate

Tags: BERT, Deep Learning, Helm, Kubernetes, Python, PyTorch, TensorFlow, Transformer

Overview

NVIDIA has released updates and new features in its CUDA-X AI software stack, designed for building high-performance GPU-accelerated applications in areas like conversational AI, recommendation systems, and computer vision. Key updates include enhancements to NVIDIA Triton Inference Server, TensorRT 8.0, NVIDIA NeMo, and NVIDIA Maxine, along with updates to the NGC catalog.

What You'll Learn

1. How to use Business Logic Scripting in NVIDIA Triton Inference Server
2. Why Quantization Aware Training lets INT8 models reach FP32-level accuracy
3. When to use NVIDIA Maxine's Virtual Background feature to improve stream quality

Key Questions Answered

What are the new features in NVIDIA Triton Inference Server?

The latest updates to NVIDIA Triton Inference Server include Business Logic Scripting (Beta) for calling other models within a Python model, a Container Composition Utility for creating custom Triton containers, and two new GPU-enabled containers for TensorFlow and PyTorch backends.
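Business Logic Scripting is easiest to picture as ordinary Python control flow wrapped around calls to other deployed models. In Triton's Python backend, such a call is made by building `pb_utils.InferenceRequest(model_name=..., requested_output_names=..., inputs=...)` and calling `.exec()` on it. The sketch below does not run inside Triton: it replaces those calls with a hypothetical in-process `call_model` function and hypothetical model names (`preprocess`, `small_classifier`, `large_classifier`) so the orchestration pattern can be run on its own.

```python
from typing import Callable, Dict, List

# Hypothetical model signature: a flat list of floats in, a flat list out.
ModelFn = Callable[[List[float]], List[float]]

# Hypothetical in-process stand-in for models deployed on the same Triton server.
MODEL_REPOSITORY: Dict[str, ModelFn] = {
    # Scales raw 0-255 inputs into [0, 1].
    "preprocess": lambda xs: [x / 255.0 for x in xs],
    # Cheap model: returns one confidence score (the mean of the features).
    "small_classifier": lambda xs: [sum(xs) / len(xs)],
    # Expensive model, consulted only when the cheap one is unsure.
    "large_classifier": lambda xs: [max(xs)],
}


def call_model(name: str, inputs: List[float]) -> List[float]:
    """Stand-in for a BLS request. Inside Triton this would be
    pb_utils.InferenceRequest(...).exec() plus reading the output tensor."""
    return MODEL_REPOSITORY[name](inputs)


def business_logic(raw: List[float], low: float = 0.4, high: float = 0.6) -> Dict[str, object]:
    """Chain several models with ordinary Python branching between calls."""
    features = call_model("preprocess", raw)
    score = call_model("small_classifier", features)[0]
    used = "small_classifier"
    # Escalate to the larger model only when the score falls in the uncertain band.
    if low <= score <= high:
        score = call_model("large_classifier", features)[0]
        used = "large_classifier"
    return {"score": score, "model": used}
```

For example, `business_logic([0.0, 255.0])` gives the small model a mean score of 0.5, which lies in the uncertain band, so the sketch escalates to `large_classifier`. The branch between model calls is the kind of custom logic BLS is meant to host inside the server.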

How does TensorRT 8.0 improve deep learning inference?

TensorRT 8.0 introduces several significant improvements: BERT-Large inference in 1.2 ms using new Transformer optimizations, FP32-level accuracy at INT8 precision through Quantization Aware Training, and sparsity support that speeds up inference on NVIDIA Ampere architecture GPUs.
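The sparsity that Ampere GPUs accelerate is the 2:4 structured pattern: in every contiguous group of four weights, at most two are nonzero. Below is a minimal sketch of magnitude-based pruning into that pattern, assuming weights arrive as a flat list whose length is a multiple of four. In practice, pruning and fine-tuning happen in the training framework before the TensorRT engine is built; `prune_2_4` and `is_2_4_sparse` are hypothetical helper names, not TensorRT APIs.

```python
from typing import List


def prune_2_4(weights: List[float]) -> List[float]:
    """Zero the two smallest-magnitude weights in every group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group survive.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def is_2_4_sparse(weights: List[float]) -> bool:
    """Check that each group of four has at most two nonzero entries."""
    return all(
        sum(1 for w in weights[s:s + 4] if w != 0.0) <= 2
        for s in range(0, len(weights), 4)
    )
```

Keeping the largest magnitudes is the simplest pruning criterion; because half the weights are removed, accuracy is usually recovered by fine-tuning afterward.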

What enhancements does NVIDIA Maxine offer for video effects?

NVIDIA Maxine provides several enhancements including a Virtual Background feature for improved stream quality, Super Resolution support for up to 4K video input, and advanced audio effects like Noise Removal and Room Echo Cancellation, all aimed at enhancing virtual collaboration experiences.
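Virtual background effects generally work in two steps: a segmentation model estimates, for each pixel, whether it belongs to the person in front of the camera, and that mask is then used to blend the camera frame with a replacement background. The article does not describe Maxine's internals or API, so this sketch shows only the generic alpha-compositing step. It assumes a precomputed mask with values in [0, 1] and grayscale images stored as nested lists; `composite` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel values, row-major


def composite(frame: Image, background: Image, mask: Image) -> Image:
    """Blend per pixel: out = mask * frame + (1 - mask) * background.

    A mask value of 1.0 keeps the camera pixel (the person), 0.0 shows the
    replacement background, and fractional values soften the edges.
    """
    if not (len(frame) == len(background) == len(mask)):
        raise ValueError("frame, background, and mask must have the same height")
    out: Image = []
    for f_row, b_row, m_row in zip(frame, background, mask):
        if not (len(f_row) == len(b_row) == len(m_row)):
            raise ValueError("rows must have the same width")
        out.append([m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)])
    return out
```

The visual quality of the result depends mostly on how clean the mask is, which is where an AI segmentation model does the heavy lifting.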

Key Statistics & Figures

BERT-Large Inference Time

1.2 ms

This performance is achieved through new Transformer Optimizations in TensorRT 8.0.

Technologies & Tools

Inference Serving Software

NVIDIA Triton Inference Server

Used for deploying AI models in production environments.

Deep Learning Inference Platform

TensorRT

Optimizes deep learning models for high-performance inference.

Toolkit

NVIDIA NeMo

Used to build state-of-the-art conversational AI models.

SDK

NVIDIA Maxine

Provides AI-based features for video and audio effects.

Key Actionable Insights

1. Use the Business Logic Scripting feature in NVIDIA Triton to build multi-model pipelines.
This feature lets developers create more complex AI workflows by having one model call others, keeping that orchestration logic inside the inference server.

2. Use TensorRT's support for Quantization Aware Training to speed up inference without sacrificing accuracy.
With this technique, developers can reach accuracy comparable to FP32 while running in INT8, gaining the faster inference that real-time applications need.

3. Incorporate NVIDIA Maxine's Super Resolution feature to enhance video quality in applications.
This feature is particularly useful for applications that require high-definition video streams, improving the user experience in virtual meetings and content creation.
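The second insight rests on Quantization Aware Training (QAT). QAT inserts quantize-dequantize ("fake quantization") operations into the forward pass during training, so the network learns weights that tolerate INT8 rounding; TensorRT 8.0 can then build an INT8 engine from the quantized model. Below is a minimal pure-Python sketch of symmetric per-tensor INT8 fake quantization. It assumes the clipping range `amax` is already known (for example, from calibration) and uses the straight-through estimator commonly paired with it for gradients. The function names are hypothetical.

```python
from typing import List

QMAX = 127  # symmetric signed INT8 range [-127, 127]


def fake_quantize(xs: List[float], amax: float) -> List[float]:
    """Quantize to INT8 and immediately dequantize back to float.

    The forward pass sees the rounding and clipping error that INT8
    inference will introduce, so training can adapt the weights to it.
    """
    if amax <= 0:
        raise ValueError("amax must be positive")
    scale = amax / QMAX
    out: List[float] = []
    for x in xs:
        q = max(-QMAX, min(QMAX, round(x / scale)))  # integer code in [-127, 127]
        out.append(q * scale)  # dequantize back to float
    return out


def ste_grad(xs: List[float], upstream: List[float], amax: float) -> List[float]:
    """Straight-through estimator: pass the gradient through unchanged inside
    [-amax, amax] and zero it where the input was clipped."""
    return [g if -amax <= x <= amax else 0.0 for x, g in zip(xs, upstream)]
```

Values inside the clipping range come back within half a quantization step (`amax / 127 / 2`) of their originals, while values outside it saturate at `±amax`. That bounded error is what training learns to absorb.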