ICYMI: New AI Tools and Technologies Announced at GTC 2021 Keynote

At GTC 2021, NVIDIA announced new software tools to help developers build optimized conversational AI, recommender, and video solutions.

Siddharth Sharma

Overview

At GTC 2021, NVIDIA unveiled new AI tools and technologies aimed at enhancing conversational AI, recommender systems, and video solutions. Key announcements included NVIDIA Riva for conversational AI, the TAO framework for AI application development, and updates to Triton Inference Server and TensorRT for optimized inference.

What You'll Learn

1. How to use NVIDIA Riva to build conversational AI applications
2. Why the NVIDIA TAO Framework accelerates AI application development
3. How to implement video effects using the NVIDIA Maxine SDK
4. When to use Triton Inference Server for model deployment
5. How to optimize deep learning inference with TensorRT 8.0

Key Questions Answered

What capabilities does NVIDIA Riva provide for conversational AI?
NVIDIA Riva offers automatic speech recognition with over 90% accuracy, real-time translation for five languages with under 100ms latency, and expressive text-to-speech capabilities that deliver 30x higher throughput compared to Tacotron2.
How does the NVIDIA TAO Framework simplify AI application development?
The NVIDIA TAO Framework allows enterprises to fine-tune pretrained models quickly, reducing the time to create domain-specific models from months to hours, thus eliminating the need for extensive training runs and deep AI expertise.
What are the key features of NVIDIA Maxine SDK?
NVIDIA Maxine SDK includes a Video Effects SDK for super resolution and noise removal, an Augmented Reality SDK for 3D effects, and an Audio Effects SDK for high-quality noise and echo removal, all optimized for high performance on GPUs.
What improvements were made in Triton Inference Server 2.9?
Triton Inference Server 2.9 introduces Model Navigator for automatic model conversion and validation, adds Model Analyzer for choosing batch sizes and the number of model instances, and supports an OpenVINO backend for high-performance inference on CPUs.
What performance enhancements does TensorRT 8.0 offer?
TensorRT 8.0 provides up to 2x faster inference for transformer-based networks like BERT, supports INT8 precision while maintaining FP32-level accuracy, and adds sparsity support for higher throughput on Ampere GPUs.
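
The Model Analyzer idea mentioned above, choosing a batch size and instance count against a performance target, can be pictured as a filter-then-maximize step over profiled configurations. The sketch below is plain Python and does not use Triton or Model Analyzer; the `Measurement` type, the `pick_config` helper, and every number in `sweep` are hypothetical and serve only to illustrate the trade-off.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    """One profiled configuration (all values below are made up)."""
    batch_size: int
    instances: int
    latency_ms: float   # latency observed for this configuration
    throughput: float   # inferences per second


def pick_config(results: Iterable[Measurement],
                latency_budget_ms: float) -> Optional[Measurement]:
    """Return the highest-throughput config that meets the latency budget."""
    feasible = [m for m in results if m.latency_ms <= latency_budget_ms]
    if not feasible:
        return None  # nothing satisfies the target
    return max(feasible, key=lambda m: m.throughput)


# Hypothetical sweep results, not real benchmark data.
sweep = [
    Measurement(1, 1, 4.0, 250.0),
    Measurement(8, 1, 12.0, 660.0),
    Measurement(8, 2, 15.0, 1050.0),
    Measurement(32, 2, 48.0, 1300.0),
]

best = pick_config(sweep, latency_budget_ms=20.0)
print(best)
```

Loosening the budget in this toy data moves the choice toward larger batches, which is the kind of latency-versus-throughput decision the tool is described as automating.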

Key Statistics & Figures

Accuracy of speech recognition model in NVIDIA Riva
greater than 90%
This accuracy is achieved with an out-of-the-box model trained on multiple large corpora.
Latency for real-time translation in NVIDIA Riva
under 100ms per sentence
This low latency is crucial for effective real-time communication in multilingual applications.
Throughput improvement of text-to-speech in NVIDIA Riva
30x higher compared to Tacotron2
This significant increase in throughput allows for more efficient processing of speech synthesis tasks.
Speedup factor for AI development with NVIDIA TAO
over 10X
This speedup is achieved by using NVIDIA's pretrained models and Transfer Learning Toolkit.
Inference speed improvement for transformer-based networks with TensorRT 8.0
up to 2x faster
This improvement is particularly beneficial for applications utilizing models like BERT.
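
The TAO speedup listed above is attributed to starting from pretrained models and fine-tuning them rather than training from scratch. A minimal sketch of that general transfer-learning idea follows. It does not use TAO or the Transfer Learning Toolkit: `frozen_backbone`, `train_head`, `predict`, and the toy data are hypothetical stand-ins, with a fixed function playing the role of the pretrained model and a small logistic-regression head as the only trained part.

```python
import math


def frozen_backbone(x):
    """Stand-in for a pretrained feature extractor; it is never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_backbone(x)
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            # Only the head parameters change; the backbone stays fixed.
            w = [w[0] - lr * err * f[0], w[1] - lr * err * f[1]]
            b -= lr * err
    return w, b


def predict(x, w, b):
    f = frozen_backbone(x)
    return 1 if sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5 else 0


# Toy "domain" data: the label is 1 when x0 + x1 > 0.
samples = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-2.0, 0.5)]
labels = [1, 1, 0, 0]

w, b = train_head(samples, labels)
print([predict(x, w, b) for x in samples])
```

Because only three head parameters are trained, the loop finishes almost instantly, which mirrors in miniature why fine-tuning is cheaper than a full training run.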

Technologies & Tools

AI/ML
NVIDIA Riva
Used for building conversational AI applications.
AI/ML
NVIDIA TAO
Framework for accelerating AI application development.
AI/ML
NVIDIA Maxine
SDK for developing virtual collaboration and content creation applications.
AI/ML
NVIDIA Triton Inference Server
Open-source inference serving software for model deployment.
AI/ML
TensorRT
Deep learning inference optimizer and runtime.
AI/ML
NVIDIA Merlin
Framework for developing deep learning recommender systems.
AI/ML
NVIDIA DeepStream
Toolkit for building video analytics applications.
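
As an illustration of what serving a model through Triton looks like from the client side, the sketch below builds the JSON body for an inference request in the KServe v2 HTTP format that Triton implements. It only constructs the request and does not send it. The model name `my_model`, the tensor names `INPUT0` and `OUTPUT0`, the shape, and the `build_infer_request` helper are assumptions for illustration; the URL assumes a server on localhost using port 8000 for HTTP. Real names and shapes come from the deployed model's configuration.

```python
import json


def build_infer_request(input_name, shape, datatype, data, output_name):
    """Build a JSON body in the v2 inference request format."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,  # flattened, row-major tensor contents
            }
        ],
        "outputs": [{"name": output_name}],
    }


# Hypothetical model and tensor names; adjust to the deployed model.
MODEL_NAME = "my_model"
URL = f"http://localhost:8000/v2/models/{MODEL_NAME}/infer"

body = build_infer_request(
    input_name="INPUT0",
    shape=[1, 4],
    datatype="FP32",
    data=[0.1, 0.2, 0.3, 0.4],
    output_name="OUTPUT0",
)
payload = json.dumps(body)
print(URL)
print(payload)
```

The payload would be sent as an HTTP POST to `URL`; in practice a client library may be more convenient than hand-built JSON.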

Key Actionable Insights

1. Leverage NVIDIA Riva to enhance customer engagement through conversational AI applications.
By utilizing Riva's speech recognition and translation capabilities, developers can create more interactive and responsive applications that cater to diverse user needs.
2. Utilize the NVIDIA TAO Framework to accelerate your AI development process.
TAO's ability to fine-tune pretrained models allows teams to focus on application-specific features rather than spending excessive time on training, leading to faster deployment.
3. Incorporate NVIDIA Maxine SDK into your video conferencing applications for superior quality.
Maxine's advanced video and audio effects can significantly improve user experience in virtual collaboration tools, making them more appealing and effective.
4. Adopt Triton Inference Server for scalable AI model deployment.
Triton's features for automatic model conversion and performance optimization make it an ideal choice for organizations looking to deploy AI solutions efficiently.
5. Implement TensorRT 8.0 in your deep learning projects for optimized inference performance.
With its enhanced speed and efficiency, TensorRT 8.0 can help developers achieve better performance in AI applications, particularly those involving complex models.
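
For readers unfamiliar with the INT8 precision that TensorRT 8.0 is described as supporting, the sketch below shows the basic idea of mapping floating-point values to 8-bit integers with a single scale factor and measuring the rounding error. It is a simplified illustration in plain Python, not TensorRT's API or its calibration procedure; the `quantize_int8` and `dequantize` helpers and the sample `weights` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate floating-point values."""
    return [x * scale for x in q]


# Made-up weights for illustration.
weights = [0.02, -1.27, 0.635, 0.9, -0.31]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Each value lands within half a quantization step of the original, which is why well-scaled INT8 inference can stay close to FP32 accuracy while using a quarter of the storage per value.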