NVIDIA Announces Riva Speech AI and Large Language Modeling Software For Enterprise

At GTC, NVIDIA unveiled breakthroughs making it simpler for enterprise and research organizations to build state-of-the-art, customizable conversational AI.

Siddharth Sharma

Overview

NVIDIA has introduced significant advancements in its Riva Speech AI SDK and NeMo framework, enabling enterprises to develop high-quality speech and language models. Riva can create a custom voice from just 30 minutes of audio data, while NeMo supports training large language models with trillions of parameters.

What You'll Learn

1. How to create a custom neural voice using NVIDIA Riva
2. Why NVIDIA Riva is suitable for large-scale speech AI deployments
3. How to train large language models using the NeMo framework
4. When to use NVIDIA Triton for real-time inference

Key Questions Answered

What capabilities does NVIDIA Riva provide for enterprises?
NVIDIA Riva offers a GPU-accelerated Speech AI SDK that enables enterprises to generate expressive human-like speech and create custom neural voices using just 30 minutes of audio data. It supports large-scale deployments and provides fine-grained control over voice generation.
How does the NeMo framework enhance language model training?
The NeMo framework allows enterprises to build and customize large language models with trillions of parameters, leveraging advancements from the Megatron project. It supports training across multiple GPUs and nodes, and the trained models can be deployed with NVIDIA Triton for real-time inference.
What are the performance improvements of Riva on A100 compared to V100?
Riva achieves 12x higher performance using FastPitch + HiFi-GAN on the A100 compared to Tacotron 2 + WaveGlow on the V100. This improvement allows for more efficient processing and better scalability for real-time applications.
What languages does Riva support for speech recognition?
Riva provides world-class speech recognition capabilities and supports five different languages, making it versatile for global applications in various industries.

Key Statistics & Figures

Performance improvement
12x higher performance
Achieved with FastPitch + HiFi-GAN on A100 compared to Tacotron 2 + WaveGlow on V100.
Audio data for a custom voice
30 minutes
Amount of audio data required to create a new neural voice using NVIDIA Riva.

Technologies & Tools

Speech AI SDK
NVIDIA Riva
Used for generating expressive human-like speech and creating custom voices.
Training Framework
NVIDIA NeMo
Facilitates the development of large-scale language models.
Inference Server
NVIDIA Triton
Enables real-time inference across multiple GPUs and nodes.
Open-source Project
Megatron
Used for efficiently training large language models.

Key Actionable Insights

1. Leverage NVIDIA Riva's custom voice capability to enhance brand identity.
   By creating a unique voice for your brand from just 30 minutes of audio data, you can improve user engagement and recognition in virtual assistants and other applications.
2. Use the NeMo framework to develop large-scale language models tailored to specific domains.
   This allows enterprises to customize models like Megatron 530B for their own needs, improving the relevance and accuracy of language processing tasks.
3. Use NVIDIA Triton to deploy models across multiple GPUs for real-time inference.
   This approach helps applications handle high loads and respond quickly, which is critical for user-facing services.
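
To see why a model the size of Megatron 530B calls for multi-GPU, multi-node inference, a rough back-of-the-envelope estimate helps. The sketch below is illustrative only: the function name is hypothetical, and it assumes FP16 weights (2 bytes per parameter) and a given amount of memory per GPU (the A100 is available in 40 GB and 80 GB variants). It counts weights only and ignores activations and other runtime buffers, so the real requirement is higher.

```python
import math


def min_gpus_for_weights(num_params, bytes_per_param=2, gpu_memory_gb=80.0):
    """Lower bound on the GPUs needed just to hold the model weights.

    Assumptions (not from the article): FP16 storage by default and
    decimal gigabytes (1 GB = 1e9 bytes). Activations and other
    runtime memory are ignored.
    """
    weight_gb = num_params * bytes_per_param / 1e9
    return math.ceil(weight_gb / gpu_memory_gb)


if __name__ == "__main__":
    params = 530e9  # Megatron 530B
    print("Weights in FP16 (GB):", params * 2 / 1e9)
    print("Min 80 GB GPUs:", min_gpus_for_weights(params))
    print("Min 40 GB GPUs:", min_gpus_for_weights(params, gpu_memory_gb=40.0))
```

Under these assumptions the weights alone exceed a terabyte, so no single GPU can hold the model and it has to be partitioned across devices at inference time.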

Common Pitfalls

1. Underestimating the effort required to create a custom voice.
   Although only 30 minutes of audio data is needed, ensuring quality and expressiveness in the generated voice may require additional fine-tuning and testing.
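
One simple check before starting on a custom voice is to confirm that the recordings actually add up to the 30 minutes cited above and share a single sample rate. The following is a minimal sketch using only Python's standard library; it is not part of Riva, and the function name, the folder layout (a flat directory of `.wav` files), and the report fields are all hypothetical.

```python
import wave
from pathlib import Path

TARGET_SECONDS = 30 * 60  # the 30 minutes of audio cited for a custom voice


def summarize_recordings(folder):
    """Total the duration of all .wav files in a folder (hypothetical helper).

    Returns the clip count, total duration in seconds, the distinct sample
    rates found, and whether the total reaches TARGET_SECONDS.
    """
    total_seconds = 0.0
    sample_rates = set()
    clip_count = 0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            total_seconds += wav.getnframes() / rate
            sample_rates.add(rate)
        clip_count += 1
    return {
        "clips": clip_count,
        "total_seconds": total_seconds,
        "sample_rates": sorted(sample_rates),
        "meets_target": total_seconds >= TARGET_SECONDS,
    }
```

Calling `summarize_recordings("recordings")` on a folder of clips returns a small report. More than one entry in `sample_rates` suggests the clips were recorded or exported inconsistently and may need resampling before use.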

Related Concepts

Speech AI
Large Language Models
Conversational AI
Real-time Inference