New on NGC: Latest Versions of NeMo, HPC SDK, DOCA, PyTorch Lightning, and More

Chintan Patel

Learn about the latest additions and software updates to the NVIDIA NGC catalog, a hub of GPU-optimized software that simplifies and accelerates workflows.

NVIDIA

AWS, Deep Learning, Helm, Kubernetes, PyTorch, TensorFlow

Overview

The article highlights the latest updates in the NVIDIA NGC catalog, focusing on new versions of NVIDIA NeMo, HPC SDK, DOCA, PyTorch Lightning, and more. It emphasizes the enhancements in these tools aimed at improving productivity and performance for developers working on AI, ML, and HPC applications.

What You'll Learn

1. How to use NVIDIA NeMo to build conversational AI models

2. Why the NVIDIA HPC SDK is worth using to optimize HPC applications

3. How to deploy applications using NVIDIA DOCA on BlueField DPUs

4. How to implement Fully Sharded Parallelism in PyTorch Lightning

5. When to use NVIDIA Magnum IO for scaling applications

Prerequisites & Requirements

Understanding of AI and ML concepts
Familiarity with the NVIDIA NGC catalog and its offerings (optional)

Key Questions Answered

What are the new features in NVIDIA NeMo's latest version?

The latest version of NVIDIA NeMo adds support for Conformer ONNX conversion and streaming inference of long audio files, and improves performance for speaker clustering, verification, and diarization. It also adds support for multiple datasets and right-to-left models, and improves NMT training efficiency.

How does the NVIDIA HPC SDK enhance performance for HPC applications?

The NVIDIA HPC SDK provides full support for the NVIDIA Arm HPC Developer Kit and CUDA 11.4. Its HPC compilers include Arm-specific performance enhancements, improved vectorization, and optimized math functions, all aimed at raising application performance and developer productivity.

What capabilities does the NVIDIA DOCA SDK provide for developers?

The NVIDIA DOCA SDK enables developers to rapidly create applications on BlueField DPUs, with resources for deploying applications based on Kubernetes, including ready-to-use .yaml configuration files for various DOCA containers.
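
To make the Kubernetes-based deployment concrete, the sketch below builds a minimal Pod manifest of the general kind such a configuration file contains. It is illustrative only: the pod name and the image reference are hypothetical placeholders, not real DOCA container names, and the ready-to-use .yaml files on NGC remain the authoritative starting point. The manifest is emitted as JSON, which Kubernetes accepts alongside YAML.

```python
import json


def make_pod_manifest(name: str, image: str) -> dict:
    """Return a minimal Kubernetes Pod manifest with a single container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image},
            ],
        },
    }


# Both values below are hypothetical placeholders (assumptions for this
# sketch); substitute the container image listed in the NGC catalog.
manifest = make_pod_manifest(
    name="doca-example",
    image="registry.example/doca-placeholder:latest",
)

print(json.dumps(manifest, indent=2))
```

A real DOCA manifest will typically carry additional fields (resources, volumes, host settings) that this sketch omits.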

What updates were made to PyTorch Lightning in version 1.4.0?

PyTorch Lightning v1.4.0 adds support for Fully Sharded Parallelism, allowing larger models to fit into memory across multiple GPUs, reaching over 40 billion parameters on an A100. It also introduces support for the new DeepSpeed Infinity plug-in.
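
To see why sharding matters at this scale, a rough memory estimate helps. The sketch below assumes mixed-precision training with Adam at about 16 bytes of model state per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance). That per-parameter figure is a common rule of thumb, not a number from the article, and activations and temporary buffers are ignored. The function names are hypothetical.

```python
# Back-of-the-envelope model-state memory for sharded training.
# Assumption: mixed-precision Adam, roughly 16 bytes per parameter:
#   2 B fp16 weights + 2 B fp16 gradients
#   + 12 B fp32 master weights, momentum, and variance.
# Activations and temporary buffers are deliberately ignored.
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gb(num_params: int) -> float:
    """Total unsharded model-state memory in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM / 1e9


def per_gpu_state_gb(num_params: int, num_gpus: int) -> float:
    """Per-GPU model-state memory when all state is sharded evenly."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return model_state_gb(num_params) / num_gpus


params = 40_000_000_000  # the 40-billion-parameter figure quoted above

print(f"unsharded: {model_state_gb(params):.0f} GB")
for gpus in (8, 16):
    print(f"{gpus} GPUs: {per_gpu_state_gb(params, gpus):.0f} GB per GPU")
```

Under these assumptions, 40 billion parameters need about 640 GB of model state, which is far more than a single A100 (40 GB or 80 GB) holds. Splitting that state evenly across 8 GPUs brings it to about 80 GB each, and across 16 GPUs to about 40 GB each. The article does not state the exact configuration behind its figure, so treat this only as an order-of-magnitude illustration.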

Key Statistics & Figures

Model size reached with Fully Sharded Parallelism in PyTorch Lightning v1.4.0

over 40 billion parameters

The article reports this figure for an A100 with Fully Sharded Parallelism enabled, which lets larger models fit into memory across multiple GPUs.

Technologies & Tools

AI/ML Framework

NVIDIA NeMo

Used for building conversational AI models.

Software Development Kit

NVIDIA HPC SDK

Provides tools and libraries for HPC application development.

Software Development Kit

NVIDIA DOCA

Enables application development on BlueField DPUs.

AI/ML Framework

PyTorch Lightning

Facilitates model training at scale with advanced optimizations.

I/O Technology

NVIDIA Magnum IO

Supports I/O subsystem technologies for modern data centers.

Key Actionable Insights

1. Use NVIDIA NeMo to build conversational AI applications and streamline development.
NVIDIA NeMo's modular design makes it easy to combine components, which suits data scientists and researchers building state-of-the-art speech and NLP networks.

2. Incorporate the NVIDIA HPC SDK into your workflow to improve the performance of your HPC applications.
With CUDA 11.4 support and Arm-specific optimizations, the HPC SDK can improve both the efficiency and the portability of your applications.

3. Use the NVIDIA DOCA SDK to build applications that take full advantage of BlueField DPUs.
The DOCA SDK simplifies development by providing tools and resources, including ready-to-use .yaml configuration files, for deploying applications in a Kubernetes environment.

4. Adopt PyTorch Lightning to scale model training with minimal changes to your existing code.
The framework's training optimizations, such as Fully Sharded Parallelism, can reduce the time and effort required to train large models, especially in multi-GPU setups.

Related Concepts

AI and ML Frameworks

High Performance Computing (HPC)

Kubernetes for Application Deployment

Deep Learning Model Optimization Techniques