New on NGC: Latest Versions of NeMo, HPC SDK, DOCA, PyTorch Lightning, and More

Learn about the latest additions and software updates to the NVIDIA NGC catalog, a hub of GPU-optimized software that simplifies and accelerates workflows.

Chintan Patel

Overview

This article summarizes the latest updates to the NVIDIA NGC catalog, covering new versions of NVIDIA NeMo, the HPC SDK, DOCA, PyTorch Lightning, and more. The updates target productivity and performance for developers building AI, ML, and HPC applications.

What You'll Learn

1. How to use NVIDIA NeMo for building conversational AI models
2. Why the NVIDIA HPC SDK is worth using to optimize HPC applications
3. How to deploy applications using NVIDIA DOCA on BlueField DPUs
4. How to implement Fully Sharded Parallelism in PyTorch Lightning
5. When to use NVIDIA Magnum IO for scaling applications

Prerequisites & Requirements

  • Understanding of AI and ML concepts
  • Familiarity with the NVIDIA NGC catalog and its offerings (optional)

Key Questions Answered

What are the new features in NVIDIA NeMo's latest version?
The latest version of NVIDIA NeMo adds support for Conformer ONNX conversion and streaming inference of long audio files, and improves performance for speaker clustering, verification, and diarization. It also adds multiple datasets, right-to-left models, and enhancements that make NMT training more efficient.
How does the NVIDIA HPC SDK enhance performance for HPC applications?
The NVIDIA HPC SDK provides full support for the NVIDIA Arm HPC Developer Kit and CUDA 11.4, along with HPC compilers that include Arm-specific performance enhancements, improved vectorization, and optimized math functions, maximizing developer productivity.
What capabilities does the NVIDIA DOCA SDK provide for developers?
The NVIDIA DOCA SDK enables developers to rapidly create applications on BlueField DPUs, with resources for deploying applications based on Kubernetes, including ready-to-use .yaml configuration files for various DOCA containers.
What updates were made to PyTorch Lightning in version 1.4.0?
PyTorch Lightning v1.4.0 adds support for Fully Sharded Parallelism, allowing larger models to fit into memory across multiple GPUs, reaching over 40 billion parameters on an A100. It also introduces support for the new DeepSpeed Infinity plug-in.
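
The idea behind fully sharded training mentioned in the last answer can be illustrated with a toy sketch. This is not PyTorch Lightning's implementation, and the function names `shard_parameters` and `all_gather` are hypothetical helpers written for this example. It only shows the bookkeeping: each of `world_size` workers stores one slice of the flattened parameters, and the full set is reassembled only when needed.

```python
from typing import List


def shard_parameters(flat_params: List[float], world_size: int) -> List[List[float]]:
    """Split a flat parameter list into `world_size` equal shards, zero-padding the tail."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padding = [0.0] * (shard_len * world_size - len(flat_params))
    padded = flat_params + padding
    return [padded[rank * shard_len:(rank + 1) * shard_len] for rank in range(world_size)]


def all_gather(shards: List[List[float]], original_len: int) -> List[float]:
    """Rebuild the full parameter list from every worker's shard and drop the padding."""
    full = [value for shard in shards for value in shard]
    return full[:original_len]


# Stand-in for a model's flattened weights (10 values, 4 hypothetical workers).
params = [0.1 * i for i in range(10)]
shards = shard_parameters(params, world_size=4)
restored = all_gather(shards, len(params))
```

Each worker here holds 3 values instead of 10, which is the memory saving in miniature. A real implementation also shards gradients and optimizer state and overlaps communication with compute.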

Key Statistics & Figures

Model size reached with Fully Sharded Parallelism in PyTorch Lightning v1.4.0
over 40 billion parameters
This figure is reported for the A100 GPU, where sharding model state across multiple GPUs lets significantly larger models fit into memory.
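
To give a feel for why sharding matters at this scale, here is a rough back-of-the-envelope estimate. It assumes mixed-precision training with the Adam optimizer at 16 bytes of model state per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights, momentum, and variance), and it ignores activations and buffers. The article does not state the GPU count or whether offloading was used, so the 8-GPU figure below is purely illustrative and does not reproduce how the 40 billion figure was obtained.

```python
# Assumption: mixed-precision Adam keeps about 16 bytes of state per parameter.
BYTES_PER_PARAM = 16
GIB = 2 ** 30


def model_state_gib(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Total weights + gradients + optimizer state, in GiB (activations excluded)."""
    return num_params * bytes_per_param / GIB


def per_gpu_state_gib(num_params: int, num_gpus: int,
                      bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Model state each GPU holds when it is sharded evenly across `num_gpus`."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return model_state_gib(num_params, bytes_per_param) / num_gpus


total = model_state_gib(40_000_000_000)                    # unsharded model state
per_gpu = per_gpu_state_gib(40_000_000_000, num_gpus=8)    # hypothetical 8-GPU node
print(f"total: {total:.1f} GiB, per GPU across 8 GPUs: {per_gpu:.1f} GiB")
```

Under these assumptions the unsharded state is roughly 596 GiB, far beyond a single GPU's memory, while an even 8-way shard brings it to about 74.5 GiB per GPU.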

Technologies & Tools

AI/ML Framework
NVIDIA NeMo
Used for building conversational AI models.
Software Development Kit
NVIDIA HPC SDK
Provides tools and libraries for HPC application development.
Software Development Kit
NVIDIA DOCA
Enables application development on BlueField DPUs.
AI/ML Framework
PyTorch Lightning
Facilitates model training at scale with advanced optimizations.
I/O Technology
NVIDIA Magnum IO
Supports I/O subsystem technologies for modern data centers.

Key Actionable Insights

1. Utilize NVIDIA NeMo for building conversational AI applications to streamline your development process.
   NVIDIA NeMo's modular design allows for easy integration of various components, making it ideal for data scientists and researchers looking to create state-of-the-art speech and NLP networks.
2. Incorporate the NVIDIA HPC SDK into your workflow to enhance the performance of your HPC applications.
   With support for the latest CUDA version and Arm-specific optimizations, the HPC SDK can significantly improve the efficiency and portability of your applications.
3. Leverage the NVIDIA DOCA SDK to build applications that take full advantage of BlueField DPUs.
   The DOCA SDK simplifies the development process by providing essential tools and resources for deploying applications in a Kubernetes environment.
4. Adopt PyTorch Lightning to scale your model training without altering your existing codebase.
   The framework's advanced training optimizations can significantly reduce the time and effort required to train large models, especially in multi-GPU setups.
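
For readers unfamiliar with Kubernetes deployment files such as the ones mentioned for DOCA, the sketch below builds the smallest valid Pod manifest using the standard Kubernetes `v1` Pod fields. The pod name and image reference are hypothetical placeholders, not real DOCA container names, and the configuration files shipped on NGC will contain additional fields. The manifest is emitted as JSON, which Kubernetes accepts alongside YAML.

```python
import json


def build_pod_manifest(name: str, image: str) -> dict:
    """Return a minimal Kubernetes Pod manifest with a single container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [{"name": name, "image": image}]},
    }


# Hypothetical placeholders: substitute the name and image from the real configuration file.
manifest = build_pod_manifest("doca-example", "registry.example.com/doca/example:tag")
manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
```

In practice, starting from the ready-to-use files in the catalog is likely simpler than writing a manifest by hand; the sketch only shows what the core of such a file declares.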

Related Concepts

AI and ML Frameworks
High Performance Computing (HPC)
Kubernetes for Application Deployment
Deep Learning Model Optimization Techniques