Scalable GPU-Accelerated Supercomputer in the Microsoft Azure Cloud

Nefi Alarcon

At Supercomputing 2019 in Denver, Colorado, NVIDIA announced the availability of a new kind of GPU-accelerated supercomputer in the cloud on Microsoft Azure.

NVIDIA


Azure, BERT, Helm, Kubernetes, PyTorch, TensorFlow

Overview

NVIDIA announced a new GPU-accelerated supercomputer available on Microsoft Azure, designed for demanding AI and high-performance computing applications. The NDv2 instance offers significant performance and cost advantages over traditional CPU-based systems, enabling rapid deployment and scaling for complex workloads.

What You'll Learn

1. How to leverage NDv2 instances for training AI models efficiently
2. Why GPU acceleration is crucial for high-performance computing applications
3. When to use multiple NDv2 instances for complex HPC workloads

Prerequisites & Requirements

Understanding of AI and high-performance computing concepts (optional)
Familiarity with the NVIDIA CUDA-X libraries and deep learning frameworks (optional)

Key Questions Answered

What are the capabilities of the NDv2 instance in Azure?

Each NDv2 instance carries 8 NVIDIA V100 Tensor Core GPUs, and instances can be clustered to scale up to 800 GPUs interconnected via a Mellanox InfiniBand backend network, which makes the offering suitable for demanding AI and HPC applications. Users can rent an entire AI supercomputer on demand, with performance comparable to large-scale, on-premises supercomputers.
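The sizing arithmetic behind these figures can be sketched as follows. This is a minimal illustration, assuming the 8 GPUs per instance listed under Key Statistics & Figures; the function names are hypothetical and not part of any Azure or NVIDIA API.

```python
import math

# Per-instance V100 count as stated in this article.
GPUS_PER_INSTANCE = 8


def gpus_in_cluster(instances: int) -> int:
    """Total GPUs across a cluster of NDv2 instances."""
    if instances < 0:
        raise ValueError("instances must be non-negative")
    return instances * GPUS_PER_INSTANCE


def instances_needed(gpus: int) -> int:
    """Smallest number of whole instances that supplies at least `gpus` GPUs."""
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    return math.ceil(gpus / GPUS_PER_INSTANCE)


print(gpus_in_cluster(100))   # 800
print(instances_needed(800))  # 100
print(instances_needed(20))   # 3 (instances are rented whole, so round up)
```

Under this assumption, the 800-GPU ceiling corresponds to a cluster of 100 instances.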

How does the NDv2 instance improve AI model training times?

Using 64 NDv2 instances on a pre-release version of the cluster, engineers trained the BERT conversational AI model in approximately three hours. The result was achieved in part through multi-GPU optimizations from the NVIDIA CUDA-X libraries and high-speed Mellanox interconnects.
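As a rough worked example of the resources this run represents, the sketch below converts the instance count and wall-clock time into GPU-hours. It assumes 8 GPUs per instance (the figure given under Key Statistics & Figures) and treats "approximately three hours" as exactly 3.0 for illustration; `training_footprint` is a hypothetical helper, not an existing API.

```python
# Per-instance V100 count as stated in this article.
GPUS_PER_INSTANCE = 8


def training_footprint(instances: int, hours: float) -> dict:
    """Summarize the resources consumed by a multi-instance training run."""
    gpus = instances * GPUS_PER_INSTANCE
    return {
        "gpus": gpus,
        "instance_hours": instances * hours,
        "gpu_hours": gpus * hours,
    }


# 64 instances for roughly 3 hours (the 3.0 is an illustrative rounding).
footprint = training_footprint(instances=64, hours=3.0)
print(footprint)
# {'gpus': 512, 'instance_hours': 192.0, 'gpu_hours': 1536.0}
```

These totals are only as precise as the "approximately three hours" figure they are derived from.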

What performance advantages do NDv2 instances offer over traditional HPC nodes?

A single NDv2 instance can deliver results an order of magnitude faster than traditional HPC nodes without GPU acceleration for specific applications like deep learning. This performance can scale linearly up to a hundred instances for large-scale simulations, making it highly efficient for complex workloads.
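To make "scales linearly" concrete, the sketch below computes the ideal runtime under perfect linear scaling and the efficiency of a measured run relative to that ideal. The runtimes used are hypothetical numbers chosen for illustration, not measurements from NDv2, and both function names are assumptions.

```python
def ideal_runtime(single_instance_hours: float, instances: int) -> float:
    """Runtime if the workload scaled perfectly linearly across instances."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return single_instance_hours / instances


def scaling_efficiency(single_instance_hours: float,
                       measured_hours: float,
                       instances: int) -> float:
    """Ratio of ideal to measured runtime; 1.0 means perfectly linear scaling."""
    return ideal_runtime(single_instance_hours, instances) / measured_hours


# Hypothetical simulation: 200 hours on one instance, run on 100 instances.
print(ideal_runtime(200.0, 100))            # 2.0
# If the same job actually took 2.5 hours, efficiency falls below 1.0.
print(scaling_efficiency(200.0, 2.5, 100))  # 0.8
```

An efficiency close to 1.0 across instance counts is what a linear-scaling claim implies; measuring it on a real workload is the way to confirm it.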

Key Statistics & Figures

Number of NVIDIA V100 Tensor Core GPUs per NDv2 instance

8

Multiple NDv2 instances can be clustered to meet various workload demands.

Time to train BERT model using NDv2 instances

3 hours

Achieved using 64 NDv2 instances on a pre-release version of the cluster.

Technologies & Tools


Hardware

NVIDIA V100 Tensor Core GPUs

Used for GPU acceleration in AI and HPC applications.

Networking

Mellanox InfiniBand

Provides high-speed interconnects for NDv2 instances.

Software

NVIDIA CUDA-X Libraries

Facilitates multi-GPU optimizations for improved performance.

Software

TensorFlow

A deep learning framework supported on NDv2 instances.

Software

PyTorch

Another deep learning framework compatible with NDv2 instances.

Software

MXNet

A deep learning framework available for use on NDv2 instances.

Key Actionable Insights

1
Utilize NDv2 instances to accelerate AI model training and HPC workloads.
By leveraging the GPU acceleration of NDv2 instances, teams can significantly reduce the time required for training complex models, enabling faster iterations and improved productivity.

2
Consider using multiple NDv2 instances for large-scale simulations.
Scaling to a hundred instances allows for efficient handling of extensive computational tasks, which is particularly beneficial in fields like drug development and materials science.

3
Explore the NVIDIA NGC container registry for optimized software solutions.
The registry provides access to GPU-optimized applications and frameworks, streamlining the deployment process and ensuring compatibility with NDv2 instances.
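As a starting point for the NGC suggestion, the sketch below builds an image reference and a `docker pull` command for one of the frameworks named in this article. It assumes the NGC registry host `nvcr.io` and the `nvidia/<framework>` repository layout; the tag is left as a placeholder to be looked up in the NGC catalog, and the helper names are hypothetical.

```python
# Assumption: NGC container images are served from this registry host.
NGC_REGISTRY = "nvcr.io"

# Frameworks named in this article; other NGC repositories are not covered here.
FRAMEWORKS = {"tensorflow", "pytorch", "mxnet"}


def ngc_image(framework: str, tag: str) -> str:
    """Build an image reference such as nvcr.io/nvidia/pytorch:<tag>."""
    if framework not in FRAMEWORKS:
        raise ValueError(f"unknown framework: {framework!r}")
    return f"{NGC_REGISTRY}/nvidia/{framework}:{tag}"


def pull_command(framework: str, tag: str) -> list:
    """Return the docker command (as an argument list) that pulls the image."""
    return ["docker", "pull", ngc_image(framework, tag)]


# "<tag>" is a placeholder; substitute a release tag from the NGC catalog.
print(" ".join(pull_command("pytorch", "<tag>")))
# docker pull nvcr.io/nvidia/pytorch:<tag>
```

Returning the command as an argument list keeps it safe to pass to `subprocess.run` without shell quoting.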

Common Pitfalls

1

Underestimating the setup time for deploying NDv2 instances.

While NDv2 instances offer rapid deployment, users should be aware that initial configuration and optimization may still require time and expertise, particularly for complex workloads.