Speeding Up Deep Learning Training with NVIDIA V100 Tensor Core GPUs in the AWS Cloud

Nefi Alarcon

Training deep learning models on NVIDIA GPUs is the gold standard in artificial intelligence, but the process can still take weeks to complete.

NVIDIA

Tags: Apache, AWS, AWS EC2, Deep Learning, ResNet, TensorFlow

Overview

This article discusses how to shorten deep learning training on NVIDIA V100 Tensor Core GPUs in the AWS Cloud. Training runs can otherwise take weeks, while the benchmarks summarized here finished in under an hour. It highlights distributed, multi-node synchronous training and compares the benchmark results of Apache MXNet and TensorFlow with Horovod.

What You'll Learn

1. How to optimize deep learning training times using NVIDIA V100 Tensor Core GPUs
2. Why distributed/multi-node synchronous training is effective for deep learning
3. How to benchmark training times with ResNet-50 and the ImageNet dataset

Prerequisites & Requirements

Understanding of deep learning concepts and frameworks
Familiarity with AWS EC2 instances and NVIDIA GPUs (optional)

Key Questions Answered

How can deep learning training times be minimized in the AWS Cloud?

Deep learning training times can be minimized in the AWS Cloud by using distributed, multi-node synchronous training on NVIDIA V100 Tensor Core GPUs. The Amazon team demonstrated this by training ResNet-50 on ImageNet in about 50 minutes on eight p3.16xlarge instances (64 V100 GPUs in total, at eight GPUs per instance).
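To make "synchronous" concrete, here is a minimal sketch of synchronous data-parallel training, simulated in a single Python process. It is not the Amazon team's code and uses no real framework: the toy linear model, the function names, and the `allreduce_mean` helper are all illustrative assumptions. Each simulated worker computes a gradient on its own shard, the gradients are averaged (the role an allreduce plays in a real cluster), and every worker applies the same update, so all replicas stay identical.

```python
import random


def local_gradient(w, b, shard):
    """Gradient of mean squared error for y ~ w*x + b on one worker's shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def allreduce_mean(grads):
    """Stand-in for a real allreduce: average each component across workers."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def train_sync(data, num_workers=8, lr=0.1, steps=500):
    # Give every worker an equally sized shard of the data.
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # Every replica starts from the same parameters.
    params = [(0.0, 0.0)] * num_workers
    for _ in range(steps):
        # 1. Each worker computes a gradient on its own shard.
        grads = [local_gradient(w, b, s) for (w, b), s in zip(params, shards)]
        # 2. Workers wait for each other and average their gradients.
        gw, gb = allreduce_mean(grads)
        # 3. Every worker applies the identical update.
        params = [(w - lr * gw, b - lr * gb) for w, b in params]
    return params


random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(64)]
data = [(x, 3.0 * x + 1.0) for x in xs]  # noise-free toy target: w=3, b=1
replicas = train_sync(data)
print(replicas[0])
```

Because step 2 blocks until every worker has contributed, adding workers raises throughput only as long as the gradient exchange stays cheap relative to the computation, which is why scaling efficiency is reported alongside raw speed.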

What frameworks were used to benchmark training times?

The benchmarks used Apache MXNet and TensorFlow with Horovod. Training took 47 minutes with Apache MXNet and 50 minutes with TensorFlow + Horovod, both on NVIDIA V100 GPUs.

What is the achieved Top-1 validation accuracy for the frameworks used?

The achieved Top-1 validation accuracy was 75.75% for Apache MXNet and 75.54% for TensorFlow + Horovod. The two frameworks reached nearly the same accuracy, about 0.2 percentage points apart.

Key Statistics & Figures

Training time with Apache MXNet

47 minutes

Time taken to train the neural network using Apache MXNet on AWS infrastructure.

Training time with TensorFlow + Horovod

50 minutes

Time taken to train the neural network using TensorFlow with Horovod on AWS infrastructure.

Training throughput for Apache MXNet

~44,000 Images/Sec

The speed at which images were processed during training with Apache MXNet.

Training throughput for TensorFlow + Horovod

~41,000 Images/Sec

The speed at which images were processed during training with TensorFlow and Horovod.

Achieved Top-1 Validation Accuracy for Apache MXNet

75.75%

The accuracy achieved after training the model with Apache MXNet.

Achieved Top-1 Validation Accuracy for TensorFlow + Horovod

75.54%

The accuracy achieved after training the model with TensorFlow and Horovod.

Scaling Efficiency for Apache MXNet

92%

How close the multi-GPU training throughput with Apache MXNet came to ideal linear scaling.

Scaling Efficiency for TensorFlow + Horovod

90%

How close the multi-GPU training throughput with TensorFlow and Horovod came to ideal linear scaling.
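As a rough cross-check of the figures above, the sketch below relates throughput, training time, and scaling efficiency. Several inputs are assumptions rather than statements from the article: 64 GPUs in total (eight p3.16xlarge instances with eight V100 GPUs each), a single-GPU baseline for scaling efficiency (the article does not say which baseline was used), and the ILSVRC-2012 training set size of 1,281,167 images. The function names are illustrative. Since the throughput figures are approximate and wall-clock time may include work other than training, treat the results as estimates only.

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training set size
TOTAL_GPUS = 8 * 8                 # assumption: 8 instances x 8 V100 GPUs each


def scaling_efficiency(measured, baseline, scale_factor):
    """Measured throughput divided by ideal linear throughput."""
    return measured / (baseline * scale_factor)


def implied_baseline(measured, efficiency, scale_factor):
    """Per-device throughput implied by a reported efficiency."""
    return measured / (efficiency * scale_factor)


def epochs_equivalent(throughput, minutes, dataset_size=IMAGENET_TRAIN_IMAGES):
    """Passes over the dataset if the throughput were sustained throughout."""
    return throughput * minutes * 60 / dataset_size


# Reported figures: (images/sec, minutes, scaling efficiency)
reported = {
    "Apache MXNet": (44_000, 47, 0.92),
    "TensorFlow + Horovod": (41_000, 50, 0.90),
}

for name, (throughput, minutes, efficiency) in reported.items():
    per_gpu = implied_baseline(throughput, efficiency, TOTAL_GPUS)
    epochs = epochs_equivalent(throughput, minutes)
    print(f"{name}: ~{per_gpu:.0f} images/sec per GPU implied, "
          f"~{epochs:.0f} epoch-equivalents processed")
```

Under these assumptions both runs work out to a similar number of passes over the dataset, which is consistent with the two frameworks being trained on comparable schedules.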

Technologies & Tools

Hardware

NVIDIA V100 Tensor Core GPUs

Used for accelerating deep learning training processes.

Framework

Apache MXNet

One of the frameworks used for training deep learning models.

Framework

TensorFlow

The other framework used for training, paired with Horovod for distributed training.

Cloud Service

AWS EC2 P3 Instances

Infrastructure used for hosting the training processes.

Key Actionable Insights

1
Using distributed/multi-node synchronous training can drastically reduce deep learning training times.
This approach spreads each training step across many GPUs, which suits large datasets and complex models, particularly when time is a critical factor.

2
Benchmarking different frameworks can help identify the most efficient tools for specific deep learning tasks.
By comparing training times and accuracy metrics, developers can make informed decisions on which frameworks to adopt based on their project requirements.
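A framework comparison of this kind depends on measuring throughput the same way for each candidate. The helper below is a minimal, framework-agnostic sketch of that idea, not the benchmark harness used in the article: `measure_throughput` and `dummy_step` are hypothetical names, and `dummy_step` merely stands in for one real training step. Warm-up iterations are excluded so that one-time startup costs do not distort the result.

```python
import time


def measure_throughput(step_fn, batch_size, warmup_steps=5, timed_steps=20):
    """Return samples/sec for step_fn, ignoring the first warmup_steps calls."""
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * timed_steps / elapsed


def dummy_step():
    # Hypothetical stand-in for one training step on a batch.
    sum(i * i for i in range(10_000))


rate = measure_throughput(dummy_step, batch_size=256)
print(f"{rate:.0f} samples/sec")
```

To compare frameworks, the same helper would wrap each framework's training step with an identical batch size and model, and the resulting rates would be read alongside the accuracy each run reaches.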

Common Pitfalls

1

Neglecting to benchmark different frameworks can lead to suboptimal performance.

Without proper benchmarking, developers may miss out on using the most efficient tools for their specific deep learning tasks, resulting in longer training times and lower accuracy.