NVIDIA Slashes BERT Training and Inference Times

Nefi Alarcon

In today’s announcement, researchers and developers from NVIDIA set records in both training and inference of BERT, one of the most popular AI language models.


BERT, Deep Learning, Google Cloud, Hugging Face, PyTorch, TensorFlow, Transformer, Transformers

Overview

NVIDIA has announced record training and inference times for BERT, a leading AI language model, which speeds up development of conversational AI applications. On an NVIDIA DGX SuperPOD, training time for BERT-Large fell to 53 minutes, and on NVIDIA T4 GPUs, inference latency fell to just 2 milliseconds, making it more practical to deploy state-of-the-art language models.

What You'll Learn

1. How to leverage NVIDIA DGX SuperPOD for efficient BERT training

2. Why using TensorRT can drastically improve inference times for AI models

3. When to apply Automatic Mixed Precision in training deep learning models

Prerequisites & Requirements

Understanding of AI language models and their applications
Familiarity with NVIDIA GPUs and TensorRT (optional)

Key Questions Answered

What are the new training and inference times for BERT using NVIDIA technology?

NVIDIA has achieved a training time of just 53 minutes for BERT-Large on a DGX SuperPOD and an inference time of 2 milliseconds on T4 GPUs. Both figures are significant improvements over CPU-only platforms, which take much longer for either task.

How does NVIDIA's solution benefit conversational AI applications?

The advancements in training and inference times make it feasible for developers to deploy state-of-the-art language understanding models in large-scale production applications, enhancing the capabilities of chatbots, personal assistants, and search engines.

What hardware was used for the training of BERT?

Training was performed on an NVIDIA DGX SuperPOD with 1,472 V100 SXM3-32GB GPUs and 10 Mellanox InfiniBand adapters per node, showcasing how NVIDIA's hardware scales for AI workloads.

What is the significance of Automatic Mixed Precision in BERT training?

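Automatic Mixed Precision was used to accelerate throughput during BERT training. It runs suitable operations in half precision (FP16) while keeping numerically sensitive ones in single precision (FP32), which raises throughput on GPUs with Tensor Cores such as the V100 and matters most for large-scale models.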
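The source does not say how Automatic Mixed Precision was configured, so the following is only a minimal sketch of one idea behind mixed-precision training: loss scaling. It uses Python's standard library to round values to half precision and shows why a very small gradient needs to be scaled up before it is stored in FP16. The gradient value, the scale factor, and the helper name `to_fp16` are assumptions chosen for illustration, not values from NVIDIA's training run.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Hypothetical small gradient, below the smallest positive FP16 value (~6e-8).
grad = 1e-8

# Hypothetical power-of-two loss scale, chosen only for illustration.
loss_scale = 65536.0

# Stored directly in FP16, the gradient underflows to zero and is lost.
unscaled_fp16 = to_fp16(grad)

# Scaled first, the value lands in FP16's representable range.
scaled_fp16 = to_fp16(grad * loss_scale)

# Unscaling happens in higher precision (here, a Python float).
recovered = scaled_fp16 / loss_scale

print(f"FP16 without scaling: {unscaled_fp16}")
print(f"FP16 with scaling:    {scaled_fp16}")
print(f"Recovered gradient:   {recovered}")
```

In a real framework this bookkeeping is handled automatically; the sketch only shows why the scaling step exists.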

Key Statistics & Figures

Training time for BERT-Large

53 minutes

Achieved on an NVIDIA DGX SuperPOD with 1,472 V100 GPUs.

Inference time for BERT

2 milliseconds

Measured using NVIDIA T4 GPUs, significantly faster than CPU-only platforms.

Size of NVIDIA's custom language model

8.3 billion parameters

This model is 24 times the size of BERT-Large, catering to the need for larger models in AI applications.
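The figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted in this summary (1,472 GPUs, 53 minutes, 8.3 billion parameters, a 24x ratio). The variable names are illustrative, and the GPU-hour total is a derived estimate that assumes every GPU was busy for the full run.

```python
# Figures quoted in this summary.
num_gpus = 1472
training_minutes = 53
custom_model_params = 8.3e9
size_ratio_to_bert_large = 24

# Aggregate compute for one training run (derived estimate, see assumption above).
gpu_minutes = num_gpus * training_minutes
gpu_hours = gpu_minutes / 60

# BERT-Large parameter count implied by the quoted 24x ratio.
implied_bert_large_params = custom_model_params / size_ratio_to_bert_large

print(f"GPU-minutes per run: {gpu_minutes}")
print(f"GPU-hours per run:   {gpu_hours:.1f}")
print(f"Implied BERT-Large size: {implied_bert_large_params / 1e6:.0f} million parameters")
```

The implied size of roughly 346 million parameters is in line with the commonly cited figure of about 340 million for BERT-Large.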

Technologies & Tools


Hardware

NVIDIA DGX SuperPOD

Used for training BERT efficiently.

Hardware

NVIDIA V100 GPUs

Utilized in the training process to achieve record times.

Hardware

NVIDIA T4 GPUs

Employed for fast inference of BERT.

Software

TensorRT

Optimizes inference performance for AI models.

Software

PyTorch

Framework used for training BERT.

Key Actionable Insights

1
Use NVIDIA DGX SuperPOD for training BERT to cut training time from days to under an hour.
This is particularly beneficial for organizations looking to deploy conversational AI solutions quickly and efficiently, allowing for rapid iteration and deployment.

2
Implement TensorRT for inference to achieve latency under 10 milliseconds, ideal for real-time applications.
This optimization is crucial for applications like chatbots and personal assistants that require immediate responses to user queries.

3
Explore the use of larger models, such as NVIDIA's custom model with 8.3 billion parameters, for more complex language tasks.
As conversational AI evolves, leveraging larger models can improve understanding and response quality, making applications more effective.
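The 10-millisecond latency target in the second insight is something a team would want to measure rather than assume. The following is a minimal sketch of a latency check using only the Python standard library. The function `measure_latency_ms`, the stand-in `fake_infer`, and the warmup and run counts are all hypothetical; a real test would call an actual model through its own inference API.

```python
import statistics
import time
from typing import Callable, Dict

LATENCY_BUDGET_MS = 10.0  # threshold quoted in the insight above


def measure_latency_ms(
    infer: Callable[[], object], warmup: int = 10, runs: int = 200
) -> Dict[str, float]:
    """Time repeated calls to `infer` and summarize latency in milliseconds."""
    # Warmup calls are discarded so one-time setup costs do not skew results.
    for _ in range(warmup):
        infer()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
        "max_ms": samples[-1],
    }


def fake_infer() -> int:
    """Hypothetical stand-in for a real model call."""
    return sum(range(1000))


stats = measure_latency_ms(fake_infer)
within_budget = stats["p99_ms"] < LATENCY_BUDGET_MS
print(stats, "within budget:", within_budget)
```

Checking a high percentile rather than the average is a design choice: interactive applications tend to be judged by their slowest responses. For GPU inference, the timed callable would also need to wait until the result is actually available, since GPU work is often launched asynchronously.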

Common Pitfalls

1

Neglecting the importance of hardware optimization can lead to suboptimal performance in AI applications.

Using inadequate hardware for training and inference can significantly increase processing times and reduce the effectiveness of AI models.

Related Concepts

Conversational AI

Natural Language Processing

Large Language Models

TensorRT Optimization