NVIDIA Slashes BERT Training and Inference Times

In today’s announcement, NVIDIA researchers and developers reported record times for both training and inference of BERT, one of the most popular AI language models.

Overview

NVIDIA has announced major reductions in training and inference times for BERT, a leading AI language model, which should speed up the development of conversational AI applications. Training BERT-Large on an NVIDIA DGX SuperPOD took 53 minutes, and inference on NVIDIA T4 GPUs took just 2 milliseconds, making state-of-the-art language models far more practical to deploy.

What You'll Learn

1. How to leverage NVIDIA DGX SuperPOD for efficient BERT training
2. Why using TensorRT can drastically improve inference times for AI models
3. When to apply Automatic Mixed Precision in training deep learning models

Prerequisites & Requirements

  • Understanding of AI language models and their applications
  • Familiarity with NVIDIA GPUs and TensorRT (optional)

Key Questions Answered

What are the new training and inference times for BERT using NVIDIA technology?
NVIDIA trained BERT-Large in just 53 minutes on a DGX SuperPOD and ran BERT inference in 2 milliseconds on T4 GPUs. Both are large improvements: training BERT previously took days, and inference on traditional CPU-only platforms is considerably slower.
How does NVIDIA's solution benefit conversational AI applications?
The advancements in training and inference times make it feasible for developers to deploy state-of-the-art language understanding models in large-scale production applications, enhancing the capabilities of chatbots, personal assistants, and search engines.
What hardware was used for the training of BERT?
Training ran on an NVIDIA DGX SuperPOD with 1,472 V100 SXM3-32GB GPUs and 10 Mellanox InfiniBand adapters per node, demonstrating how NVIDIA's hardware scales for AI workloads.
What is the significance of Automatic Mixed Precision in BERT training?
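Automatic Mixed Precision (AMP) was used to increase training throughput. AMP runs most operations in FP16, which the V100's Tensor Cores execute much faster than FP32, while keeping numerically sensitive operations and a master copy of the weights in FP32 and scaling the loss so that small gradients do not underflow. The extra throughput matters when training a model as large as BERT-Large at this scale.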
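Why does mixed precision need loss scaling? FP16 has a far smaller range than FP32, so tiny gradients round to zero. The sketch below simulates this on a CPU using Python's half-precision `struct` format (`"e"`) and applies dynamic loss scaling: scale the gradients up before storing them in FP16, unscale in full precision, halve the scale on overflow, and double it after a run of clean steps. It is an illustration under simplifying assumptions (gradients as a plain list of floats, hypothetical helper names), not NVIDIA's training code; the defaults (initial scale 2^16, backoff 0.5, growth 2.0 every 2,000 steps) mirror those of PyTorch's `torch.cuda.amp.GradScaler`.

```python
import struct
from typing import List, Optional, Tuple


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back.

    Raises OverflowError if x is outside the finite FP16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def unscale_fp16_grads(grads: List[float], scale: float) -> Optional[List[float]]:
    """Store scaled gradients in FP16, then unscale them in full precision.

    Returns None if any scaled gradient overflows FP16 (on a GPU this would
    appear as inf/NaN and the optimizer step would be skipped).
    """
    try:
        stored = [to_fp16(g * scale) for g in grads]
    except OverflowError:
        return None
    return [s / scale for s in stored]


def dynamic_loss_scale_step(
    grads: List[float],
    scale: float,
    good_steps: int,
    growth_interval: int = 2000,
) -> Tuple[Optional[List[float]], float, int]:
    """One step of dynamic loss scaling (hypothetical helper).

    On overflow: skip the step (return None) and halve the scale.
    Otherwise: return the unscaled gradients, and double the scale once
    `growth_interval` consecutive overflow-free steps have accumulated.
    """
    unscaled = unscale_fp16_grads(grads, scale)
    if unscaled is None:
        return None, scale * 0.5, 0
    good_steps += 1
    if good_steps >= growth_interval:
        return unscaled, scale * 2.0, 0
    return unscaled, scale, good_steps


if __name__ == "__main__":
    print(to_fp16(1e-8))  # 0.0: the raw gradient underflows in FP16
    grads, scale, _ = dynamic_loss_scale_step([1e-8, 3e-3], 2.0**16, 0)
    print(grads is not None, scale)  # True 65536.0
```

With a scale of 2^16, the 1e-8 gradient survives the round trip through FP16 with well under 0.1% relative error, whereas storing it unscaled yields exactly zero.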

Key Statistics & Figures

Training time for BERT-Large
53 minutes
Achieved on an NVIDIA DGX SuperPOD with 1,472 V100 GPUs.
Inference time for BERT
2 milliseconds
Measured on NVIDIA T4 GPUs running TensorRT; significantly faster than CPU-only platforms.
Size of NVIDIA's custom language model
8.3 billion parameters
This model is 24 times the size of BERT-Large, reflecting the demand for larger models in AI applications.
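The headline figures can be cross-checked with simple arithmetic. The sketch below assumes 16 V100 GPUs per node (the configuration of an NVIDIA DGX-2H system) and roughly 340 million parameters for BERT-Large; neither number is stated above, and the function name is illustrative.

```python
GPUS_TOTAL = 1_472          # V100 GPUs used for the 53-minute run
GPUS_PER_NODE = 16          # assumption: 16-GPU DGX-2H nodes
IB_ADAPTERS_PER_NODE = 10   # Mellanox InfiniBand adapters per node
TRAIN_MINUTES = 53
BERT_LARGE_PARAMS = 340e6   # assumption: commonly cited BERT-Large size
CUSTOM_MODEL_PARAMS = 8.3e9


def cluster_summary() -> dict:
    """Derive node count, GPU-hours, model-size ratio, and raw weight memory."""
    nodes, leftover = divmod(GPUS_TOTAL, GPUS_PER_NODE)
    if leftover:
        raise ValueError("GPU count does not divide evenly into nodes")
    return {
        "nodes": nodes,
        "ib_adapters": nodes * IB_ADAPTERS_PER_NODE,
        "gpu_hours": GPUS_TOTAL * TRAIN_MINUTES / 60,
        "size_ratio": CUSTOM_MODEL_PARAMS / BERT_LARGE_PARAMS,
        # Weights only: gradients, optimizer state, and activations come on top.
        "fp16_weight_gb": CUSTOM_MODEL_PARAMS * 2 / 1e9,
        "fp32_weight_gb": CUSTOM_MODEL_PARAMS * 4 / 1e9,
    }


if __name__ == "__main__":
    for key, value in cluster_summary().items():
        print(f"{key}: {value:,.1f}")
```

Under these assumptions the run used 92 nodes and 920 InfiniBand adapters, and consumed about 1,300 GPU-hours. The parameter ratio comes out near 24.4, consistent with the "24 times" figure. Storing the 8.3-billion-parameter model's weights alone takes about 16.6 GB in FP16 or 33.2 GB in FP32.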

Technologies & Tools


Hardware
NVIDIA DGX SuperPOD
Used for training BERT efficiently.
Hardware
NVIDIA V100 GPUs
Utilized in the training process to achieve record times.
Hardware
NVIDIA T4 GPUs
Employed for fast inference of BERT.
Software
TensorRT
Optimizes inference performance for AI models.
Software
PyTorch
Framework used for training BERT.
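The summary does not say how the work was split across the GPUs. One common approach to multi-GPU training in PyTorch is synchronous data parallelism (for example `torch.nn.parallel.DistributedDataParallel`). Each GPU computes gradients on its own slice of the batch, and the gradients are then averaged with an all-reduce, which NCCL can execute as a ring. The single-process simulation of a ring all-reduce below is an illustrative assumption, not a description of NVIDIA's actual setup; the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def split_chunks(vec: Vector, n: int) -> List[Vector]:
    """Split vec into n contiguous chunks whose sizes differ by at most 1."""
    size, rem = divmod(len(vec), n)
    chunks, start = [], 0
    for c in range(n):
        end = start + size + (1 if c < rem else 0)
        chunks.append(list(vec[start:end]))
        start = end
    return chunks


def ring_allreduce_mean(grads: List[Vector]) -> List[Vector]:
    """Average equal-length gradient vectors, one per simulated worker.

    Workers only exchange data with their ring neighbour, so each one sends
    about 2 * (n - 1) / n of the vector in total, independent of n.
    """
    n = len(grads)
    data = [split_chunks(g, n) for g in grads]

    # Phase 1, reduce-scatter: after n - 1 steps, worker i holds the
    # complete sum of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, list(data[i][(i - step) % n])) for i in range(n)]
        for src, c, payload in sends:
            dst = (src + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], payload)]

    # Phase 2, all-gather: pass the finished chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, list(data[i][(i + 1 - step) % n])) for i in range(n)]
        for src, c, payload in sends:
            data[(src + 1) % n][c] = payload

    # Flatten each worker's chunks and turn the sums into means.
    return [[x / n for chunk in chunks for x in chunk] for chunks in data]


if __name__ == "__main__":
    workers = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    print(ring_allreduce_mean(workers)[0])  # [5.0, 6.0, 7.0, 8.0]
```

The ring layout keeps per-GPU traffic roughly constant as the number of GPUs grows, which is one reason high-bandwidth interconnects such as the InfiniBand adapters mentioned above matter at this scale.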

Key Actionable Insights

1. Train BERT on an NVIDIA DGX SuperPOD to cut training time from days to under an hour.
   This is particularly valuable for organizations that need to ship conversational AI solutions quickly, because shorter training runs allow faster iteration.
2. Use TensorRT for inference to keep latency under 10 milliseconds, fast enough for real-time applications.
   This matters most for chatbots and personal assistants, which must respond to user queries immediately.
3. Explore larger models, such as NVIDIA's 8.3-billion-parameter custom model, for more complex language tasks.
   As conversational AI evolves, larger models can improve language understanding and response quality.
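The summary does not describe how the latency figures were measured. Below is a minimal sketch of checking a model against a latency budget: warm up, time many single requests, and compare a high percentile (here p99) against the 10 ms target. The `infer` callable stands in for a hypothetical TensorRT engine call. For GPU inference it must block until results are ready (for example by synchronizing the CUDA stream); otherwise the timings will look too optimistic. The nearest-rank percentile and the warm-up and iteration counts are our own choices.

```python
import math
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not samples_ms or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(
    infer: Callable[[T], object], request: T, warmup: int = 10, iters: int = 100
) -> List[float]:
    """Time `iters` blocking calls to infer(request) after `warmup` untimed calls."""
    for _ in range(warmup):
        infer(request)  # absorb one-time costs such as lazy initialization
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        infer(request)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def meets_budget(samples_ms: Sequence[float], budget_ms: float = 10.0, pct: float = 99.0) -> bool:
    """True if the pct-th percentile latency is within budget_ms."""
    return percentile(samples_ms, pct) <= budget_ms


def fake_infer(text: str) -> int:
    """Stand-in workload; replace with a real, synchronizing inference call."""
    return sum(ord(ch) for ch in text)


if __name__ == "__main__":
    lat = measure_latency_ms(fake_infer, "what is conversational ai?")
    print(f"p50={percentile(lat, 50):.3f} ms  p99={percentile(lat, 99):.3f} ms  ok={meets_budget(lat)}")
```

Checking a tail percentile rather than the mean reflects the real-time requirement: a chatbot that is fast on average but occasionally slow still feels unresponsive.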

Common Pitfalls

1. Neglecting hardware and software optimization leads to suboptimal performance in AI applications.
   Training and serving models on inadequate hardware can stretch training from under an hour to days and push inference latency beyond what real-time applications can tolerate.

Related Concepts

Conversational AI
Natural Language Processing
Large Language Models
TensorRT Optimization