NVIDIA Releases TensorRT 4

Today we are releasing TensorRT 4 with capabilities for accelerating popular inference applications such as neural machine translation…

Nefi Alarcon

Overview

NVIDIA has released TensorRT 4, which accelerates inference applications such as neural machine translation, recommender systems, and speech. The release offers up to 45x higher throughput than CPU for certain applications, and 50x faster inference on the V100 GPU for ONNX models imported with the ONNX parser in TensorRT.

What You'll Learn

1. How to import models from popular deep learning frameworks using the ONNX format
2. Why TensorRT 4 is beneficial for accelerating inference applications
3. When to use TensorRT for optimizing neural network inference

Key Questions Answered

What performance improvements does TensorRT 4 offer over CPU?
TensorRT 4 provides up to 45x higher throughput compared to CPU for certain applications, and it achieves 50x faster inference performance on the V100 GPU for ONNX models imported using the ONNX parser.
What types of applications can benefit from TensorRT 4?
TensorRT 4 is designed to accelerate popular inference applications such as neural machine translation, recommender systems, and speech recognition, making it suitable for a variety of AI-driven tasks.
How does TensorRT 4 support NVIDIA DRIVE Xavier?
TensorRT 4 includes support for NVIDIA DRIVE Xavier, which is an AI computer specifically designed for autonomous vehicles, allowing for enhanced inference capabilities in automotive applications.

Key Statistics & Figures

Throughput improvement: 45x higher (compared to CPU for certain applications)
Inference performance on V100: 50x faster (for ONNX models imported with the ONNX parser in TensorRT)
Inference speedup for FP16 custom layers: 3x (when using the APIs for Volta Tensor Cores)
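
The "Nx higher throughput" figures above are ratios of items processed per second on two platforms. As a minimal sketch of that arithmetic, the snippet below computes a speedup from two timing measurements. The function names and all timing values are hypothetical, chosen only to illustrate the calculation; they are not measurements from the TensorRT 4 release.

```python
def throughput(items: int, seconds: float) -> float:
    """Return items processed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return items / seconds


def speedup(baseline: float, accelerated: float) -> float:
    """Return how many times higher the accelerated throughput is."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated / baseline


# Hypothetical timings for the same 1,000 inference requests.
# These values are illustrative assumptions, not published benchmarks.
cpu = throughput(items=1_000, seconds=9.0)
gpu = throughput(items=1_000, seconds=0.2)

print(f"CPU: {cpu:.1f} items/s, GPU: {gpu:.1f} items/s")
print(f"Speedup: {speedup(cpu, gpu):.1f}x")
```

A ratio like this only means something when both runs use the same model, batch size, and input data, which is why each published figure is qualified by its application and hardware.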

Technologies & Tools


Backend: TensorRT (used for accelerating inference applications)
Format: ONNX (facilitates model import from various deep learning frameworks)
Hardware: NVIDIA DRIVE Xavier (AI computer for autonomous vehicles, supported by TensorRT 4)

Key Actionable Insights

1. Use TensorRT 4 to improve the inference performance of your AI applications.
With throughput up to 45x higher than CPU for certain applications, TensorRT 4 suits developers optimizing models for real-time inference.
2. Use the ONNX format to import models from various deep learning frameworks.
This lets developers move models into TensorRT 4 with less time spent on model conversion.
3. Explore the new APIs for running FP16 custom layers on Volta Tensor Cores.
These can yield a 3x inference speedup for FP16 custom layers.
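
The ONNX import path mentioned in the second insight can be sketched roughly as follows. This is an assumption-laden illustration rather than the method from the release: it uses the TensorRT Python API as found in later TensorRT versions (`trt.Builder`, `trt.OnnxParser`), which may differ from what shipped with TensorRT 4, and the helper name `build_network_from_onnx` and the model path are hypothetical.

```python
from pathlib import Path


def build_network_from_onnx(onnx_path: str):
    """Parse an ONNX file into a TensorRT network definition.

    Hypothetical helper: assumes a TensorRT release whose Python
    package exposes Builder, OnnxParser, and the EXPLICIT_BATCH flag.
    """
    # Read the model first so a bad path fails before TensorRT is needed.
    model_bytes = Path(onnx_path).read_bytes()

    # Imported lazily so this file still loads on machines without TensorRT.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # Several TensorRT versions require an explicit-batch network
    # for the ONNX parser (assumption: the flag exists in your version).
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    if not parser.parse(model_bytes):
        # Collect the parser's error messages for the exception text.
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    # Return the parser too, so it stays alive while the network is in use.
    return builder, network, parser
```

On a machine with TensorRT and a GPU, calling `build_network_from_onnx("model.onnx")` (a placeholder file name) would return objects that can then be passed to the builder to produce an optimized engine. Check the documentation for your installed TensorRT version before relying on any of these calls.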