NVIDIA Releases TensorRT 4

Nefi Alarcon

Today we are releasing TensorRT 4 with capabilities for accelerating popular inference applications such as neural machine translation…

NVIDIA

Tags: Neural Networks, PyTorch, Recurrent Neural Networks

Overview

NVIDIA has released TensorRT 4, which accelerates inference applications such as neural machine translation, recommender systems, and speech. The release reports up to 45x higher throughput than CPU, and 50x faster inference on V100 GPUs for ONNX models imported with the TensorRT ONNX parser.

What You'll Learn

1. How to import models from popular deep learning frameworks using the ONNX format

2. Why TensorRT 4 is beneficial for accelerating inference applications

3. When to use TensorRT for optimizing neural network inference

Key Questions Answered

What performance improvements does TensorRT 4 offer over CPU?

TensorRT 4 provides up to 45x higher throughput than CPU for certain applications. For ONNX models imported with the TensorRT ONNX parser, it delivers 50x faster inference on the V100 GPU.

What types of applications can benefit from TensorRT 4?

TensorRT 4 is designed to accelerate popular inference applications such as neural machine translation, recommender systems, and speech.

How does TensorRT 4 support NVIDIA DRIVE Xavier?

TensorRT 4 adds support for NVIDIA DRIVE Xavier, an AI computer designed for autonomous vehicles, which extends TensorRT inference to automotive applications.

Key Statistics & Figures

Throughput improvement: up to 45x higher (compared to CPU, for certain applications)

Inference performance on V100: 50x faster (for ONNX models imported with the ONNX parser in TensorRT)

Inference speedup for FP16 custom layers: 3x (when using the APIs for Volta Tensor Cores)
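To make the speedup factors concrete, here is a minimal Python sketch of the arithmetic behind a claim like "45x higher throughput". The baseline rate of 200 items per second and the batch of 90,000 items are hypothetical numbers chosen for illustration, not measurements from the release, and 45x is a reported upper bound rather than a guaranteed result.

```python
def scaled_throughput(baseline_per_sec: float, speedup: float) -> float:
    """Return the throughput obtained by applying a speedup factor to a baseline rate."""
    return baseline_per_sec * speedup


def time_to_process(num_items: int, throughput_per_sec: float) -> float:
    """Return the seconds needed to process num_items at a steady throughput."""
    return num_items / throughput_per_sec


# Hypothetical CPU baseline (assumption for illustration only).
CPU_RATE = 200.0   # items per second
NUM_ITEMS = 90_000  # size of the hypothetical workload

# Best case implied by "up to 45x higher throughput".
gpu_rate = scaled_throughput(CPU_RATE, 45)

cpu_seconds = time_to_process(NUM_ITEMS, CPU_RATE)
gpu_seconds = time_to_process(NUM_ITEMS, gpu_rate)

print(f"CPU: {cpu_seconds:.0f} s, accelerated (best case): {gpu_seconds:.0f} s")
```

Under these assumed numbers the workload drops from 450 seconds to 10 seconds. Throughput gains of this kind describe items processed per second and do not by themselves say how much the latency of a single request changes.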

Technologies & Tools

Backend: TensorRT (used for accelerating inference applications)

Format: ONNX (facilitates model import from various deep learning frameworks)

Hardware: NVIDIA DRIVE Xavier (AI computer for autonomous vehicles, supported by TensorRT 4)

Key Actionable Insights

1. Use TensorRT 4 to improve the inference performance of your AI applications.
With throughput up to 45x higher than CPU, TensorRT 4 suits developers who need to optimize models for real-time inference.

2. Use the ONNX format to import models from various deep learning frameworks.
This lets developers move existing models into TensorRT 4 with less time spent on model conversion.

3. Explore the new APIs for running FP16 custom layers on Volta Tensor Cores.
These APIs can deliver a 3x inference speedup for such layers.
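
The ONNX import path starts with exporting a trained model from its framework to an ONNX file. As a minimal sketch, assuming PyTorch is installed, the function below exports a tiny stand-in model with `torch.onnx.export`. The model, its shapes, the function name, and the file name are hypothetical; the TensorRT side (parsing the resulting file with the ONNX parser) is not shown because its API differs between TensorRT versions.

```python
def export_linear_to_onnx(path: str = "model.onnx") -> str:
    """Export a tiny PyTorch model to an ONNX file and return the file path."""
    # Imported lazily so this module can be loaded without PyTorch installed.
    import torch

    # Hypothetical stand-in for a real trained network.
    model = torch.nn.Linear(4, 2).eval()

    # Example input; the exporter runs the model on it to record the graph.
    dummy_input = torch.randn(1, 4)

    # Write the ONNX graph to disk.
    torch.onnx.export(model, dummy_input, path)
    return path
```

Calling `export_linear_to_onnx()` should write `model.onnx` to the working directory; depending on the PyTorch version, the exporter may need additional packages such as `onnx`. The exported file is the kind of input an ONNX parser consumes.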