TensorRT 3: Faster TensorFlow Inference and Volta Support

Brad Nemire

NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low latency, high-throughput inference for deep learning…

NVIDIA

Overview

NVIDIA TensorRT™ is a deep learning inference optimizer and runtime that speeds up inference for models trained in TensorFlow. TensorRT 3 adds a TensorFlow Model Importer, a Python API, and Volta Tensor Core support, which delivers up to 3.7x faster inference on Tesla V100 GPUs than on Tesla P100 GPUs.

What You'll Learn

1. How to import and optimize TensorFlow models using TensorRT
2. Why using a Python API can improve productivity in deep learning inference
3. When to leverage Volta Tensor Core support for faster inference

Key Questions Answered

What are the key features introduced in TensorRT 3?

TensorRT 3 introduces three key features: a TensorFlow Model Importer for bringing trained models into TensorRT, a Python API for improved productivity, and Volta Tensor Core support, which provides up to 3.7x faster inference on Tesla V100 GPUs than on Tesla P100 GPUs.

How does TensorRT improve TensorFlow inference performance?

TensorRT optimizes a trained TensorFlow model and generates a runtime engine that can be serialized for deployment. The deployed engine delivers low-latency, high-throughput inference, which makes it suitable for production applications.

What is the benefit of using the Python API in TensorRT?

The Python API in TensorRT provides an easy-to-use interface that enhances productivity for developers. It simplifies the process of importing, optimizing, and generating inference engines from TensorFlow models, making it accessible for users with varying levels of expertise.
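The import, optimize, and generate steps described above can be sketched roughly as follows. This is a minimal sketch, assuming the legacy `tensorrt` and `uff` Python modules that shipped around TensorRT 3; later releases changed this API, so check every call against the documentation for your installed version. The file paths, node names, input shape, and the `ImportConfig` and `build_and_save_engine` names are hypothetical placeholders, not values from the article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImportConfig:
    """Hypothetical settings for one model; every default is a placeholder."""
    frozen_graph_path: str = "model_frozen.pb"   # frozen TensorFlow graph (assumed name)
    input_node: str = "input"                    # assumed input node name
    input_shape_chw: Tuple[int, int, int] = (3, 224, 224)
    output_node: str = "output"                  # assumed output node name
    max_batch_size: int = 1
    max_workspace_bytes: int = 1 << 20           # 1 MiB of scratch space for the builder
    engine_path: str = "model.engine"


def build_and_save_engine(cfg: ImportConfig) -> None:
    """Import a frozen TensorFlow graph, build an engine, and serialize it.

    Assumes the TensorRT 3-era legacy Python API. The imports are lazy so
    this file can still be loaded on a machine without TensorRT installed.
    """
    import tensorrt as trt
    import uff
    from tensorrt.parsers import uffparser

    logger = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)

    # 1. Import: convert the frozen TensorFlow graph to the UFF format.
    uff_model = uff.from_tensorflow_frozen_model(
        cfg.frozen_graph_path, [cfg.output_node]
    )
    parser = uffparser.create_uff_parser()
    parser.register_input(cfg.input_node, cfg.input_shape_chw, 0)
    parser.register_output(cfg.output_node)

    # 2. Optimize: build a runtime engine from the parsed model.
    engine = trt.utils.uff_to_trt_engine(
        logger, uff_model, parser, cfg.max_batch_size, cfg.max_workspace_bytes
    )
    parser.destroy()

    # 3. Generate: serialize the engine to disk for deployment.
    trt.utils.write_engine_to_file(cfg.engine_path, engine.serialize())
    engine.destroy()
```

Keeping the settings in one small config object separates the model-specific placeholders from the API calls, which are the part most likely to differ between TensorRT versions.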

What performance improvements can be expected with Volta Tensor Core Support?

With Volta Tensor Core support, TensorRT can deliver up to 3.7x faster inference on Tesla V100 GPUs than on Tesla P100 GPUs. The gain matters most for applications that need high throughput or low latency.

Key Statistics & Figures

Inference performance improvement

up to 3.7x

This performance improvement is observed on Tesla V100 GPUs compared to Tesla P100 GPUs.
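To make the "up to 3.7x" figure concrete, the sketch below turns a speedup ratio into best-case throughput and latency numbers. The P100 baseline values are hypothetical placeholders, not measurements from the article. Because the article states an upper bound, the results are best-case bounds rather than predictions, and the latency estimate additionally assumes the same batch size on both GPUs.

```python
MAX_SPEEDUP_V100_VS_P100 = 3.7  # "up to" figure quoted in the article


def speedup(baseline_throughput: float, new_throughput: float) -> float:
    """Ratio of new to baseline throughput (both in the same units)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline_throughput must be positive")
    return new_throughput / baseline_throughput


def best_case_throughput(baseline_throughput: float,
                         max_speedup: float = MAX_SPEEDUP_V100_VS_P100) -> float:
    """Upper bound on throughput if the full speedup were realized."""
    return baseline_throughput * max_speedup


def best_case_latency_ms(baseline_latency_ms: float,
                         max_speedup: float = MAX_SPEEDUP_V100_VS_P100) -> float:
    """Lower bound on latency, assuming the speedup applies at a fixed batch size."""
    return baseline_latency_ms / max_speedup


# Hypothetical P100 baseline, chosen only to illustrate the arithmetic.
p100_images_per_s = 1000.0
p100_latency_ms = 37.0

print(f"best-case V100 throughput: {best_case_throughput(p100_images_per_s):.0f} images/s")
print(f"best-case V100 latency: {best_case_latency_ms(p100_latency_ms):.1f} ms")
```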

Technologies & Tools

Inference Optimizer

TensorRT

Used for optimizing and running deep learning inference applications.

Deep Learning Framework

TensorFlow

Models trained in TensorFlow can be imported and optimized using TensorRT.

Programming Language

Python

The Python API enhances usability and productivity for TensorRT users.

GPU

Tesla V100

Provides hardware acceleration for TensorRT's inference capabilities.

GPU

Tesla P100

Previous-generation GPU used as the baseline for the Tesla V100 performance comparison.

Key Actionable Insights

1. Use the TensorFlow Model Importer to streamline your workflow.
   It imports and optimizes your existing TensorFlow models, which saves time and reduces complexity in the deployment process.

2. Use the Python API for rapid development and testing.
   The Python API simplifies interactions with TensorRT, making it easier to prototype and iterate on deep learning models without getting bogged down in lower-level details.

3. Consider upgrading to Tesla V100 GPUs to maximize performance gains.
   If your application demands high throughput and low latency, Volta Tensor Core support can deliver up to 3.7x faster inference than Tesla P100 GPUs.
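Before committing to new hardware, it can help to measure latency and throughput on the current setup. Below is a minimal timing harness, assuming a hypothetical `infer` callable that runs one batch and blocks until the result is ready. The stand-in workload is not a real model, and the `benchmark` helper is illustrative rather than part of TensorRT.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], batch_size: int,
              warmup: int = 10, iterations: int = 100) -> Dict[str, float]:
    """Time repeated calls to `infer` and summarize latency and throughput.

    `infer` must block until its result is ready; with asynchronous GPU
    execution, an unsynchronized call would make the timings too optimistic.
    """
    if iterations < 1 or batch_size < 1:
        raise ValueError("iterations and batch_size must be at least 1")

    # Warm-up runs are excluded so one-time setup cost does not skew results.
    for _ in range(warmup):
        infer()

    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    ordered = sorted(latencies_ms)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    total_s = sum(latencies_ms) / 1000.0
    throughput = batch_size * iterations / total_s if total_s > 0 else float("inf")

    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p99_ms": ordered[p99_index],
        "throughput_per_s": throughput,
    }


# Stand-in workload (hypothetical): replace with a call into your own model.
stats = benchmark(lambda: sum(range(10_000)), batch_size=8, warmup=5, iterations=50)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Running the same harness on both GPUs with the same batch size gives a measured speedup to compare against the "up to 3.7x" figure.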