JetPack 3.1 Doubles Jetson’s Low-Latency Inference Performance

Dustin Franklin

Today, NVIDIA released JetPack 3.1, the production Linux software release for Jetson TX1 and TX2. With upgrades to TensorRT 2.1 and cuDNN 6.0, JetPack 3.1…

GRU, LSTM, Neural Networks, Recurrent Neural Networks, ResNet, YOLO

Overview

NVIDIA's JetPack 3.1 delivers up to a 2x increase in low-latency deep learning inference performance on the Jetson TX1 and TX2 for real-time applications. It upgrades TensorRT to 2.1 and cuDNN to 6.0, giving developers improved features for deploying intelligent autonomous machines.

What You'll Learn

1. How to utilize TensorRT 2.1 for optimized deep learning inference on Jetson
2. Why JetPack 3.1 is crucial for enhancing AI capabilities in edge computing
3. How to implement custom layers in TensorRT for advanced neural networks

Prerequisites & Requirements

Understanding of deep learning concepts and frameworks
Familiarity with NVIDIA Jetson platforms and JetPack software (optional)

Key Questions Answered

What improvements does JetPack 3.1 bring to Jetson TX1 and TX2?

JetPack 3.1 introduces significant upgrades including TensorRT 2.1 and cuDNN 6.0, which together provide up to a 2x increase in deep learning inference performance for real-time applications. This enhancement is particularly beneficial for applications like vision-guided navigation and motion control.

How does TensorRT 2.1 reduce latency for deep learning inference?

TensorRT 2.1 reduces latency through network graph optimizations, kernel fusion, and support for half-precision FP16. Running at batch size 1, it brings GoogLeNet latency down to 5 ms in the Max-P performance profile, which suits latency-sensitive applications.
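To make the half-precision point concrete, here is a minimal sketch in plain Python (not TensorRT code) that rounds a few values to FP16 using the standard library's `struct` module. The helper name `to_fp16` and the sample values are assumptions chosen for illustration. The sketch only shows the storage saving and rounding error of FP16, not how TensorRT applies it internally.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code in struct is binary16 (half precision).
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Illustrative values only; these are not weights from any real network.
sample_values = [0.1, 1.0, 3.14159, 2049.5]

for original in sample_values:
    rounded = to_fp16(original)
    error = abs(rounded - original)
    print(f"{original} -> fp16 {rounded} (abs error {error:.6f})")

# FP16 stores each value in half the bytes of FP32.
print("bytes per value: fp32 =", struct.calcsize("<f"), "fp16 =", struct.calcsize("<e"))
```

Half the bytes per value means less memory traffic for weights and activations, at the cost of the small rounding errors printed above.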

What are the key features of the NVIDIA Isaac Initiative?

The NVIDIA Isaac Initiative is an end-to-end robotics platform designed to advance AI in robotics. It includes simulation tools, an autonomous navigation stack, and Jetson for deployment, facilitating the development of intelligent systems in various robotic applications.

What custom layer support does TensorRT 2.1 provide?

TensorRT 2.1 supports custom network layers through a user plugin API, allowing developers to implement advanced neural networks such as residual networks, Recurrent Neural Networks (RNNs), and Faster-RCNN. Networks that need layers beyond the built-in set can therefore still be deployed with TensorRT.
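The general idea behind a plugin mechanism can be sketched in a few lines of Python. This is a toy illustration and not the TensorRT plugin API: the names `CustomLayer`, `LeakyReLU`, and `ToyEngine` are hypothetical, and the real API is a C++ interface with more responsibilities (for example serialization and workspace management). The sketch only shows how a runtime can call user-supplied layers through a fixed interface.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence, Tuple

Shape = Tuple[int, ...]


class CustomLayer(ABC):
    """Hypothetical plugin interface (an assumption, not TensorRT's API)."""

    @abstractmethod
    def output_shape(self, input_shape: Shape) -> Shape:
        """Report the output shape so the runtime can allocate buffers."""

    @abstractmethod
    def run(self, inputs: Sequence[float]) -> List[float]:
        """Compute the layer's output for one flattened input."""


class LeakyReLU(CustomLayer):
    """Example user-defined layer: elementwise leaky ReLU."""

    def __init__(self, slope: float = 0.1) -> None:
        self.slope = slope

    def output_shape(self, input_shape: Shape) -> Shape:
        return input_shape  # elementwise, so the shape is unchanged

    def run(self, inputs: Sequence[float]) -> List[float]:
        return [x if x > 0 else self.slope * x for x in inputs]


class ToyEngine:
    """Toy runtime that executes registered layers in order."""

    def __init__(self, layers: Iterable[CustomLayer]) -> None:
        self.layers = list(layers)

    def infer(self, inputs: Sequence[float]) -> List[float]:
        data = list(inputs)
        for layer in self.layers:
            data = layer.run(data)
        return data


engine = ToyEngine([LeakyReLU(slope=0.5)])
print(engine.infer([-2.0, 0.0, 3.0]))
```

The runtime never needs to know what a given layer computes; it only relies on the interface, which is what lets user code extend the set of supported layers.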

Key Statistics & Figures

Deep learning inference performance increase

up to 2x

This performance boost is achieved through the upgrades in JetPack 3.1, particularly with TensorRT 2.1.

Latency for GoogLeNet with TensorRT 2.1

5 ms

This latency is achieved when processing with batch size 1 in the Max-P performance profile.

Latency for ResNet-50 with TensorRT 2.1

12.2 ms

This measurement is for the Max-P performance profile, showcasing the efficiency improvements over previous versions.
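The latency figures above can be converted into an approximate throughput. The short sketch below does that arithmetic, assuming inferences run back to back at batch size 1 with no other per-frame overhead (an idealization, so real throughput may be lower). The function name `inferences_per_second` is an assumption for illustration.

```python
def inferences_per_second(latency_ms: float, batch_size: int = 1) -> float:
    """Throughput if each batch takes latency_ms and batches run back to back."""
    return batch_size * 1000.0 / latency_ms


# Latencies quoted above for TensorRT 2.1 at batch size 1 (Max-P profile).
latencies_ms = {"GoogLeNet": 5.0, "ResNet-50": 12.2}

for network, latency in latencies_ms.items():
    rate = inferences_per_second(latency)
    print(f"{network}: {latency} ms per inference, about {rate:.0f} inferences/s")
```

Under these assumptions, 5 ms corresponds to about 200 inferences per second and 12.2 ms to about 82.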

Technologies & Tools

Software

JetPack

Provides the production Linux software release for Jetson TX1 and TX2.

Software

TensorRT

Optimizes deep learning inference performance on Jetson platforms.

Software

cuDNN

Enhances deep learning performance with optimized routines for neural networks.

Platform

NVIDIA Isaac

An end-to-end platform for developing and deploying intelligent robotic systems.

Key Actionable Insights

1
Leverage TensorRT 2.1 to optimize your deep learning models for Jetson platforms, focusing on batch size 1 for real-time applications.
This approach is particularly useful for applications requiring immediate processing, such as autonomous navigation and collision avoidance, where latency is critical.

2
Explore the NVIDIA Isaac Initiative to accelerate your robotics projects with AI capabilities.
Utilizing the resources and tools provided by the Isaac Initiative can streamline the development process and enhance the functionality of robotic systems.

3
Implement custom layers in TensorRT to enhance the performance of your neural networks.
By creating user-defined plugins, you can tailor the inference process to specific application needs, for example by adding layers that your network requires beyond the built-in set.
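For the first insight above, it helps to measure single-request latency rather than rely on averages over large batches. The sketch below is a generic timing harness in plain Python. `fake_inference` is a placeholder workload standing in for one batch-size-1 inference call, and all names here are assumptions, not part of any NVIDIA API. When timing real GPU inference, the timed call must block until the result is ready, because GPU work is launched asynchronously.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency_ms(
    infer_once: Callable[[], object], warmup: int = 10, runs: int = 100
) -> Dict[str, float]:
    """Time repeated single calls and summarize the latency in milliseconds."""
    # Warm-up iterations are excluded so one-time setup costs do not skew results.
    for _ in range(warmup):
        infer_once()

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        infer_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "median": statistics.median(samples),
        "p99": samples[p99_index],
        "max": samples[-1],
    }


def fake_inference() -> None:
    """Placeholder workload; replace with one real batch-size-1 inference."""
    sum(i * i for i in range(10_000))


stats = measure_latency_ms(fake_inference)
print(f"median {stats['median']:.3f} ms, p99 {stats['p99']:.3f} ms, max {stats['max']:.3f} ms")
```

Reporting the tail (p99 and max) alongside the median matters for real-time control loops, where an occasional slow frame can be as harmful as a slow average.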

Common Pitfalls

1

Failing to optimize deep learning models for low-latency applications can lead to performance bottlenecks.

Developers should ensure they leverage features like batch size 1 processing in TensorRT to meet the demands of real-time applications.

2

Neglecting to explore custom layer capabilities in TensorRT may limit the performance of advanced neural networks.

By not utilizing the user plugin API, developers miss out on the potential to enhance their models with tailored optimizations.

Related Concepts

Deep Learning Optimization Techniques

Robotics and AI Integration

Advanced Neural Network Architectures