Building a Real-time Redaction App Using NVIDIA DeepStream, Part 2: Deployment

This post is the second in a series (Part 1) that addresses the challenges of training an accurate deep learning model using a large public dataset and…

Chintan Shah
12 min read, advanced

Overview

This article is the second part of a series on building a real-time redaction application with NVIDIA DeepStream. It focuses on deploying a trained RetinaNet model on an NVIDIA Jetson AGX Xavier device to redact faces in real time across multiple video streams.

What You'll Learn

1

How to deploy a trained ONNX model on an NVIDIA Jetson device using DeepStream SDK

2

Why using TensorRT is essential for low-latency inference in real-time applications

3

How to build a custom bounding box parser for RetinaNet in DeepStream

4

When to use a tracker to optimize inference performance in video analytics
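Item 3 above concerns the custom bounding-box parser. In DeepStream this is a C/C++ function compiled into a shared library and loaded by the Gst-nvinfer plugin. Its job is to turn the network's raw output tensors into a list of detections (class id, box, confidence). The Python sketch below shows only that conversion logic. It assumes, hypothetically, that the exported RetinaNet already decodes anchors and applies NMS, emitting flat arrays of scores, class ids, and `[x1, y1, x2, y2]` boxes in network-input pixels. `Detection` and `parse_retinanet` are illustrative names, not DeepStream APIs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    # Hypothetical stand-in for DeepStream's per-object detection record
    # (class id, box in network-input pixels, confidence).
    class_id: int
    left: float
    top: float
    width: float
    height: float
    confidence: float


def parse_retinanet(
    scores: Sequence[float],
    boxes: Sequence[float],
    classes: Sequence[float],
    net_w: int,
    net_h: int,
    threshold: float = 0.5,
) -> List[Detection]:
    """Convert decoded RetinaNet outputs into left/top/width/height detections.

    Assumes (hypothetically) that the model emits N scores, N class ids and
    N boxes flattened as [x1, y1, x2, y2, x1, y1, ...].
    """
    detections = []
    for i, score in enumerate(scores):
        if score < threshold:
            continue  # drop low-confidence candidates
        x1, y1, x2, y2 = boxes[4 * i : 4 * i + 4]
        # Clip to the network input so downstream elements get valid boxes.
        x1, x2 = min(max(x1, 0.0), net_w), min(max(x2, 0.0), net_w)
        y1, y2 = min(max(y1, 0.0), net_h), min(max(y2, 0.0), net_h)
        if x2 <= x1 or y2 <= y1:
            continue  # skip boxes that collapse after clipping
        detections.append(
            Detection(int(classes[i]), x1, y1, x2 - x1, y2 - y1, float(score))
        )
    return detections
```

For example, `parse_retinanet([0.9, 0.2], [10, 20, 110, 220, 0, 0, 5, 5], [0, 0], net_w=640, net_h=480)` keeps only the first candidate, with a box at left 10, top 20, width 100, height 200. A production parser would do the same work in C++ against DeepStream's parser interface.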

Prerequisites & Requirements

  • NVIDIA Jetson AGX Xavier device or any NVIDIA GPU
  • DeepStream SDK for real-time video analytics

Key Questions Answered

How do you deploy a trained ONNX model using DeepStream SDK?
To deploy a trained ONNX model with the DeepStream SDK, set up the hardware, build a TensorRT engine from the ONNX model, and configure the DeepStream application to use that engine for inference. This involves editing the configuration files so they point to the model and to the custom bounding-box parser.
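As a rough illustration of the configuration step, the sketch below generates a `[property]` section for DeepStream's Gst-nvinfer plugin using Python's `configparser`. The keys (`onnx-file`, `model-engine-file`, `batch-size`, `network-mode`, `interval`, `parse-bbox-func-name`, `custom-lib-path`) are nvinfer properties, and `network-mode` follows nvinfer's encoding of 0 = FP32, 1 = INT8, 2 = FP16. The file paths, the single "face" class, and the parser symbol `NvDsInferParseRetinaNet` are assumptions for illustration; check them against your own model, parser library, and the nvinfer documentation for your DeepStream version.

```python
import configparser
import io

# Gst-nvinfer encodes precision as an integer network-mode.
NETWORK_MODES = {"fp32": 0, "int8": 1, "fp16": 2}


def make_nvinfer_config(
    onnx_path: str,
    engine_path: str,
    parser_lib: str,
    batch_size: int = 1,
    precision: str = "fp16",
    interval: int = 0,
) -> str:
    """Return the text of a minimal nvinfer [property] section (a sketch)."""
    cfg = configparser.ConfigParser()
    cfg["property"] = {
        "gpu-id": "0",
        "onnx-file": onnx_path,            # nvinfer can build a TensorRT engine from this
        "model-engine-file": engine_path,  # serialized engine reused on later runs
        "batch-size": str(batch_size),
        "network-mode": str(NETWORK_MODES[precision]),
        "num-detected-classes": "1",       # assumption: a single "face" class
        "interval": str(interval),         # batches skipped between inference calls
        # Hypothetical custom parser symbol and library path.
        "parse-bbox-func-name": "NvDsInferParseRetinaNet",
        "custom-lib-path": parser_lib,
    }
    buf = io.StringIO()
    # Plain "key=value" lines, the style used in DeepStream's sample configs.
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(make_nvinfer_config("redaction.onnx", "redaction.engine", "./libparser.so"))
```

For INT8 precision, nvinfer also needs a calibration table (`int8-calib-file`), which this sketch does not set.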
What are the performance metrics for the RetinaNet model on Jetson AGX Xavier?
On Jetson AGX Xavier, the RetinaNet model runs at 38 FPS for 1 stream and 30 FPS for 4 streams at FP16 precision. Switching to INT8 precision supports 33 FPS for 6 streams and 26 FPS for 8 streams.
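One way to compare the four configurations reported in this article is the total number of frames processed per second across all streams. The sketch below assumes each reported figure is a per-stream frame rate; the source does not state this explicitly, and if the numbers are aggregate totals the multiplication does not apply. The 30 FPS real-time target is also an assumption, not a figure from the source.

```python
# (streams, precision, reported FPS) for the four reported configurations.
RESULTS = [(1, "FP16", 38), (4, "FP16", 30), (6, "INT8", 33), (8, "INT8", 26)]
REALTIME_FPS = 30  # assumed real-time target, not from the source


def aggregate_fps(streams: int, fps_per_stream: float) -> float:
    # Valid only if the reported figure is per stream (assumption).
    return streams * fps_per_stream


for streams, precision, fps in RESULTS:
    total = aggregate_fps(streams, fps)
    status = "meets" if fps >= REALTIME_FPS else "misses"
    print(f"{streams} stream(s), {precision}: {total} frames/s total, {status} {REALTIME_FPS} FPS")
```

Under that assumption, 8 streams at INT8 process the most frames overall (208 per second) but are the only configuration below 30 FPS per stream.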
What modifications are needed to create a DeepStream redaction app?
To turn a DeepStream app into a redaction app, remove the on-screen detection text and add a callback that draws a solid-color rectangle over each detected face. This hides the detected faces in the output video.
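The redaction step amounts to rewriting each detected object's on-screen-display parameters before the frame is drawn. In a DeepStream app this is typically done in a probe callback on the on-screen-display element's sink pad. The sketch below shows only the per-object logic, using hypothetical dataclasses whose field names loosely follow DeepStream's `NvOSD_RectParams` (`border_width`, `has_bg_color`, `bg_color`); they are stand-ins, not the pyds bindings.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]
OPAQUE_BLACK: RGBA = (0.0, 0.0, 0.0, 1.0)


@dataclass
class RectParams:
    # Hypothetical stand-in loosely following NvOSD_RectParams.
    left: float
    top: float
    width: float
    height: float
    border_width: int = 3
    has_bg_color: int = 0
    bg_color: RGBA = (0.0, 0.0, 0.0, 0.0)


@dataclass
class DetectedObject:
    # Hypothetical per-object record: box plus the label text the OSD would draw.
    rect: RectParams
    display_text: str = ""


def redact_objects(objects: List[DetectedObject], color: RGBA = OPAQUE_BLACK) -> None:
    """Turn every detection into a filled, unlabeled box so the face is hidden."""
    for obj in objects:
        obj.display_text = ""       # remove the detection text
        obj.rect.border_width = 0   # no outline; the fill does the work
        obj.rect.has_bg_color = 1   # ask the OSD to fill the rectangle
        obj.rect.bg_color = color   # alpha 1.0 makes the fill fully opaque


faces = [DetectedObject(RectParams(100, 80, 64, 64), display_text="face 0.91")]
redact_objects(faces)
```

Keeping the fill alpha at 1.0 matters: a translucent fill would leave the face partly visible.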
When should you use a tracker in a video analytics application?
Use a tracker when you need to reduce how often inference runs. If the detector runs only on every other or every third frame, the tracker follows detected objects through the skipped frames, lowering GPU load while keeping object positions accurate.
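The skip-and-track pattern maps onto Gst-nvinfer's `interval` property, which sets how many consecutive batches are skipped between inference calls, so inference runs once every (interval + 1) batches. The sketch below lists which frames the detector would see, assuming one stream per batch so that batches and frames line up. `inference_frames` and `inference_fraction` are illustrative helpers, not DeepStream functions.

```python
from typing import List


def inference_frames(num_frames: int, interval: int) -> List[int]:
    """Indices of frames on which the detector runs.

    interval = 0 infers on every frame, 1 on every other frame,
    2 on every third frame; a tracker covers the frames in between.
    """
    if interval < 0:
        raise ValueError("interval must be >= 0")
    return [i for i in range(num_frames) if i % (interval + 1) == 0]


def inference_fraction(interval: int) -> float:
    # Share of frames that still go through the detector.
    return 1.0 / (interval + 1)


print(inference_frames(8, 1))  # [0, 2, 4, 6]  (every other frame)
print(inference_frames(9, 2))  # [0, 3, 6]     (every third frame)
```

With `interval=2` the detector handles about one third of the frames, which frees GPU time that can go to additional streams.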

Key Statistics & Figures

Frame rate for 1 stream at FP16 precision
38 FPS
This applies when using the RetinaNet model for object detection and redaction.
Frame rate for 4 streams at FP16 precision
30 FPS
This is the performance observed when processing multiple streams simultaneously.
Frame rate for 6 streams at INT8 precision
33 FPS
INT8 precision supports more streams (6 versus 4) at a higher frame rate than the 4-stream FP16 configuration.
Frame rate for 8 streams at INT8 precision
26 FPS
This is the largest configuration reported; the frame rate falls below that of the smaller configurations.

Technologies & Tools

Software
NVIDIA DeepStream
Used for real-time video analytics and deploying the redaction application.
Software
TensorRT
Used to optimize the ONNX model for low-latency inference.
Model
RetinaNet
The deep learning model used for object detection in the redaction application.
Hardware
NVIDIA Jetson AGX Xavier
The edge device used for deploying the application.

Key Actionable Insights

1
Optimize your video analytics application by implementing a tracker to reduce the computational load on the GPU.
Using a tracker allows you to skip inference on certain frames, which can significantly improve performance, especially when processing multiple video streams.
2
Experiment with different batch sizes when deploying your model to find the optimal performance for your specific hardware.
The batch size can greatly affect inference speed and resource utilization, so testing various configurations can lead to improved application responsiveness.
3
Use the DeepStream SDK's built-in features to streamline development of your video analytics pipeline.
DeepStream provides various tools and plugins that can help you efficiently build and deploy applications, reducing the time and effort needed for custom implementations.
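Item 2 above recommends trying several batch sizes. A minimal harness for that experiment is sketched below. `measure_fps` is a hypothetical callback you would implement yourself, for example by running the pipeline with a given `batch-size` and reading its reported frame rate. The toy function in the usage lines is a placeholder to exercise the code, not measured data.

```python
from typing import Callable, Dict, Iterable


def sweep_batch_sizes(
    measure_fps: Callable[[int], float], batch_sizes: Iterable[int]
) -> Dict[int, float]:
    """Run the (hypothetical) measurement once per batch size."""
    return {b: measure_fps(b) for b in batch_sizes}


def best_batch_size(results: Dict[int, float]) -> int:
    # Highest throughput wins; on a tie prefer the smaller batch (less memory).
    return max(results, key=lambda b: (results[b], -b))


# Placeholder measurement purely to exercise the code; replace with real runs.
def toy_measure(batch: int) -> float:
    return min(batch, 4) * 25.0


results = sweep_batch_sizes(toy_measure, [1, 2, 4, 8])
print(results, best_batch_size(results))
```

Breaking ties toward the smaller batch is a design choice: if a larger batch buys no extra throughput, it only costs memory.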

Common Pitfalls

1
Failing to optimize the batch size for the specific hardware can lead to suboptimal performance.
Different models and hardware configurations can handle varying batch sizes, so it's important to test and adjust the batch size to achieve the best inference speed.
2
Neglecting to configure the DeepStream application correctly can result in errors during deployment.
It's crucial to ensure that all paths in the configuration files are accurate and that the necessary libraries are referenced correctly to avoid runtime issues.
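One way to catch the path errors described in pitfall 2 before launching the pipeline is to check the file references in an nvinfer config. The sketch below reads the `[property]` section with `configparser` and reports referenced files that do not exist. It assumes relative paths are resolved against the config file's directory, and it skips `model-engine-file` because nvinfer can generate that file on first run. `missing_paths` is an illustrative helper, not part of DeepStream.

```python
import configparser
import os
from typing import List

# nvinfer keys that name files which must already exist (model-engine-file is
# left out because nvinfer can build and write it when it is missing).
REQUIRED_PATH_KEYS = ("onnx-file", "custom-lib-path", "labelfile-path", "int8-calib-file")


def missing_paths(config_path: str) -> List[str]:
    """Return 'key: path' entries whose files cannot be found."""
    # interpolation=None so '%' in values is taken literally; strict=False
    # tolerates repeated keys or sections instead of raising.
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    if not cfg.read(config_path):
        raise FileNotFoundError(config_path)
    base = os.path.dirname(os.path.abspath(config_path))
    problems = []
    for key in REQUIRED_PATH_KEYS:
        value = cfg.get("property", key, fallback=None)
        if not value:
            continue  # key not set in this config
        # Assumption: relative paths are relative to the config file.
        path = value if os.path.isabs(value) else os.path.join(base, value)
        if not os.path.exists(path):
            problems.append(f"{key}: {path}")
    return problems
```

An empty list means every checked file was found; any entry names the offending key and the resolved path, which is usually enough to spot a typo or a missing build step.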

Related Concepts

Real-time Video Analytics
Deep Learning Model Deployment
NVIDIA Jetson Hardware
TensorRT Optimization Techniques