Hello World! Robot Responds to Human Gestures

By: Madeleine Waldie, Abhinav Ayalur, Jackson Moffet, and Nikhil Suresh

This summer a team of four high school interns, the Neural Ninjas…

Nefi Alarcon
4 min read, beginner

Overview

A team of high school interns developed a gesture recognition neural network that lets a robot respond to human gestures, with a focus on recognizing a wave. They trained their models on NVIDIA Tesla V100 GPUs with TensorFlow and PyTorch, then ran them on a humanoid robot through the Robot Operating System (ROS).

What You'll Learn

1. How to develop a gesture recognition neural network using Python and C++
2. Why multithreading can enhance the performance of pose estimation systems
3. How to implement a Long Short-Term Memory Neural Network (LSTM) for gesture detection

Prerequisites & Requirements

  • Basic understanding of deep learning concepts
  • Familiarity with TensorFlow and PyTorch (optional)
  • Experience with Python and C++ programming

Key Questions Answered

How did the interns train their gesture recognition neural network?
The interns trained their neural network, called Post OpenPose Neural Network (POPNN), using NVIDIA Tesla V100 GPUs and the cuDNN-accelerated TensorFlow framework. They also built a Long Short-Term Memory Neural Network (LSTM) using PyTorch to improve gesture detection capabilities.
What techniques did the team use to speed up pose estimation?
To speed up TF-Pose-Estimation inference, the team split the pipeline into four threads: video, display, pose estimation, and optical flow. This raised the camera's frame rate (FPS) tenfold.
What gestures can the robot recognize and respond to?
The robot can detect various gestures including waves, x-poses, y-poses, and dabs. It responds to these gestures by mimicking them and engaging in simple conversation with the user.
How did the interns collect data for training their neural networks?
The interns collected data by capturing videos of people waving and not waving, which let them generate data files without standing in front of a camera for extended periods. They ran TF-Pose-Estimation inference on the Jetson TX2 and saved the detected body part positions.
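
To make the LSTM answer above more concrete, here is a dependency-free sketch of what a single LSTM time step computes on one frame of pose features. It follows the standard LSTM gate equations and is not the interns' model; PyTorch provides a full implementation as `torch.nn.LSTM`. The function names, the `params` layout, and the feature sizes are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Compute W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params maps a gate name to its (W, U, b)."""
    i = [sigmoid(v) for v in affine(*params["input"], x, h_prev)]
    f = [sigmoid(v) for v in affine(*params["forget"], x, h_prev)]
    o = [sigmoid(v) for v in affine(*params["output"], x, h_prev)]
    g = [math.tanh(v) for v in affine(*params["cell"], x, h_prev)]
    # New cell state: keep part of the old memory, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated view of the squashed cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_sequence(frames, params, hidden_size):
    """Feed a sequence of per-frame feature vectors through the cell."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frames:
        h, c = lstm_step(x, h, c, params)
    return h  # final hidden state, which a classifier layer would consume
```

The final hidden state summarizes the whole sequence of frames, which is why a recurrent model suits a gesture like a wave that only shows up as motion over time.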

Key Statistics & Figures

Camera FPS improvement
10 times faster
This improvement was achieved by multithreading the processes involved in pose estimation.
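
The four-thread layout described above can be sketched with Python's standard `threading` and `queue` modules. This is a minimal illustration, not the team's code: frame ids stand in for camera images, `fake_pose` and `fake_flow` are placeholder functions, and the queue-based hand-off between threads is an assumption, since the source only names the four threads.

```python
import queue
import threading

STOP = object()  # sentinel that tells downstream threads to shut down


def fake_pose(frame_id):
    """Placeholder for pose estimation on one frame."""
    return f"keypoints-{frame_id}"


def fake_flow(frame_id):
    """Placeholder for optical flow on one frame."""
    return f"flow-{frame_id}"


def video_worker(n_frames, pose_q, flow_q):
    """Stand-in for camera capture: emits frame ids to both stages."""
    for frame_id in range(n_frames):
        pose_q.put(frame_id)
        flow_q.put(frame_id)
    pose_q.put(STOP)
    flow_q.put(STOP)


def stage_worker(name, fn, in_q, out_q):
    """Apply fn to each incoming frame and forward the tagged result."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        out_q.put((name, item, fn(item)))


def display_worker(display_q, n_producers, shown):
    """Stand-in for the display thread: collects results instead of drawing."""
    finished = 0
    while finished < n_producers:
        item = display_q.get()
        if item is STOP:
            finished += 1
        else:
            shown.append(item)


def run_pipeline(n_frames):
    pose_q, flow_q, display_q = queue.Queue(), queue.Queue(), queue.Queue()
    shown = []
    threads = [
        threading.Thread(target=video_worker, args=(n_frames, pose_q, flow_q)),
        threading.Thread(target=stage_worker, args=("pose", fake_pose, pose_q, display_q)),
        threading.Thread(target=stage_worker, args=("flow", fake_flow, flow_q, display_q)),
        threading.Thread(target=display_worker, args=(display_q, 2, shown)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown
```

Calling `run_pipeline(5)` returns ten tagged results, five from each processing stage, in whatever order the threads finished. In CPython, threads like these mainly help when a stage waits on I/O or runs in native code that releases the interpreter lock, which is typically the case for camera reads and GPU inference.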

Technologies & Tools


Hardware
NVIDIA Tesla V100
Used for training the neural networks.
Software
TensorFlow
Framework used to train the Post OpenPose Neural Network (POPNN).
Software
PyTorch
Framework used to build the Long Short-Term Memory Neural Network (LSTM).
Hardware
Jetson TX2
Device used to run inference on TF-Pose-Estimation and collect data.
Software
Robot Operating System (ROS)
Used to send data from the Jetson TX2 to the robot.

Key Actionable Insights

1. To improve the performance of a gesture recognition system, consider running its stages in separate threads so they execute concurrently.
   This can significantly speed up pose estimation, as the interns showed with a tenfold increase in camera FPS.
2. Use existing frameworks like TensorFlow and PyTorch to streamline the development of neural networks for gesture recognition.
   These frameworks provide tools and libraries that simplify training and deploying complex models such as LSTMs.
3. Incorporate user interaction into robotic systems by programming responses to specific gestures.
   This makes the robot more engaging and improves the user experience, as seen when the robot responds to waves and questions.
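
As a sketch of the third insight, per-frame gesture labels can be mapped to robot responses through a lookup table. The motion names, spoken phrases, `"none"` label, and majority-vote smoothing below are all hypothetical choices made for illustration; the source only says the robot mimics the gesture and holds a simple conversation.

```python
from collections import Counter, deque

# Hypothetical (motion to mimic, phrase to say) pairs for each gesture.
RESPONSES = {
    "wave": ("wave_back", "Hello!"),
    "x-pose": ("x_pose", "X marks the spot."),
    "y-pose": ("y_pose", "Why not?"),
    "dab": ("dab", "Nice dab."),
}


class GestureResponder:
    """Fire a response once when a gesture wins a majority of recent frames."""

    def __init__(self, window=5):
        self.recent = deque(maxlen=window)
        self.active = None  # gesture currently being responded to

    def update(self, label):
        """Record one per-frame label; return a response tuple or None."""
        self.recent.append(label)
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough frames yet
        top, count = Counter(self.recent).most_common(1)[0]
        if count <= self.recent.maxlen // 2 or top == self.active:
            return None  # no clear majority, or already handled
        self.active = top
        return RESPONSES.get(top)  # None for labels without a response
```

Smoothing over a few frames keeps a single misclassified frame from triggering a response, and tracking the active gesture keeps the robot from repeating itself on every frame of a long wave.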

Common Pitfalls

1
Failing to properly collect and preprocess data can lead to ineffective training of neural networks.
The interns overcame this by capturing videos for data generation, which saved time and ensured they had a diverse dataset for training.

Related Concepts

Gesture Recognition
Neural Networks
Pose Estimation
Deep Learning Frameworks