How NVIDIA Uses Self-Attention

5 articles about self-attention from NVIDIA's engineering team

Articles

NVIDIA
Advanced
This article discusses the emulation of the attention mechanism in transformer models using a fully convolutional network, specifically targeting improvements in computer vision tasks.
NVIDIA
Advanced
This article discusses inference optimization techniques for large language models (LLMs), highlighting the challenges and solutions associated with memory and compute efficiency.
Shashank Verma
24 min read
Includes Code
Has Summary
NVIDIA
Intermediate
The article discusses the intricacies of training Large Language Models (LLMs) using transformer networks, focusing on model architectures, attention mechanisms, and embedding techniques.
NVIDIA
Intermediate
The article discusses the structured sparsity feature in the NVIDIA Ampere architecture, particularly focusing on its implementation in deep learning and applications in search engines.
Hongxiao Bai
12 min read
Includes Code
Has Summary
NVIDIA
Advanced
The article discusses the optimizations NVIDIA has made to the BERT model using TensorRT, enabling real-time natural language understanding with significantly reduced latency.
Purnendu Mukherjee
19 min read
Includes Code
Has Summary
