How NVIDIA Uses Self-Attention

5 articles about Self-Attention from NVIDIA's engineering team

Articles

NVIDIA

Advanced

Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network

This article discusses the emulation of the attention mechanism in transformer models using a fully convolutional network, specifically targeting improvements in computer vision tasks.

Attention Mechanism, ResNet, Self-Attention, Transformer, Transformers, V

John Yang

12 min read

Has Summary

NVIDIA

Advanced

Mastering LLM Techniques: Inference Optimization

This article discusses inference optimization techniques for large language models (LLMs), highlighting the challenges and solutions associated with memory and compute efficiency.

Autoregressive Models, BERT, GPT, Self-Attention, Transformer, V

Shashank Verma

24 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Mastering LLM Techniques: Training

This article discusses the intricacies of training large language models (LLMs) with transformer networks, focusing on model architectures, attention mechanisms, and embedding techniques.

Attention Mechanism, BERT, Embedding, GPT, Large Language Models, Neural Networks, Recurrent Neural Networks, Self-Attention, Transformer, Transformers, V

Anjali Shah

14 min read

Has Summary

NVIDIA

Intermediate

Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines

The article discusses the structured sparsity feature in the NVIDIA Ampere architecture, particularly focusing on its implementation in deep learning and applications in search engines.

BERT, Machine Learning, Neural Networks, Python, Self-Attention, Transformers

Hongxiao Bai

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Real-Time Natural Language Understanding with BERT Using TensorRT

The article discusses the optimizations NVIDIA has made to the BERT model using TensorRT, enabling real-time natural language understanding with significantly reduced latency.

BERT, Docker, Google Cloud, GPT, Python, RoBERTa, Self-Attention, Transformer, Transformers, V

Purnendu Mukherjee

19 min read

Includes Code

Has Summary
