How NVIDIA Uses Multi-Head Attention

3 engineering articles about Multi-Head Attention from NVIDIA's engineering team

Other NVIDIA Technologies

Python (740), PyTorch (566), Deep Learning (505), TensorFlow (444), Docker (292), Kubernetes (251)

Articles

NVIDIA

Intermediate

Achieve CUTLASS C++ Performance with Python APIs Using CuTe DSL

The article explains how CuTe DSL, a new Python API in CUTLASS 4, simplifies GPU kernel development by reducing compilation times while keeping performance comparable to CUTLASS C++.

Multi-Head Attention, Python, PyTorch

Brandon Sun

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerate Custom Video Foundation Model Pipelines with New NVIDIA NeMo Framework Capabilities

The article discusses the new capabilities of the NVIDIA NeMo framework for accelerating custom video foundation model pipelines.

Generative AI, Multi-Head Attention, Transformer

Zeeshan Patel

8 min read

Has Summary

NVIDIA

Advanced

Breaking MLPerf Training Records with NVIDIA H100 GPUs

The article discusses how NVIDIA's H100 Tensor Core GPUs achieved record-breaking performance in the MLPerf Training v3.

BERT, Embedding, GPT, JSON, Multi-Head Attention, PyTorch, ResNet, Transformer, U-Net

Ashraf Eassa

14 min read

Has Summary
