How NVIDIA Uses Transformer

194 engineering articles about Transformer from NVIDIA's engineering team

Other NVIDIA Technologies

Python (740), PyTorch (566), Deep Learning (505), TensorFlow (444), Docker (292), Kubernetes (251)

Articles

NVIDIA

Advanced

How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI’s

The article discusses how NVIDIA's hardware-software co-design significantly enhanced the inference performance of Sarvam AI's Sovereign 30B model, achieving a 4x speedup on NVIDIA Blackwell archit...

Hugging Face, PyTorch, Transformer

Utkarsh Uppal

14 min read

NVIDIA

Advanced

3 Ways NVFP4 Accelerates AI Training and Inference

The article discusses how NVFP4, a low-precision floating-point format developed by NVIDIA, enhances AI training and inference performance.

Transformer

Ashraf Eassa

6 min read

NVIDIA

Intermediate

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

This article introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core designed to optimize training for variable-length sequences in large-scale models.

Transformer, V

Kunlun Li

11 min read

Includes Code

NVIDIA

Advanced

Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

The article discusses the limitations of current large language models (LLMs) in handling long contexts and introduces Test-Time Training with an end-to-end formulation (TTT-E2E) as a solution.

Neural Networks, Recurrent Neural Networks, Transformer, Transformers

Yu Sun

6 min read

NVIDIA

Advanced

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.

Assembly, Hugging Face, JAX, Kubernetes, Less, PyTorch, RLHF, Transformer

Kyle Aubrey

59 min read

NVIDIA

Advanced

How to Build a Voice Agent with RAG and Safety Guardrails

This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.

Embedding, Hugging Face, Python, Transformer, Transformers

Chris Alexiuk

8 min read

Includes Code

NVIDIA

Intermediate

Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate

The article discusses NVIDIA Nemotron 3, a family of open models designed for agentic AI systems, emphasizing the architectures and techniques that make it efficient and accurate.

Hugging Face, Large Language Models, Reinforcement Learning, Transformer

Chris Alexiuk

9 min read

NVIDIA

Advanced

Model Quantization: Concepts, Methods, and Why It Matters

The article discusses model quantization, a technique essential for deploying complex AI models on resource-constrained hardware.

Transformer

Ruixiang Wang

11 min read

NVIDIA

Intermediate

NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks

The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.

BERT, Deep Learning, Large Language Models, Stable Diffusion, Transformer, V

Ashraf Eassa

10 min read

NVIDIA

Advanced

Democratizing Large-Scale Mixture-of-Experts Training with NVIDIA PyTorch Parallelism

The article discusses how NVIDIA's NeMo Automodel simplifies the training of large-scale mixture-of-experts (MoE) models in PyTorch, making it accessible to a broader audience.

GPT, Hugging Face, PyTorch, Transformer

Hemil Desai

7 min read

Includes Code

NVIDIA

Advanced

Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes

The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...

Hugging Face, PyTorch, Transformer, Transformers

Kyle Tretina

6 min read

Includes Code

NVIDIA

Advanced

Introducing the CodonFM Open Model for RNA Design and Analysis

The article introduces CodonFM, a new state-of-the-art RNA foundation model developed by NVIDIA as part of the Clara open model family.

BERT, Fine-tuning, Hugging Face, Transformer

Kyle Gion

10 min read

Includes Code

NVIDIA

Advanced

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer

The article discusses the optimization of large language models (LLMs) through pruning and knowledge distillation using NVIDIA TensorRT Model Optimizer.

Embedding, Hugging Face, Transformer

Max Xu

10 min read

Includes Code

NVIDIA

Advanced

How id Software Used Neural Rendering and Path Tracing in DOOM: The Dark Ages

The article discusses how id Software integrated RTX neural rendering and path tracing into DOOM: The Dark Ages, highlighting the advancements in real-time graphics and the technical challenges ove...

Transformer

Phillip Singh

6 min read

NVIDIA

Intermediate

R²D²: Three Neural Breakthroughs Transforming Robot Learning from NVIDIA Research

The article discusses three neural innovations from NVIDIA Research that are enhancing robot learning capabilities, specifically focusing on bridging the gap between controlled simulations and real...

Assembly, Fine-tuning, GPT, Transformer, Warp

Rishabh Chadha

8 min read

NVIDIA

Advanced

Faster Training Throughput in FP8 Precision with NVIDIA NeMo

This article discusses the advantages of using FP8 precision for faster training throughput in large-scale deep learning models with NVIDIA NeMo.

Transformer

Karin Sevegnani

11 min read

NVIDIA

Advanced

Reasoning Through Molecular Synthetic Pathways with Generative AI

The article discusses ReaSyn, a generative model developed by NVIDIA to predict molecular synthesis pathways, addressing the challenges of synthesizability in molecular design.

Generative AI, Transformer

Seul Lee

6 min read

NVIDIA

Beginner

NVIDIA HGX B200 Reduces Embodied Carbon Emissions Intensity

The article discusses how the NVIDIA HGX B200 significantly reduces embodied carbon emissions intensity compared to its predecessor, the HGX H100, while enhancing performance and energy efficiency.

Transformer

Zoe Kessler

4 min read

NVIDIA

Advanced

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

The article introduces speculative decoding as a technique to reduce latency in AI inference, particularly for large language models (LLMs).

Hugging Face, Transformer

Jamie Li

10 min read

Includes Code

NVIDIA

Intermediate

New Open Source Qwen3-Next Models Preview Hybrid MoE Architecture Delivering Improved Accuracy and

The article discusses the release of two new open-source models, Qwen3-Next 80B-A3B-Thinking and Qwen3-Next 80B-A3B-Instruct, which utilize a hybrid Mixture of Experts (MoE) architecture to enhance...

Hugging Face, Less, Transformer

Anu Srivastava

4 min read

Includes Code

NVIDIA

Advanced

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

The article discusses fine-tuning the gpt-oss model for improved accuracy and performance through Quantization Aware Training (QAT) and Supervised Fine-Tuning (SFT).

GPT, Hugging Face, PyTorch, Transformer, Transformers

Eduardo Alvarez

7 min read

Includes Code

NVIDIA

Intermediate

NVFP4 Trains with Precision of 16-Bit and Speed and Efficiency of 4-Bit

The article discusses NVIDIA's NVFP4, a new 4-bit precision format for training large language models (LLMs) that enhances efficiency and scalability while maintaining accuracy.

Google Cloud, Mistral, Transformer

Kirthi Devleker

9 min read

NVIDIA

Advanced

Introducing NVIDIA Jetson Thor, the Ultimate Platform for Physical AI

The article introduces the NVIDIA Jetson Thor, a powerful platform designed for physical AI and humanoid robotics.

Gemini, Hugging Face, PyTorch, Transformer

Shashank Maheshwari

13 min read

NVIDIA

Advanced

Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era

The article discusses the NVIDIA Blackwell Ultra GPU, a significant advancement in the Blackwell architecture designed to enhance AI training and reasoning capabilities.

Transformer, Warp

Kyle Aubrey

13 min read

NVIDIA

Advanced

NVIDIA Hardware Innovations and Open Source Contributions Are Shaping AI

The article discusses how NVIDIA's hardware innovations, particularly the Blackwell architecture and NVFP4 precision, along with its open-source contributions, are driving advancements in AI.

GPT, Hugging Face, JAX, Kubernetes, Python, PyTorch, Transformer

George Chellapa

8 min read

NVIDIA

Intermediate

How Hackers Exploit AI’s Problem-Solving Instincts

The article discusses the evolving landscape of AI security, focusing on how hackers exploit the problem-solving instincts of multimodal AI systems through cognitive challenges.

Gemini, Transformer

Daniel Teixeira

9 min read

Includes Code

NVIDIA

Intermediate

NVIDIA Accelerates OpenAI gpt-oss Models Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72

NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).

Docker, Hugging Face, Ollama, Python, Transformer, Transformers

Anu Srivastava

6 min read

Includes Code

NVIDIA

Intermediate

Enhancing Multilingual Human-Like Speech and Voice Cloning with NVIDIA Riva TTS

The article discusses the advancements in multilingual human-like speech synthesis and voice cloning using NVIDIA Riva TTS.

Docker, Transformer

Maggie Zhang

9 min read

NVIDIA

Advanced

Optimizing FLUX.1 Kontext for Image Editing with Low-Precision Quantization

The article discusses the optimization of the FLUX.1 Kontext model for image editing through low-precision quantization techniques.

CLIP, T5, Transformer

Sandro Cavallari

9 min read

Includes Code

NVIDIA

Advanced

Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training

This article discusses FP8 scaling strategies, including per-tensor and per-block scaling, essential for maintaining numerical stability and accuracy during low-precision training.

Transformer

Karin Sevegnani

9 min read

Includes Code

NVIDIA

Intermediate

R²D²: Building AI-based 3D Robot Perception and Mapping with NVIDIA Research

The article discusses advancements in AI-based 3D robot perception and mapping, focusing on NVIDIA's research efforts to create a unified 3D perception stack.

GRU, Python, PyTorch, Transformer

Raffaello Bonghi

12 min read

NVIDIA

Advanced

Accelerated Molecular Modeling with NVIDIA cuEquivariance and NVIDIA NIM microservices

The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...

Apache, PyTorch, Transformer, Transformers

Neha Tadimeti

8 min read

NVIDIA

Advanced

Advancing Agentic AI with NVIDIA Nemotron Open Reasoning Models

The article discusses the advancements in AI autonomy through NVIDIA's Nemotron open reasoning models, which enhance AI agents' decision-making capabilities in complex environments.

Hugging Face, Mistral, Reinforcement Learning, Transformer

Nirmal Kumar Juluru

6 min read

NVIDIA

Advanced

Introducing the Nemotron-H Reasoning Model Family: Throughput Gains Without Compromise

The article introduces the Nemotron-H Reasoning Model Family developed by NVIDIA, which addresses the challenges of reasoning-intensive tasks in large language models by significantly improving thr...

Kong, Transformer

Adi Renduchintala

7 min read

Includes Code

NVIDIA

Advanced

NVIDIA Blackwell Delivers up to 2.6x Higher Performance in MLPerf Training v5.0

The article discusses the performance improvements delivered by NVIDIA's Blackwell architecture in MLPerf Training v5.0, showcasing up to 2.6x higher performance.

BERT, Natural Language Processing, Stable Diffusion, Transformer

Sukru Burc Eryilmaz

12 min read

NVIDIA

Intermediate

Floating-Point 8: An Introduction to Efficient, Lower-Precision AI Training

The article discusses the advancements in AI training through the introduction of floating-point 8 (FP8) precision, emphasizing its benefits in computational efficiency and memory usage.

Transformer

Karin Sevegnani

10 min read

NVIDIA

Advanced

Scaling to Millions of Tokens with Efficient Long-Context LLM Training

The article discusses advancements in large language models (LLMs), focusing on the importance of extended context lengths for processing and generating text.

Transformer, Transformers, V

Amit Bleiweiss

7 min read

NVIDIA

Advanced

Advanced Optimization Strategies for LLM Training on NVIDIA Grace Hopper

This article discusses advanced optimization strategies for training large language models (LLMs) on the NVIDIA Grace Hopper Superchip.

Python, PyTorch, Transformer

Karin Sevegnani

9 min read

Includes Code

NVIDIA

Advanced

AI Helps Uncover Potential Alzheimer’s Cause and Treatment

Researchers at the University of California, San Diego used AI to identify the gene PHGDH as a potential cause of Alzheimer's disease, a finding that could lead to new treatment options.

Transformer

Elias Wolfberg

3 min read

NVIDIA

Intermediate

NVIDIA TensorRT Unlocks FP4 Image Generation for NVIDIA Blackwell GeForce RTX 50 Series GPUs

The article discusses the advancements brought by NVIDIA's TensorRT in enabling FP4 image generation for the Blackwell GeForce RTX 50 Series GPUs.

CLIP, PyTorch, T5, Transformer

Gunjan Mehta

10 min read

NVIDIA

Intermediate

Run Hugging Face Models Instantly with Day-0 Support from NVIDIA NeMo Framework

The article discusses the introduction of the AutoModel feature in the NVIDIA NeMo Framework, which allows users to run Hugging Face models with Day-0 support.

Fine-tuning, Hugging Face, Mistral, PyTorch, Transformer

Shashank Verma

5 min read

Includes Code

NVIDIA

Advanced

Optimizing Transformer-Based Diffusion Models for Video Generation with NVIDIA TensorRT

The article discusses optimizing transformer-based diffusion models for video generation using NVIDIA TensorRT, highlighting significant reductions in latency and total cost of ownership (TCO) achi...

AWS, Deep Learning, Diffusion Models, PyTorch, TensorFlow, Transformer

Maximilian Müller

7 min read

NVIDIA

Intermediate

NVIDIA Accelerates Inference on Meta Llama 4 Scout and Maverick

NVIDIA has accelerated inference on Meta's Llama 4 Scout and Llama 4 Maverick models, using NVIDIA's open-source software to achieve high performance on Blackwell B200 GPUs.

Fine-tuning, Transformer

Anu Srivastava

4 min read

NVIDIA

Intermediate

NVIDIA Blackwell Delivers Massive Performance Leaps in MLPerf Inference v5.0

The article discusses the advancements of NVIDIA's Blackwell architecture, highlighting its significant performance improvements in MLPerf Inference v5.0.

GPT, Kong, ResNet, Stable Diffusion, Transformer, U-Net

Ashraf Eassa

9 min read

NVIDIA

Intermediate

Introducing NVIDIA Isaac for Healthcare, an AI-Powered Medical Robotics Development Platform

The article introduces NVIDIA Isaac for Healthcare, an AI-powered platform designed to advance medical robotics through simulation and real-time deployment.

Neptune, Transformer

Mostafa Toloui

9 min read

NVIDIA

Advanced

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA has announced world-record inference performance for the DeepSeek-R1 model using the Blackwell architecture, achieving over 250 tokens per second per user and a maximum throughput of over 30...

CLIP, Hugging Face, JAX, Ollama, Python, PyTorch, T5, TensorFlow, Transformer

Ashraf Eassa

13 min read

NVIDIA

Intermediate

Accelerate Generalist Humanoid Robot Development with NVIDIA Isaac GR00T N1

The article discusses the NVIDIA Isaac GR00T N1, an open foundation model designed to accelerate the development of general-purpose humanoid robots.

Hugging Face, PyTorch, Transformer

Kalyan Meher Vadrevu

7 min read

NVIDIA

Intermediate

Scale Synthetic Data and Physical AI Reasoning with NVIDIA Cosmos World Foundation Models

The article discusses how NVIDIA Cosmos World Foundation Models (WFMs) enhance the development of AI-driven robots and autonomous vehicles by providing high-fidelity, physics-aware synthetic data.

Hugging Face, JSON, Reinforcement Learning, Transformer

Pranjali Joshi

7 min read

Includes Code

NVIDIA

Advanced

Measure and Improve AI Workload Performance with NVIDIA DGX Cloud Benchmarking

The article discusses the importance of measuring and improving AI workload performance using NVIDIA DGX Cloud Benchmarking.

AWS, Azure, Google Cloud, Oracle, Transformer

Emily Potyraj

7 min read

NVIDIA

Intermediate

Ensuring Reliable Model Training on NVIDIA DGX Cloud

The article discusses the challenges of training AI models on large GPU clusters, emphasizing the need for automation to ensure high GPU utilization and productivity.

PyTorch, Transformer

Shelby Thomas

8 min read
