NVIDIA logo

How NVIDIA Uses Transformer

194 engineering articles about Transformer from NVIDIA's engineering team

Articles

NVIDIA
Advanced
The article discusses how NVIDIA's hardware-software co-design significantly enhanced the inference performance of Sarvam AI's Sovereign 30B model, achieving a 4x speedup on NVIDIA Blackwell archit...
Utkarsh Uppal
14 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how NVFP4, a low-precision floating-point format developed by NVIDIA, enhances AI training and inference performance.
Ashraf Eassa
6 min read
Has Summary
--
NVIDIA
Intermediate
This article introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core designed to optimize training for variable-length sequences in large-scale models.
Kunlun Li
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the limitations of current large language models (LLMs) in handling long contexts and introduces Test-Time Training with an end-to-end formulation (TTT-E2E) as a solution.
NVIDIA
Advanced
The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.
NVIDIA
Advanced
This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.
Chris Alexiuk
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the NVIDIA Nemotron 3, a family of open models designed for agentic AI systems, emphasizing its efficiency and accuracy through innovative architectures and techniques.
NVIDIA
Advanced
The article discusses model quantization, a technique essential for deploying complex AI models on resource-constrained hardware.
Ruixiang Wang
11 min read
Has Summary
--
NVIDIA
Intermediate
The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.
NVIDIA
Advanced
The article discusses how NVIDIA's NeMo Automodel simplifies the training of large-scale mixture-of-experts (MoE) models in PyTorch, making it accessible to a broader audience.
Hemil Desai
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...
Kyle Tretina
6 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article introduces CodonFM, a new state-of-the-art RNA foundation model developed by NVIDIA as part of the Clara open model family.
Kyle Gion
10 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the optimization of large language models (LLMs) through pruning and knowledge distillation using NVIDIA TensorRT Model Optimizer.
Max Xu
10 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how id Software integrated RTX neural rendering and path tracing into DOOM: The Dark Ages, highlighting the advancements in real-time graphics and the technical challenges ove...
Phillip Singh
6 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses three neural innovations from NVIDIA Research that are enhancing robot learning capabilities, specifically focusing on bridging the gap between controlled simulations and real...
Rishabh Chadha
8 min read
Has Summary
--
NVIDIA
Advanced
This article discusses the advantages of using FP8 precision for faster training throughput in large-scale deep learning models with NVIDIA NeMo.
Karin Sevegnani
11 min read
Has Summary
--
NVIDIA
Advanced
The article discusses ReaSyn, a generative model developed by NVIDIA to predict molecular synthesis pathways, addressing the challenges of synthesizability in molecular design.
Seul Lee
6 min read
Has Summary
--
NVIDIA
Beginner
The article discusses how the NVIDIA HGX B200 significantly reduces embodied carbon emissions intensity compared to its predecessor, the HGX H100, while enhancing performance and energy efficiency.
Zoe Kessler
4 min read
Has Summary
--
NVIDIA
Advanced
The article introduces speculative decoding as a technique to reduce latency in AI inference, particularly for large language models (LLMs).
Jamie Li
10 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the release of two new open-source models, Qwen3-Next 80B-A3B-Thinking and Qwen3-Next 80B-A3B-Instruct, which utilize a hybrid Mixture of Experts (MoE) architecture to enhance...
Anu Srivastava
4 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses fine-tuning the gpt-oss model for improved accuracy and performance through Quantization Aware Training (QAT) and Supervised Fine-Tuning (SFT).
Eduardo Alvarez
7 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses NVIDIA's NVFP4, a new 4-bit precision format for training large language models (LLMs) that enhances efficiency and scalability while maintaining accuracy.
Kirthi Devleker
9 min read
Has Summary
--
NVIDIA
Advanced
The article introduces the NVIDIA Jetson Thor, a powerful platform designed for physical AI and humanoid robotics.
Shashank Maheshwari
13 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the NVIDIA Blackwell Ultra GPU, a significant advancement in the Blackwell architecture designed to enhance AI training and reasoning capabilities.
Kyle Aubrey
13 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA's hardware innovations, particularly the Blackwell architecture and NVFP4 precision, along with their open source contributions, are driving advancements in AI.
George Chellapa
8 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the evolving landscape of AI security, focusing on how hackers exploit the problem-solving instincts of multimodal AI systems through cognitive challenges.
Daniel Teixeira
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).
Anu Srivastava
6 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the advancements in multilingual human-like speech synthesis and voice cloning using NVIDIA Riva TTS.
Maggie Zhang
9 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the optimization of the FLUX.1 Kontext model for image editing through low-precision quantization techniques.
Sandro Cavallari
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses FP8 scaling strategies, including per-tensor and per-block scaling, essential for maintaining numerical stability and accuracy during low-precision training.
Karin Sevegnani
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses advancements in AI-based 3D robot perception and mapping, focusing on NVIDIA's research efforts to create a unified 3D perception stack.
Raffaello Bonghi
12 min read
Has Summary
--
NVIDIA
Advanced
The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...
Neha Tadimeti
8 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the advancements in AI autonomy through NVIDIA's Nemotron open reasoning models, which enhance AI agents' decision-making capabilities in complex environments.
Nirmal Kumar Juluru
6 min read
Has Summary
--
NVIDIA
Advanced
The article introduces the Nemotron-H Reasoning Model Family developed by NVIDIA, which addresses the challenges of reasoning-intensive tasks in large language models by significantly improving thr...
Adi Renduchintala
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the performance improvements delivered by NVIDIA's Blackwell architecture in MLPerf Training v5.0, showcasing up to 2.
Sukru Burc Eryilmaz
12 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the advancements in AI training through the introduction of floating-point 8 (FP8) precision, emphasizing its benefits in computational efficiency and memory usage.
Karin Sevegnani
10 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the advancements in large language models (LLMs) focusing on the importance of extended context lengths for processing and generating text.
Amit Bleiweiss
7 min read
Has Summary
--
NVIDIA
Advanced
This article discusses advanced optimization strategies for training large language models (LLMs) on the NVIDIA Grace Hopper Superchip.
Karin Sevegnani
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
Researchers at the University of California, San Diego have identified the gene PHGDH as a direct cause of Alzheimer's disease using AI, which could lead to new treatment options.
Elias Wolfberg
3 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the advancements brought by NVIDIA's TensorRT in enabling FP4 image generation for the Blackwell GeForce RTX 50 Series GPUs.
Gunjan Mehta
10 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the introduction of the AutoModel feature in the NVIDIA NeMo Framework, which allows users to run Hugging Face models with Day-0 support.
Shashank Verma
5 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses optimizing transformer-based diffusion models for video generation using NVIDIA TensorRT, highlighting significant reductions in latency and total cost of ownership (TCO) achi...
Maximilian Müller
7 min read
Has Summary
--
NVIDIA
Intermediate
NVIDIA has accelerated inference for Meta's Llama 4 Scout and Llama 4 Maverick models, using NVIDIA's open-source software to achieve impressive performance metrics on Blackwell B200 GPUs.
Anu Srivastava
4 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the advancements of NVIDIA's Blackwell architecture, highlighting its significant performance improvements in MLPerf Inference v5.
Ashraf Eassa
9 min read
Has Summary
--
NVIDIA
Intermediate
The article introduces NVIDIA Isaac for Healthcare, an AI-powered platform designed to advance medical robotics through simulation and real-time deployment.
Mostafa Toloui
9 min read
Has Summary
--
NVIDIA
Advanced
NVIDIA has announced world-record inference performance for the DeepSeek-R1 model using the Blackwell architecture, achieving over 250 tokens per second per user and a maximum throughput of over 30...
NVIDIA
Intermediate
The article discusses the NVIDIA Isaac GR00T N1, an open foundation model designed to accelerate the development of general-purpose humanoid robots.
Kalyan Meher Vadrevu
7 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses how NVIDIA Cosmos World Foundation Models (WFMs) enhance the development of AI-driven robots and autonomous vehicles by providing high-fidelity, physics-aware synthetic data.
Pranjali Joshi
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the importance of measuring and improving AI workload performance using NVIDIA DGX Cloud Benchmarking.
Emily Potyraj
7 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the challenges of training AI models on large GPU clusters, emphasizing the need for automation to ensure high GPU utilization and productivity.
Shelby Thomas
8 min read
Has Summary
--