How NVIDIA Uses Transformer
194 engineering articles about Transformer from NVIDIA's engineering team
Articles
The article discusses how NVIDIA's hardware-software co-design significantly enhanced the inference performance of Sarvam AI's Sovereign 30B model, achieving a 4x speedup on NVIDIA Blackwell archit...
Utkarsh Uppal
14 min read
Has Summary
--
The article discusses how NVFP4, a low-precision floating-point format developed by NVIDIA, enhances AI training and inference performance.
Ashraf Eassa
6 min read
Has Summary
--
This article introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core designed to optimize training for variable-length sequences in large-scale models.
Kunlun Li
11 min read
Includes Code
Has Summary
--
The article discusses the limitations of current large language models (LLMs) in handling long contexts and introduces Test-Time Training with an end-to-end formulation (TTT-E2E) as a solution.
Yu Sun
6 min read
Has Summary
--
The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.
Kyle Aubrey
59 min read
Has Summary
--
This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.
Chris Alexiuk
8 min read
Includes Code
Has Summary
--
The article discusses NVIDIA Nemotron 3, a family of open models designed for agentic AI systems, emphasizing their efficiency and accuracy through innovative architectures and techniques.
Chris Alexiuk
9 min read
Has Summary
--
The article discusses model quantization, a technique essential for deploying complex AI models on resource-constrained hardware.
Ruixiang Wang
11 min read
Has Summary
--
The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.
Ashraf Eassa
10 min read
Has Summary
--
The article discusses how NVIDIA's NeMo Automodel simplifies the training of large-scale mixture-of-experts (MoE) models in PyTorch, making it accessible to a broader audience.
Hemil Desai
7 min read
Includes Code
Has Summary
--
The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...
Kyle Tretina
6 min read
Includes Code
Has Summary
--
The article introduces CodonFM, a new state-of-the-art RNA foundation model developed by NVIDIA as part of the Clara open model family.
Kyle Gion
10 min read
Includes Code
Has Summary
--
The article discusses the optimization of large language models (LLMs) through pruning and knowledge distillation using NVIDIA TensorRT Model Optimizer.
Max Xu
10 min read
Includes Code
Has Summary
--
The article discusses how id Software integrated RTX neural rendering and path tracing into DOOM: The Dark Ages, highlighting the advancements in real-time graphics and the technical challenges ove...
Phillip Singh
6 min read
Has Summary
--
The article discusses three neural innovations from NVIDIA Research that are enhancing robot learning capabilities, specifically focusing on bridging the gap between controlled simulations and real...
Rishabh Chadha
8 min read
Has Summary
--
This article discusses the advantages of using FP8 precision for faster training throughput in large-scale deep learning models with NVIDIA NeMo.
Karin Sevegnani
11 min read
Has Summary
--
The article discusses ReaSyn, a generative model developed by NVIDIA to predict molecular synthesis pathways, addressing the challenges of synthesizability in molecular design.
Seul Lee
6 min read
Has Summary
--
The article discusses how the NVIDIA HGX B200 significantly reduces embodied carbon emissions intensity compared to its predecessor, the HGX H100, while enhancing performance and energy efficiency.
Zoe Kessler
4 min read
Has Summary
--
The article introduces speculative decoding as a technique to reduce latency in AI inference, particularly for large language models (LLMs).
Jamie Li
10 min read
Includes Code
Has Summary
--
The article discusses the release of two new open-source models, Qwen3-Next 80B-A3B-Thinking and Qwen3-Next 80B-A3B-Instruct, which utilize a hybrid Mixture of Experts (MoE) architecture to enhance...
Anu Srivastava
4 min read
Includes Code
Has Summary
--
The article discusses fine-tuning the gpt-oss model for improved accuracy and performance through Quantization Aware Training (QAT) and Supervised Fine-Tuning (SFT).
Eduardo Alvarez
7 min read
Includes Code
Has Summary
--
The article discusses NVIDIA's NVFP4, a new 4-bit precision format for training large language models (LLMs) that enhances efficiency and scalability while maintaining accuracy.
Kirthi Devleker
9 min read
Has Summary
--
The article introduces the NVIDIA Jetson Thor, a powerful platform designed for physical AI and humanoid robotics.
Shashank Maheshwari
13 min read
Has Summary
--
The article discusses the NVIDIA Blackwell Ultra GPU, a significant advancement in the Blackwell architecture designed to enhance AI training and reasoning capabilities.
Kyle Aubrey
13 min read
Has Summary
--
The article discusses how NVIDIA's hardware innovations, particularly the Blackwell architecture and NVFP4 precision, along with its open-source contributions, are driving advancements in AI.
George Chellapa
8 min read
Has Summary
--
The article discusses the evolving landscape of AI security, focusing on how hackers exploit the problem-solving instincts of multimodal AI systems through cognitive challenges.
Daniel Teixeira
9 min read
Includes Code
Has Summary
--
NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).
Anu Srivastava
6 min read
Includes Code
Has Summary
--
The article discusses the advancements in multilingual human-like speech synthesis and voice cloning using NVIDIA Riva TTS.
Maggie Zhang
9 min read
Has Summary
--
The article discusses the optimization of the FLUX.1 Kontext model for image editing through low-precision quantization techniques.
Sandro Cavallari
9 min read
Includes Code
Has Summary
--
This article discusses FP8 scaling strategies, including per-tensor and per-block scaling, which are essential for maintaining numerical stability and accuracy during low-precision training.
Karin Sevegnani
9 min read
Includes Code
Has Summary
--
The article discusses advancements in AI-based 3D robot perception and mapping, focusing on NVIDIA's research efforts to create a unified 3D perception stack.
Raffaello Bonghi
12 min read
Has Summary
--
The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...
Neha Tadimeti
8 min read
Has Summary
--
The article discusses the advancements in AI autonomy through NVIDIA's Nemotron open reasoning models, which enhance AI agents' decision-making capabilities in complex environments.
Nirmal Kumar Juluru
6 min read
Has Summary
--
The article introduces the Nemotron-H Reasoning Model Family developed by NVIDIA, which addresses the challenges of reasoning-intensive tasks in large language models by significantly improving thr...
Adi Renduchintala
7 min read
Includes Code
Has Summary
--
The article discusses the performance improvements delivered by NVIDIA's Blackwell architecture in MLPerf Training v5.0, showcasing up to 2...
Sukru Burc Eryilmaz
12 min read
Has Summary
--
The article discusses the advancements in AI training through the introduction of 8-bit floating-point (FP8) precision, emphasizing its benefits in computational efficiency and memory usage.
Karin Sevegnani
10 min read
Has Summary
--
The article discusses the advancements in large language models (LLMs), focusing on the importance of extended context lengths for processing and generating text.
Amit Bleiweiss
7 min read
Has Summary
--
This article discusses advanced optimization strategies for training large language models (LLMs) on the NVIDIA Grace Hopper Superchip.
Karin Sevegnani
9 min read
Includes Code
Has Summary
--
Researchers at the University of California, San Diego have identified the gene PHGDH as a direct cause of Alzheimer's disease using AI, which could lead to new treatment options.
Elias Wolfberg
3 min read
Has Summary
--
The article discusses the advancements brought by NVIDIA's TensorRT in enabling FP4 image generation for the Blackwell GeForce RTX 50 Series GPUs.
Gunjan Mehta
10 min read
Has Summary
--
The article discusses the introduction of the AutoModel feature in the NVIDIA NeMo Framework, which allows users to run Hugging Face models with Day-0 support.
Shashank Verma
5 min read
Includes Code
Has Summary
--
The article discusses optimizing transformer-based diffusion models for video generation using NVIDIA TensorRT, highlighting significant reductions in latency and total cost of ownership (TCO) achi...
Maximilian Müller
7 min read
Has Summary
--
NVIDIA has accelerated Meta's Llama 4 Scout and Llama 4 Maverick models with its open-source software, achieving impressive performance metrics on Blackwell B200 GPUs.
Anu Srivastava
4 min read
Has Summary
--
The article discusses the advancements of NVIDIA's Blackwell architecture, highlighting its significant performance improvements in MLPerf Inference v5.
Ashraf Eassa
9 min read
Has Summary
--
The article introduces NVIDIA Isaac for Healthcare, an AI-powered platform designed to advance medical robotics through simulation and real-time deployment.
Mostafa Toloui
9 min read
Has Summary
--
NVIDIA has announced world-record inference performance for the DeepSeek-R1 model using the Blackwell architecture, achieving over 250 tokens per second per user and a maximum throughput of over 30...
Ashraf Eassa
13 min read
Has Summary
--
The article discusses the NVIDIA Isaac GR00T N1, an open foundation model designed to accelerate the development of general-purpose humanoid robots.
Kalyan Meher Vadrevu
7 min read
Has Summary
--
The article discusses how NVIDIA Cosmos World Foundation Models (WFMs) enhance the development of AI-driven robots and autonomous vehicles by providing high-fidelity, physics-aware synthetic data.
Pranjali Joshi
7 min read
Includes Code
Has Summary
--
The article discusses the importance of measuring and improving AI workload performance using NVIDIA DGX Cloud Benchmarking.
Emily Potyraj
7 min read
Has Summary
--
The article discusses the challenges of training AI models on large GPU clusters, emphasizing the need for automation to ensure high GPU utilization and productivity.
Shelby Thomas
8 min read
Has Summary
--