Transformer Programming Tutorials & Engineering Articles

How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI’s

Advanced

The article discusses how NVIDIA's hardware-software co-design significantly enhanced the inference performance of Sarvam AI's Sovereign 30B model, achieving a 4x speedup on NVIDIA Blackwell archit...

Hugging Face, PyTorch, Transformer

Utkarsh Uppal

14 min read

Has Summary

3 Ways NVFP4 Accelerates AI Training and Inference

Advanced

The article discusses how NVFP4, a low-precision floating-point format developed by NVIDIA, enhances AI training and inference performance.

Ashraf Eassa

6 min read

Has Summary

Intermediate

Ads Candidate Generation using Behavioral Sequence Modeling

The article discusses how Pinterest enhances its ad candidate generation process using behavioral sequence modeling.

Machine Learning, Spring, Transformer

Pinterest Engineering

10 min read

Has Summary

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

Intermediate

This article introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core designed to optimize training for variable-length sequences in large-scale models.

Transformer, V

Kunlun Li

11 min read

Includes Code

Has Summary

Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

Advanced

The article discusses the limitations of current large language models (LLMs) in handling long contexts and introduces Test-Time Training with an end-to-end formulation (TTT-E2E) as a solution.

Neural Networks, Recurrent Neural Networks, Transformer, Transformers

Yu Sun

6 min read

Has Summary

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

Advanced

The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.

Assembly, Hugging Face, JAX, Kubernetes, Less, PyTorch, RLHF, Transformer

Kyle Aubrey

59 min read

Has Summary

How to Build a Voice Agent with RAG and Safety Guardrails

Advanced

This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.

Embedding, Hugging Face, Python, Transformer, Transformers

Chris Alexiuk

8 min read

Includes Code

Has Summary

Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate

Intermediate

The article discusses the NVIDIA Nemotron 3, a family of open models designed for agentic AI systems, emphasizing its efficiency and accuracy through innovative architectures and techniques.

Hugging Face, Large Language Models, Reinforcement Learning, Transformer

Chris Alexiuk

9 min read

Has Summary

OpenAI

Advanced

Inside Mirakl’s agentic commerce vision

The article discusses Mirakl's vision for agentic commerce, emphasizing the integration of AI across the company to enhance workflows and product offerings.

OpenAI Team

4 min read

Has Summary

Model Quantization: Concepts, Methods, and Why It Matters

Advanced

The article discusses model quantization, a technique essential for deploying complex AI models on resource-constrained hardware.

Ruixiang Wang

11 min read

Has Summary

NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks

Intermediate

The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.

BERT, Deep Learning, Large Language Models, Stable Diffusion, Transformer, V

Ashraf Eassa

10 min read

Has Summary

Democratizing Large-Scale Mixture-of-Experts Training with NVIDIA PyTorch Parallelism

Advanced

The article discusses how NVIDIA's NeMo Automodel simplifies the training of large-scale mixture-of-experts (MoE) models in PyTorch, making it accessible to a broader audience.

GPT, Hugging Face, PyTorch, Transformer

Hemil Desai

7 min read

Includes Code

Has Summary

Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes

Advanced

The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...

Hugging Face, PyTorch, Transformer, Transformers

Kyle Tretina

6 min read

Includes Code

Has Summary

Advanced

A Decade of AI Platform at Pinterest

The article reflects on a decade of AI platform development at Pinterest, detailing the evolution from fragmented machine learning stacks to a unified AI platform that supports various models.

AutoML, Docker, Embedding, Generative AI, Java, Kubernetes, LightGBM, PySpark, Python, PyTorch, Seed, SQL, TensorFlow, Thrift, Transformer

Pinterest Engineering

22 min read

Has Summary

Introducing the CodonFM Open Model for RNA Design and Analysis

Advanced

The article introduces CodonFM, a new state-of-the-art RNA foundation model developed by NVIDIA as part of the Clara open model family.

BERT, Fine-tuning, Hugging Face, Transformer

Kyle Gion

10 min read

Includes Code

Has Summary

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer

Advanced

The article discusses the optimization of large language models (LLMs) through pruning and knowledge distillation using NVIDIA TensorRT Model Optimizer.

Embedding, Hugging Face, Transformer

Max Xu

10 min read

Includes Code

Has Summary

How id Software Used Neural Rendering and Path Tracing in DOOM: The Dark Ages

Advanced

The article discusses how id Software integrated RTX neural rendering and path tracing into DOOM: The Dark Ages, highlighting the advancements in real-time graphics and the technical challenges ove...

Phillip Singh

6 min read

Has Summary

Gemma explained: EmbeddingGemma Architecture and Recipe

Intermediate

The article provides an in-depth exploration of the EmbeddingGemma architecture, detailing its origins, embedding generation process, and the comprehensive training methodology.

Embedding, Fine-tuning, Gemini, Hugging Face, Transformer, Transformers, Vertex AI

Henrique Schechter Vera, Juyeong Ji, Sahil Dua

7 min read

Includes Code

Has Summary

R²D²: Three Neural Breakthroughs Transforming Robot Learning from NVIDIA Research

Intermediate

The article discusses three neural innovations from NVIDIA Research that are enhancing robot learning capabilities, specifically focusing on bridging the gap between controlled simulations and real...

Assembly, Fine-tuning, GPT, Transformer, Warp

Rishabh Chadha

8 min read

Has Summary

On-device GenAI in Chrome, Chromebook Plus, and Pixel Watch with LiteRT-LM

Advanced

The article discusses the deployment of on-device generative AI (GenAI) using LiteRT-LM in Chrome, Chromebook Plus, and Pixel Watch.

Chi, Gemini, Kotlin, Swift, Transformer

Yu-hui Chen, Ram Iyengar

9 min read

Includes Code

Has Summary

Faster Training Throughput in FP8 Precision with NVIDIA NeMo

Advanced

This article discusses the advantages of using FP8 precision for faster training throughput in large-scale deep learning models with NVIDIA NeMo.

Karin Sevegnani

11 min read

Has Summary

Reasoning Through Molecular Synthetic Pathways with Generative AI

Advanced

The article discusses ReaSyn, a generative model developed by NVIDIA to predict molecular synthesis pathways, addressing the challenges of synthesizability in molecular design.

Generative AI, Transformer

Seul Lee

6 min read

Has Summary

NVIDIA HGX B200 Reduces Embodied Carbon Emissions Intensity

Beginner

The article discusses how the NVIDIA HGX B200 significantly reduces embodied carbon emissions intensity compared to its predecessor, the HGX H100, while enhancing performance and energy efficiency.

Zoe Kessler

4 min read

Has Summary

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

Advanced

The article introduces speculative decoding as a technique to reduce latency in AI inference, particularly for large language models (LLMs).

Hugging Face, Transformer

Jamie Li

10 min read

Includes Code

Has Summary

New Open Source Qwen3-Next Models Preview Hybrid MoE Architecture Delivering Improved Accuracy and

Intermediate

The article discusses the release of two new open-source models, Qwen3-Next 80B-A3B-Thinking and Qwen3-Next 80B-A3B-Instruct, which utilize a hybrid Mixture of Experts (MoE) architecture to enhance...

Hugging Face, Less, Transformer

Anu Srivastava

4 min read

Includes Code

Has Summary

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

Advanced

The article discusses fine-tuning the gpt-oss model for improved accuracy and performance through Quantization Aware Training (QAT) and Supervised Fine-Tuning (SFT).

GPT, Hugging Face, PyTorch, Transformer, Transformers

Eduardo Alvarez

7 min read

Includes Code

Has Summary

NVFP4 Trains with Precision of 16-Bit and Speed and Efficiency of 4-Bit

Intermediate

The article discusses NVIDIA's NVFP4, a new 4-bit precision format for training large language models (LLMs) that enhances efficiency and scalability while maintaining accuracy.

Google Cloud, Mistral, Transformer

Kirthi Devleker

9 min read

Has Summary

Introducing NVIDIA Jetson Thor, the Ultimate Platform for Physical AI

Advanced

The article introduces the NVIDIA Jetson Thor, a powerful platform designed for physical AI and humanoid robotics.

Gemini, Hugging Face, PyTorch, Transformer

Shashank Maheshwari

13 min read

Has Summary

Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era

Advanced

The article discusses the NVIDIA Blackwell Ultra GPU, a significant advancement in the Blackwell architecture designed to enhance AI training and reasoning capabilities.

Transformer, Warp

Kyle Aubrey

13 min read

Has Summary

NVIDIA Hardware Innovations and Open Source Contributions Are Shaping AI

Advanced

The article discusses how NVIDIA's hardware innovations, particularly the Blackwell architecture and NVFP4 precision, along with their open source contributions, are driving advancements in AI.

GPT, Hugging Face, JAX, Kubernetes, Python, PyTorch, Transformer

George Chellapa

8 min read

Has Summary

Uber

Advanced

Forecasting Models to Improve Driver Availability at Airports

This article discusses the development and implementation of forecasting models aimed at improving driver availability at airports, which are critical to Uber's ridesharing ecosystem.

Apache, Apache Spark, Cassandra, Kong, Transformer, Transformers

Bob Zheng, Dhruv Ghulati, Manoj Panikkar, Michael (Yichuan) Cai

15 min read

Has Summary

How Hackers Exploit AI’s Problem-Solving Instincts

Intermediate

The article discusses the evolving landscape of AI security, focusing on how hackers exploit the problem-solving instincts of multimodal AI systems through cognitive challenges.

Gemini, Transformer

Daniel Teixeira

9 min read

Includes Code

Has Summary

NVIDIA Accelerates OpenAI gpt-oss Models Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72

Intermediate

NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).

Docker, Hugging Face, Ollama, Python, Transformer, Transformers

Anu Srivastava

6 min read

Includes Code

Has Summary

OpenAI

Advanced

Introducing gpt-oss

The article introduces gpt-oss, two state-of-the-art open-weight language models, gpt-oss-120b and gpt-oss-20b, which excel in reasoning tasks and are optimized for deployment on consumer hardware.

Apache, AWS, Azure, Embedding, GPT, Hugging Face, Ollama, PyTorch, Rust, Transformer, Vercel, Whisper

OpenAI

15 min read

Has Summary

Enhancing Multilingual Human-Like Speech and Voice Cloning with NVIDIA Riva TTS

Intermediate

The article discusses the advancements in multilingual human-like speech synthesis and voice cloning using NVIDIA Riva TTS.

Docker, Transformer

Maggie Zhang

9 min read

Has Summary

T5Gemma: A new collection of encoder-decoder Gemma models

Intermediate

The article introduces T5Gemma, a new collection of encoder-decoder models derived from pretrained decoder-only models.

Hugging Face, T5, Transformer, Vertex AI

Biao Zhang, Paul Suganthan, Ben Hora

5 min read

Has Summary

Optimizing FLUX.1 Kontext for Image Editing with Low-Precision Quantization

Advanced

The article discusses the optimization of the FLUX.1 Kontext model for image editing through low-precision quantization techniques.

CLIP, T5, Transformer

Sandro Cavallari

9 min read

Includes Code

Has Summary

Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training

Advanced

This article discusses FP8 scaling strategies, including per-tensor and per-block scaling, essential for maintaining numerical stability and accuracy during low-precision training.

Karin Sevegnani

9 min read

Includes Code

Has Summary

Introducing Gemma 3n: The developer guide

Intermediate

The article introduces Gemma 3n, a mobile-first architecture designed for on-device AI, highlighting its multimodal capabilities and architectural innovations.

Docker, Gemini, GPT, Hugging Face, Ollama, Transformer, Transformers, Vertex AI

Omar Sanseviero, Ian Ballantyne

9 min read

Includes Code

Has Summary

R²D²: Building AI-based 3D Robot Perception and Mapping with NVIDIA Research

Intermediate

The article discusses advancements in AI-based 3D robot perception and mapping, focusing on NVIDIA's research efforts to create a unified 3D perception stack.

GRU, Python, PyTorch, Transformer

Raffaello Bonghi

12 min read

Has Summary

Accelerated Molecular Modeling with NVIDIA cuEquivariance and NVIDIA NIM microservices

Advanced

The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...

Apache, PyTorch, Transformer, Transformers

Neha Tadimeti

8 min read

Has Summary

Advancing Agentic AI with NVIDIA Nemotron Open Reasoning Models

Advanced

The article discusses the advancements in AI autonomy through NVIDIA's Nemotron open reasoning models, which enhance AI agents' decision-making capabilities in complex environments.

Hugging Face, Mistral, Reinforcement Learning, Transformer

Nirmal Kumar Juluru

6 min read

Has Summary

Intermediate

Next-Level Personalization: How 16k+ Lifelong User Actions Supercharge Pinterest’s Recommendations

This article discusses how Pinterest enhances its recommendation system through the TransActV2 model, which leverages over 16,000 lifelong user actions to improve personalization.

Machine Learning, Transformer

Pinterest Engineering

8 min read

Has Summary

Introducing the Nemotron-H Reasoning Model Family: Throughput Gains Without Compromise

Advanced

The article introduces the Nemotron-H Reasoning Model Family developed by NVIDIA, which addresses the challenges of reasoning-intensive tasks in large language models by significantly improving thr...

Kong, Transformer

Adi Renduchintala

7 min read

Includes Code

Has Summary

NVIDIA Blackwell Delivers up to 2.6x Higher Performance in MLPerf Training v5.0

Advanced

The article discusses the performance improvements delivered by NVIDIA's Blackwell architecture in MLPerf Training v5.0, showcasing up to 2.6x higher performance.

BERT, Natural Language Processing, Stable Diffusion, Transformer

Sukru Burc Eryilmaz

12 min read

Has Summary

Floating-Point 8: An Introduction to Efficient, Lower-Precision AI Training

Intermediate

The article discusses the advancements in AI training through the introduction of floating-point 8 (FP8) precision, emphasizing its benefits in computational efficiency and memory usage.

Karin Sevegnani

10 min read

Has Summary

Scaling to Millions of Tokens with Efficient Long-Context LLM Training

Advanced

The article discusses the advancements in large language models (LLMs) focusing on the importance of extended context lengths for processing and generating text.

Transformer, Transformers, V

Amit Bleiweiss

7 min read

Has Summary