How NVIDIA Uses V

130 engineering articles about V from NVIDIA's engineering team

Other NVIDIA Technologies

Python (740), PyTorch (566), Deep Learning (505), TensorFlow (444), Docker (292), Kubernetes (251)

Other Companies Using V

Oxide Computer Company (2)

Articles

NVIDIA

Advanced

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

The article discusses NVIDIA TensorRT LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).

Hugging Face, PyTorch, Transformers, V

Lucas Liebenwein

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

This article introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core designed to optimize training for variable-length sequences in large-scale models.

Transformer, V

Kunlun Li

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

AI Factories, Physical AI, and Advances in Models, Agents, and Infrastructure That Shaped 2025

The article discusses the advancements in AI technologies and infrastructure that shaped the year 2025, focusing on NVIDIA's innovations in AI factories, physical AI, and model optimization.

Render, V, Warp

Michelle Horton

3 min read

Has Summary

NVIDIA

Advanced

Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT-LLM

The article discusses the Skip Softmax technique, a method for accelerating long-context inference in large language models (LLMs) using NVIDIA TensorRT-LLM.

Python, V, YAML

Laikh Tewari

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache

The article discusses NVFP4 KV cache quantization, a new key-value format that significantly enhances inference performance on NVIDIA Blackwell GPUs.

Eduardo Alvarez

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How to Get Started with Neural Shading for Your Game or Application

The article discusses neural shading as a transformative approach to real-time rendering, integrating trainable models into graphics pipelines to enhance visual fidelity and performance.

Python, Render, V

Shannon Woods

20 min read

Includes Code

Has Summary

NVIDIA

Intermediate

NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks

The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.

BERT, Deep Learning, Large Language Models, Stable Diffusion, Transformer, V

Ashraf Eassa

10 min read

Has Summary

NVIDIA

Intermediate

Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS

The article discusses the enhancements in cuBLAS with the introduction of floating-point emulation for Tensor Core performance, particularly focusing on double-precision (FP64) matrix multiplicatio...

Cole Brower

10 min read

Has Summary

NVIDIA

Advanced

Build High-Performance Vision AI Pipelines with NVIDIA CUDA-Accelerated VC-6

This article discusses the optimization of vision AI workloads using NVIDIA's CUDA-accelerated implementation of SMPTE VC-6, a codec designed for efficient interaction with modern compute architect...

Python, PyTorch, V

Andreas Kieslinger

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Optimizing LLMs for Performance and Accuracy with Post-Training Quantization

The article discusses the optimization of large language models (LLMs) through post-training quantization (PTQ), emphasizing its benefits in enhancing inference performance while maintaining accura...

Hugging Face, PyTorch, V

Eduardo Alvarez

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels

The article discusses CUTLASS, a library developed by NVIDIA for handling multidimensional data through tensors and spatial microkernels. It highlights the advancements in CUTLASS 3.

Python, V

Cris Cecka

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Streamlining GPU Porting for EDF’s Fluid Dynamics Simulations with NVIDIA Nsight Profilers

The article discusses the process of porting CPU applications to NVIDIA GPUs to enhance performance, particularly in the context of Électricité de France's (EDF) fluid dynamics simulations using th...

AWS, Fortran, Python, V

Florent Duguet

5 min read

Includes Code

Has Summary

NVIDIA

Advanced

Scaling to Millions of Tokens with Efficient Long-Context LLM Training

The article discusses the advancements in large language models (LLMs) focusing on the importance of extended context lengths for processing and generating text.

Transformer, Transformers, V

Amit Bleiweiss

7 min read

Has Summary

NVIDIA

Advanced

Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick

NVIDIA has set a new world record for large language model inference speed, achieving over 1,000 tokens per second per user with the 400-billion-parameter Llama 4 Maverick model on a single NVIDIA ...

Yilin Fan

8 min read

Has Summary

NVIDIA

Advanced

NVIDIA 800 VDC Architecture Will Power the Next Generation of AI Factories

NVIDIA is pioneering the shift to 800 VDC architecture to meet the growing power demands of AI factories, moving beyond traditional 54 V systems.

Mathias Blake

7 min read

Has Summary

NVIDIA

Intermediate

Announcing NVIDIA Secure AI General Availability

NVIDIA has announced the general availability of Secure AI, focusing on protecting data and code during AI training and inference, particularly for large language models (LLMs).

Azure, Rapids, V

Emily Sakata

3 min read

Has Summary

NVIDIA

Advanced

Configurable Graph-Based Task Solving with the Marco Multi-AI Agent Framework for Chip Design

The article discusses the Marco framework, a configurable graph-based task-solving and multi-AI agent system designed to streamline chip design processes.

Large Language Models, Transformer, V

Mark Ren

8 min read

Has Summary

NVIDIA

Intermediate

Using NetworkX, Jaccard Similarity, and cuGraph to Predict Your Next Favorite Movie

This article discusses how to build a movie recommendation system using NetworkX, Jaccard Similarity, and NVIDIA cuGraph to enhance performance.

NetworkX, Python, V

Rick Ratzel

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Time-Lapse AI Model Enhances IVF Embryo Selection

Researchers from Weill Cornell Medicine have developed the Blastocyst Evaluation Learning Algorithm (BELA), an AI-powered model that enhances embryo selection in in vitro fertilization (IVF) by eva...

Michelle Horton

3 min read

Has Summary

NVIDIA

Intermediate

Developing a 172B LLM with Strong Japanese Capabilities Using NVIDIA Megatron-LM

The article discusses the development of a 172 billion parameter large language model (LLM) with strong Japanese capabilities using NVIDIA Megatron-LM.

Generative AI, Google Cloud, GPT, Hugging Face, PaLM, Transformer, V

Kazuki Fujii

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Spotlight: Accelerating HPC in Energy with AWS Energy HPC Orchestrator and NVIDIA Energy Samples

The article discusses the integration of the AWS Energy HPC Orchestrator with NVIDIA Energy Samples to enhance high-performance computing (HPC) in the energy sector.

AWS, Deep Learning, Docker, JSON, Python, V

Jihyun Yang

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Enabling Customizable GPU-Accelerated Video Transcoding Pipelines

The article discusses the integration of NVIDIA's NVENC technology with V-Nova's MPEG-5 Part 2 Low-Complexity Enhancement Video Coding (LCEVC) standard to create customizable GPU-accelerated video ...

Ricardo Monteiro

9 min read

Has Summary

NVIDIA

Intermediate

Achieving State-of-the-Art Zero-Shot Waveform Audio Generation across Audio Types

The article discusses NVIDIA's advancements in audio generative AI with the introduction of BigVGAN v2, a universal neural vocoder that synthesizes audio waveforms with state-of-the-art quality and...

Deep Learning, V

Sang-gil Lee

5 min read

Has Summary

NVIDIA

Intermediate

NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules

NVIDIA is transitioning fully to open-source GPU kernel modules with the upcoming R560 driver release, enhancing support for various GPU architectures while providing substantial new capabilities.

Ansible, V

Rob Armstrong

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Advanced AI and Retrieval-Augmented Generation for Code Development in High-Performance Computing

The article discusses the integration of advanced AI and Retrieval-Augmented Generation (RAG) techniques in high-performance computing (HPC) code development.

Copilot, Embedding, V

Harry Petty

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Revolutionizing Graph Analytics: Next-Gen Architecture with NVIDIA cuGraph Acceleration

This article discusses the advancements in graph analytics through a next-generation architecture utilizing NVIDIA cuGraph acceleration.

AWS, Python, Thrift, V

Manoj Kumar

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Announcing Confidential Computing General Access on NVIDIA H100 Tensor Core GPUs

NVIDIA has announced the general availability of its Confidential Computing solution on NVIDIA H100 Tensor Core GPUs, which provides enhanced security for data in use, particularly for AI applicati...

Azure, Rapids, V

Rob Nertney

3 min read

Has Summary

NVIDIA

Intermediate

Applying Mixture of Experts in LLM Architectures

The article discusses the application of Mixture of Experts (MoE) in large language model (LLM) architectures, highlighting its benefits in terms of model capacity, cost efficiency, and latency red...

GPT, GPT-4, Mistral, Transformer, Transformers, V

Kyle Kranen

11 min read

Has Summary

NVIDIA

Intermediate

Powerful Shader Insights: Using Shader Debug Info with NVIDIA Nsight Graphics

The article discusses the importance of using shader debugging information with NVIDIA Nsight Graphics for optimizing shader performance in ray tracing applications.

Louis Bavoil

6 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Calculating Video Quality Using NVIDIA GPUs and VMAF-CUDA

This article discusses the implementation of Video Multi-Method Assessment Fusion (VMAF) using NVIDIA GPUs and CUDA, highlighting the performance improvements and advantages of VMAF-CUDA over tradi...

Docker, V

Cem Moluluo

13 min read

Includes Code

Has Summary

NVIDIA

Advanced

Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network

This article discusses the emulation of the attention mechanism in transformer models using a fully convolutional network, specifically targeting improvements in computer vision tasks.

Attention Mechanism, ResNet, Self-Attention, Transformer, Transformers, V

John Yang

12 min read

Has Summary

NVIDIA

Advanced

Mastering LLM Techniques: Inference Optimization

This article discusses inference optimization techniques for large language models (LLMs), highlighting the challenges and solutions associated with memory and compute efficiency.

Autoregressive Models, BERT, GPT, Self-Attention, Transformer, V

Shashank Verma

24 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Mastering LLM Techniques: Training

The article discusses the intricacies of training Large Language Models (LLMs) using transformer networks, focusing on model architectures, attention mechanisms, and embedding techniques.

Attention Mechanism, BERT, Embedding, GPT, Large Language Models, Neural Networks, Recurrent Neural Networks, Self-Attention, Transformer, Transformers, V

Anjali Shah

14 min read

Has Summary

NVIDIA

Advanced

Differentiable Slang: A Shading Language for Renderers That Learn

NVIDIA's Differentiable Slang is a new shading language designed to unify real-time, inverse, and differentiable rendering, enabling seamless integration of machine learning with graphics programmi...

NumPy, Python, PyTorch, Remix, TensorFlow, V

Sai Bangaru

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Designing Deep Networks to Process Other Deep Networks

The article discusses the design of deep neural networks (DNNs) that can process the weights of other DNNs, focusing on architectures that leverage the symmetries of weight spaces.

Deep Learning, Graph Neural Networks, Neural Networks, Transformer, Transformers, V

Haggai Maron

14 min read

Has Summary

NVIDIA

Intermediate

Flexible and Powerful Ray Tracing with NVIDIA OptiX 8

NVIDIA OptiX 8 is a powerful ray tracing framework that leverages GPU acceleration to create photorealistic visuals efficiently.

Zach Lo

4 min read

Has Summary

NVIDIA

Advanced

New Video Creation and Streaming Features Accelerated by the NVIDIA Video Codec SDK

The article discusses the new features introduced in NVIDIA's Video Codec SDK 12.1, focusing on GPU-accelerated video processing through NVENC and NVDEC.

CDN, V

Prathap Muthana

7 min read

Has Summary

NVIDIA

Intermediate

Microsoft and TempoQuest Accelerate Wind Energy Forecasts with AceCast

Microsoft and TempoQuest have collaborated to enhance wind energy forecasting using AceCAST, a GPU-accelerated version of the Weather Research and Forecasting (WRF) model.

Azure, V

Gene Pache

6 min read

Has Summary

NVIDIA

Intermediate

A Comprehensive Guide to Interaction Terms in Linear Regression

This article provides a comprehensive guide to understanding interaction terms in linear regression, emphasizing their importance in modeling the relationship between dependent and independent vari...

Python, scikit-learn, V

Eryk Lewinson

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Increasing Inference Acceleration of KoGPT with NVIDIA FasterTransformer

The article discusses the optimization of Kakao Brain's KoGPT large language model using NVIDIA FasterTransformer, highlighting the significant improvements in inference speed and performance.

BERT, GPT, PyTorch, T5, TensorFlow, Transformer, Transformers, V

Daemyung Jang

5 min read

Has Summary

NVIDIA

Advanced

AutoDMP Optimizes Macro Placement for Chip Design with AI and GPUs

The article discusses how AutoDMP leverages AI and GPU technology to optimize macro placement in chip design, significantly improving performance and efficiency.

PyTorch, Reinforcement Learning, V

Anthony Agnesina

10 min read

Has Summary

NVIDIA

Intermediate

Develop AI-Powered Robots, Smart Vision Systems, and More with NVIDIA Jetson Orin Nano Developer Kit

The NVIDIA Jetson Orin Nano Developer Kit is designed for creating entry-level AI-powered robots, smart drones, and intelligent vision systems, offering up to 40 TOPS of AI performance.

BERT, ChatGPT, Computer Vision, DALL-E, ResNet, Transformer, V

Leela Subramaniam Karumbunathan

8 min read

Has Summary

NVIDIA

Intermediate

Ultra-Realism Made Accessible with NVIDIA AI and Path Tracing Technologies

The article discusses NVIDIA's advancements in AI and path tracing technologies, particularly showcased at GDC 2023.

Ike Nnoli

6 min read

Has Summary

NVIDIA

Intermediate

3D Content Interoperability with Topology-Free Modeling

The article discusses the challenges of traditional topology-based 3D modeling and introduces Shapeyard, a tool that automates topology generation for 3D content creation, enhancing interoperabilit...

AWS, Generative AI, HTML, Render, TensorFlow, V

Philipp Batura

9 min read

Has Summary

NVIDIA

Advanced

GPU-Accelerated Video Processing with NVIDIA In-Depth Support for Vulkan Video

The article discusses NVIDIA's support for Vulkan Video, a new API that enables developers to leverage GPU acceleration for video processing.

Neil Trevett

6 min read

Has Summary

NVIDIA

Advanced

NVIDIA Grace CPU Superchip Architecture In Depth

The NVIDIA Grace CPU Superchip represents a groundbreaking advancement in data center CPU architecture, combining Arm processors with NVIDIA's expertise to deliver high-performance computing capabi...

AWS, V

Jonathon Evans

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Entropy-Based Methods for Word-Level ASR Confidence Estimation

This article discusses entropy-based methods for estimating confidence in word-level predictions from automatic speech recognition (ASR) models.

Aleksandr Laptev

11 min read

Has Summary

NVIDIA

Advanced

Best-in-Class Quantum Circuit Simulation at Scale with NVIDIA cuQuantum Appliance

The article discusses the capabilities of the NVIDIA cuQuantum Appliance for quantum circuit simulation at scale, highlighting its performance benchmarks on the ABCI 2.0 supercomputer.

Tom Lubowe

8 min read

Has Summary

NVIDIA

Advanced

Boosting Dynamic Programming Performance Using NVIDIA Hopper GPU DPX Instructions

The article discusses how NVIDIA Hopper GPU DPX instructions can significantly enhance the performance of dynamic programming algorithms, particularly in genomic sequence alignment and robotic path...

Ajay Tirumala

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

X-ray Research Reveals Hazards in Airport Luggage Using Crystal Physics

The article discusses a study on X-ray diffraction (XRD) technology aimed at enhancing airport luggage scanning to identify hazardous materials.

Crystal, V

Michelle Horton

5 min read

Has Summary