How NVIDIA Uses V

130 engineering articles about V from NVIDIA's engineering team

Articles

NVIDIA
Advanced
The article discusses NVIDIA TensorRT LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).
Lucas Liebenwein
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core designed to optimize training for variable-length sequences in large-scale models.
Kunlun Li
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the advancements in AI technologies and infrastructure that shaped the year 2025, focusing on NVIDIA's innovations in AI factories, physical AI, and model optimization.
Michelle Horton
3 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the Skip Softmax technique, a method for accelerating long-context inference in large language models (LLMs) using NVIDIA TensorRT-LLM.
Laikh Tewari
6 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses NVFP4 KV cache quantization, a new key-value format that significantly enhances inference performance on NVIDIA Blackwell GPUs.
Eduardo Alvarez
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses neural shading as a transformative approach to real-time rendering, integrating trainable models into graphics pipelines to enhance visual fidelity and performance.
Shannon Woods
20 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.
NVIDIA
Intermediate
The article discusses the enhancements in cuBLAS with the introduction of floating-point emulation for Tensor Core performance, particularly focusing on double-precision (FP64) matrix multiplicatio...
Cole Brower
10 min read
Has Summary
--
NVIDIA
Advanced
This article discusses the optimization of vision AI workloads using NVIDIA's CUDA-accelerated implementation of SMPTE VC-6, a codec designed for efficient interaction with modern compute architect...
Andreas Kieslinger
12 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the optimization of large language models (LLMs) through post-training quantization (PTQ), emphasizing its benefits in enhancing inference performance while maintaining accura...
Eduardo Alvarez
12 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses CUTLASS, a library developed by NVIDIA for handling multidimensional data through tensors and spatial microkernels. It highlights the advancements in CUTLASS 3.
Cris Cecka
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the process of porting CPU applications to NVIDIA GPUs to enhance performance, particularly in the context of Électricité de France's (EDF) fluid dynamics simulations using th...
Florent Duguet
5 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the advancements in large language models (LLMs) focusing on the importance of extended context lengths for processing and generating text.
Amit Bleiweiss
7 min read
Has Summary
--
NVIDIA
Advanced
NVIDIA has set a new world record for large language model inference speed, achieving over 1,000 tokens per second per user with the 400-billion-parameter Llama 4 Maverick model on a single NVIDIA ...
Yilin Fan
8 min read
Has Summary
--
NVIDIA
Advanced
NVIDIA is pioneering the shift to 800 VDC architecture to meet the growing power demands of AI factories, moving beyond traditional 54 V systems.
Mathias Blake
7 min read
Has Summary
--
NVIDIA
Intermediate
NVIDIA has announced the general availability of Secure AI, focusing on protecting data and code during AI training and inference, particularly for large language models (LLMs).
Emily Sakata
3 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the Marco framework, a configurable graph-based task-solving and multi-AI agent system designed to streamline chip design processes.
Mark Ren
8 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses how to build a movie recommendation system using NetworkX, Jaccard Similarity, and NVIDIA cuGraph to enhance performance.
Rick Ratzel
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
Researchers from Weill Cornell Medicine have developed the Blastocyst Evaluation Learning Algorithm (BELA), an AI-powered model that enhances embryo selection in in vitro fertilization (IVF) by eva...
Michelle Horton
3 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the development of a 172 billion parameter large language model (LLM) with strong Japanese capabilities using NVIDIA Megatron-LM.
Kazuki Fujii
6 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the integration of the AWS Energy HPC Orchestrator with NVIDIA Energy Samples to enhance high-performance computing (HPC) in the energy sector.
Jihyun Yang
12 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the integration of NVIDIA's NVENC technology with V-Nova's MPEG-5 Part 2 Low-Complexity Enhancement Video Coding (LCEVC) standard to create customizable GPU-accelerated video ...
Ricardo Monteiro
9 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses NVIDIA's advancements in audio generative AI with the introduction of BigVGAN v2, a universal neural vocoder that synthesizes audio waveforms with state-of-the-art quality and...
Sang-gil Lee
5 min read
Has Summary
--
NVIDIA
Intermediate
NVIDIA has fully transitioned to open-source GPU kernel modules with the upcoming R560 driver release, enhancing support for various GPU architectures while providing substantial new capabilities.
Rob Armstrong
6 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the integration of advanced AI and Retrieval-Augmented Generation (RAG) techniques in high-performance computing (HPC) code development.
Harry Petty
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses the advancements in graph analytics through a next-generation architecture utilizing NVIDIA cuGraph acceleration.
Manoj Kumar
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
NVIDIA has announced the general availability of its Confidential Computing solution on NVIDIA H100 Tensor Core GPUs, which provides enhanced security for data in use, particularly for AI applicati...
Rob Nertney
3 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the application of Mixture of Experts (MoE) in large language model (LLM) architectures, highlighting its benefits in terms of model capacity, cost efficiency, and latency red...
Kyle Kranen
11 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the importance of using shader debugging information with NVIDIA Nsight Graphics for optimizing shader performance in ray tracing applications.
Louis Bavoil
6 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article discusses the implementation of Video Multi-Method Assessment Fusion (VMAF) using NVIDIA GPUs and CUDA, highlighting the performance improvements and advantages of VMAF-CUDA over tradi...
Cem Moluluo
13 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses the emulation of the attention mechanism in transformer models using a fully convolutional network, specifically targeting improvements in computer vision tasks.
NVIDIA
Advanced
This article discusses inference optimization techniques for large language models (LLMs), highlighting the challenges and solutions associated with memory and compute efficiency.
Shashank Verma
24 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the intricacies of training Large Language Models (LLMs) using transformer networks, focusing on model architectures, attention mechanisms, and embedding techniques.
NVIDIA
Advanced
NVIDIA's Differentiable Slang is a new shading language designed to unify real-time, inverse, and differentiable rendering, enabling seamless integration of machine learning with graphics programmi...
Sai Bangaru
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the design of deep neural networks (DNNs) that can process the weights of other DNNs, focusing on architectures that leverage the symmetries of weight spaces.
NVIDIA
Intermediate
NVIDIA OptiX 8 is a powerful ray tracing framework that leverages GPU acceleration to create photorealistic visuals efficiently.
Zach Lo
4 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the new features introduced in NVIDIA's Video Codec SDK 12.1, focusing on GPU-accelerated video processing through NVENC and NVDEC.
Prathap Muthana
7 min read
Has Summary
--
NVIDIA
Intermediate
Microsoft and TempoQuest have collaborated to enhance wind energy forecasting using AceCAST, a GPU-accelerated version of the Weather Research and Forecasting (WRF) model.
Gene Pache
6 min read
Has Summary
--
NVIDIA
Intermediate
This article provides a comprehensive guide to understanding interaction terms in linear regression, emphasizing their importance in modeling the relationship between dependent and independent vari...
Eryk Lewinson
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the optimization of Kakao Brain's KoGPT large language model using NVIDIA FasterTransformer, highlighting the significant improvements in inference speed and performance.
Daemyung Jang
5 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how AutoDMP leverages AI and GPU technology to optimize macro placement in chip design, significantly improving performance and efficiency.
Anthony Agnesina
10 min read
Has Summary
--
NVIDIA
Intermediate
The NVIDIA Jetson Orin Nano Developer Kit is designed for creating entry-level AI-powered robots, smart drones, and intelligent vision systems, offering up to 40 TOPS of AI performance.
Leela Subramaniam Karumbunathan
8 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses NVIDIA's advancements in AI and path tracing technologies, particularly showcased at GDC 2023.
Ike Nnoli
6 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the challenges of traditional topology-based 3D modeling and introduces Shapeyard, a tool that automates topology generation for 3D content creation, enhancing interoperabilit...
Philipp Batura
9 min read
Has Summary
--
NVIDIA
Advanced
The article discusses NVIDIA's support for Vulkan Video, a new API that enables developers to leverage GPU acceleration for video processing.
Neil Trevett
6 min read
Has Summary
--
NVIDIA
Advanced
The NVIDIA Grace CPU Superchip represents a groundbreaking advancement in data center CPU architecture, combining Arm processors with NVIDIA's expertise to deliver high-performance computing capabi...
Jonathon Evans
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses entropy-based methods for estimating confidence in word-level predictions from automatic speech recognition (ASR) models.
Aleksandr Laptev
11 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the capabilities of the NVIDIA cuQuantum Appliance for quantum circuit simulation at scale, highlighting its performance benchmarks on the ABCI 2.0 supercomputer.
Tom Lubowe
8 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA Hopper GPU DPX instructions can significantly enhance the performance of dynamic programming algorithms, particularly in genomic sequence alignment and robotic path...
Ajay Tirumala
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses a study on X-ray diffraction (XRD) technology aimed at enhancing airport luggage scanning to identify hazardous materials.
Michelle Horton
5 min read
Has Summary
--