V Programming Tutorials & Engineering Articles

167 V tutorials, guides, and engineering insights from NVIDIA, LinkedIn, ClickHouse, and more

Companies Using This

NVIDIA (130)

Oxide Computer Company (2)

V Articles & Tutorials

NVIDIA

Advanced

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

The article discusses NVIDIA TensorRT LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).

Hugging Face, PyTorch, Transformers, V

Lucas Liebenwein

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

This article introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core designed to optimize training for variable-length sequences in large-scale models.

Transformer, V

Kunlun Li

11 min read

Includes Code

Has Summary

Advanced

PinLanding: Turn Billions of Products into Instant Shopping Collections with Multimodal AI

PinLanding is a multimodal AI pipeline developed by Pinterest to generate shopping collections from billions of products.

Apache, Apache Spark, CLIP, Fine-tuning, GPT, Machine Learning, Modal, V

Pinterest Engineering

8 min read

Has Summary

NVIDIA

Advanced

AI Factories, Physical AI, and Advances in Models, Agents, and Infrastructure That Shaped 2025

The article discusses the advancements in AI technologies and infrastructure that shaped the year 2025, focusing on NVIDIA's innovations in AI factories, physical AI, and model optimization.

Render, V, Warp

Michelle Horton

3 min read

Has Summary

NVIDIA

Advanced

Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT-LLM

The article discusses the Skip Softmax technique, a method for accelerating long-context inference in large language models (LLMs) using NVIDIA TensorRT-LLM.

Python, V, YAML

Laikh Tewari

6 min read

Includes Code

Has Summary

ClickHouse

Intermediate

Alexey's favorite features of 2025

The article highlights Alexey's favorite features introduced in ClickHouse throughout 2025, including lightweight updates, data lake support, and advancements in text and vector indexing.

Chi, PostgreSQL, SQL, V

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache

The article discusses NVFP4 KV cache quantization, a new key-value format that significantly enhances inference performance on NVIDIA Blackwell GPUs.

Eduardo Alvarez

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How to Get Started with Neural Shading for Your Game or Application

The article discusses neural shading as a transformative approach to real-time rendering, integrating trainable models into graphics pipelines to enhance visual fidelity and performance.

Python, Render, V

Shannon Woods

20 min read

Includes Code

Has Summary

NVIDIA

Intermediate

NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks

The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.

BERT, Deep Learning, Large Language Models, Stable Diffusion, Transformer, V

Ashraf Eassa

10 min read

Has Summary

ClickHouse

Intermediate

Improve logs compression with log clustering

This article discusses how to enhance log compression through log clustering techniques in ClickHouse, focusing on transforming unstructured logs into structured data for efficient storage.

Python, SQL, V, XML

Lionel Palacin

17 min read

Includes Code

Has Summary

ClickHouse

Intermediate

We built a vector search engine that lets you choose precision at query time

The article discusses the introduction of QBit, a new column type in ClickHouse that allows for flexible precision in vector search queries.

Gemini, Rocket, V

Raufs Dunamalijevs

24 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS

The article discusses the enhancements in cuBLAS with the introduction of floating-point emulation for Tensor Core performance, particularly focusing on double-precision (FP64) matrix multiplicatio...

Cole Brower

10 min read

Has Summary

NVIDIA

Advanced

Build High-Performance Vision AI Pipelines with NVIDIA CUDA-Accelerated VC-6

This article discusses the optimization of vision AI workloads using NVIDIA's CUDA-accelerated implementation of SMPTE VC-6, a codec designed for efficient interaction with modern compute architect...

Python, PyTorch, V

Andreas Kieslinger

12 min read

Includes Code

Has Summary

ClickHouse

Intermediate

ClickHouse Release 25.8

ClickHouse version 25.8 introduces 45 new features, 47 performance optimizations, and 119 bug fixes, enhancing its capabilities as a high-performance analytical database.

Apache, Apache Arrow, AWS, AWS S3, Azure, Azure Blob Storage, Chi, gRPC, JSON, Prometheus, SQL, V

ClickHouse Team

15 min read

Includes Code

Has Summary

NVIDIA

Advanced

Optimizing LLMs for Performance and Accuracy with Post-Training Quantization

The article discusses the optimization of large language models (LLMs) through post-training quantization (PTQ), emphasizing its benefits in enhancing inference performance while maintaining accura...

Hugging Face, PyTorch, V

Eduardo Alvarez

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels

The article discusses CUTLASS, a library developed by NVIDIA for handling multidimensional data through tensors and spatial microkernels. It highlights the advancements in CUTLASS 3.

Python, V

Cris Cecka

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Streamlining GPU Porting for EDF’s Fluid Dynamics Simulations with NVIDIA Nsight Profilers

The article discusses the process of porting CPU applications to NVIDIA GPUs to enhance performance, particularly in the context of Électricité de France's (EDF) fluid dynamics simulations using th...

AWS, Fortran, Python, V

Florent Duguet

5 min read

Includes Code

Has Summary

NVIDIA

Advanced

Scaling to Millions of Tokens with Efficient Long-Context LLM Training

The article discusses the advancements in large language models (LLMs) focusing on the importance of extended context lengths for processing and generating text.

Transformer, Transformers, V

Amit Bleiweiss

7 min read

Has Summary

Intermediate

Modernizing Home Feed Pre-Ranking Stage

The article discusses the modernization of Pinterest's home feed pre-ranking stage, focusing on the introduction of a sophisticated pre-ranking layer known as Lightweight Scoring.

Chi, Machine Learning, V

Pinterest Engineering

8 min read

Has Summary

NVIDIA

Advanced

Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick

NVIDIA has set a new world record for large language model inference speed, achieving over 1,000 tokens per second per user with the 400-billion-parameter Llama 4 Maverick model on a single NVIDIA ...

Yilin Fan

8 min read

Has Summary

NVIDIA

Advanced

NVIDIA 800 VDC Architecture Will Power the Next Generation of AI Factories

NVIDIA is pioneering the shift to 800 VDC architecture to meet the growing power demands of AI factories, moving beyond traditional 54 V systems.

Mathias Blake

7 min read

Has Summary

NVIDIA

Intermediate

Announcing NVIDIA Secure AI General Availability

NVIDIA has announced the general availability of Secure AI, focusing on protecting data and code during AI training and inference, particularly for large language models (LLMs).

Azure, Rapids, V

Emily Sakata

3 min read

Has Summary

NVIDIA

Advanced

Configurable Graph-Based Task Solving with the Marco Multi-AI Agent Framework for Chip Design

The article discusses the Marco framework, a configurable graph-based task-solving and multi-AI agent system designed to streamline chip design processes.

Large Language Models, Transformer, V

Mark Ren

8 min read

Has Summary

NVIDIA

Intermediate

Using NetworkX, Jaccard Similarity, and cuGraph to Predict Your Next Favorite Movie

This article discusses how to build a movie recommendation system using NetworkX, Jaccard Similarity, and NVIDIA cuGraph to enhance performance.

NetworkX, Python, V

Rick Ratzel

9 min read

Includes Code

Has Summary

Intermediate

Automated GenAI-driven search quality evaluation

The article discusses the implementation of an automated GenAI-driven search quality evaluation system for LinkedIn's typeahead suggestions.

Azure, GPT, V

Xueying Lu

12 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Time-Lapse AI Model Enhances IVF Embryo Selection

Researchers from Weill Cornell Medicine have developed the Blastocyst Evaluation Learning Algorithm (BELA), an AI-powered model that enhances embryo selection in in vitro fertilization (IVF) by eva...

Michelle Horton

3 min read

Has Summary

NVIDIA

Intermediate

Developing a 172B LLM with Strong Japanese Capabilities Using NVIDIA Megatron-LM

The article discusses the development of a 172 billion parameter large language model (LLM) with strong Japanese capabilities using NVIDIA Megatron-LM.

Generative AI, Google Cloud, GPT, Hugging Face, PaLM, Transformer, V

Kazuki Fujii

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Spotlight: Accelerating HPC in Energy with AWS Energy HPC Orchestrator and NVIDIA Energy Samples

The article discusses the integration of the AWS Energy HPC Orchestrator with NVIDIA Energy Samples to enhance high-performance computing (HPC) in the energy sector.

AWS, Deep Learning, Docker, JSON, Python, V

Jihyun Yang

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Enabling Customizable GPU-Accelerated Video Transcoding Pipelines

The article discusses the integration of NVIDIA's NVENC technology with V-Nova's MPEG-5 Part 2 Low-Complexity Enhancement Video Coding (LCEVC) standard to create customizable GPU-accelerated video ...

Ricardo Monteiro

9 min read

Has Summary

NVIDIA

Intermediate

Achieving State-of-the-Art Zero-Shot Waveform Audio Generation across Audio Types

The article discusses NVIDIA's advancements in audio generative AI with the introduction of BigVGAN v2, a universal neural vocoder that synthesizes audio waveforms with state-of-the-art quality and...

Deep Learning, V

Sang-gil Lee

5 min read

Has Summary

NVIDIA

Intermediate

NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules

NVIDIA has fully transitioned to open-source GPU kernel modules with the upcoming R560 driver release, enhancing support for various GPU architectures while providing substantial new capabilities.

Ansible, V

Rob Armstrong

6 min read

Includes Code

Has Summary

Intermediate

How data is powering skills-based hiring on LinkedIn

The article discusses how LinkedIn is utilizing data to enhance skills-based hiring through a feature called Skills Match.

Zhujun (Allison) Chen

12 min read

Has Summary

NVIDIA

Advanced

Advanced AI and Retrieval-Augmented Generation for Code Development in High-Performance Computing

The article discusses the integration of advanced AI and Retrieval-Augmented Generation (RAG) techniques in high-performance computing (HPC) code development.

Copilot, Embedding, V

Harry Petty

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Revolutionizing Graph Analytics: Next-Gen Architecture with NVIDIA cuGraph Acceleration

This article discusses the advancements in graph analytics through a next-generation architecture utilizing NVIDIA cuGraph acceleration.

AWS, Python, Thrift, V

Manoj Kumar

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Announcing Confidential Computing General Access on NVIDIA H100 Tensor Core GPUs

NVIDIA has announced the general availability of its Confidential Computing solution on NVIDIA H100 Tensor Core GPUs, which provides enhanced security for data in use, particularly for AI applicati...

Azure, Rapids, V

Rob Nertney

3 min read

Has Summary

ClickHouse

Intermediate

Training Machine Learning Models with ClickHouse

This article explores how ClickHouse can be utilized as a feature store to train machine learning models, specifically focusing on the integration with Featureform.

AWS, Docker, Kubernetes, Machine Learning, Python, Redis, scikit-learn, Seaborn, SQL, V

Dale McDiarmid

28 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Applying Mixture of Experts in LLM Architectures

The article discusses the application of Mixture of Experts (MoE) in large language model (LLM) architectures, highlighting its benefits in terms of model capacity, cost efficiency, and latency red...

GPT, GPT-4, Mistral, Transformer, Transformers, V

Kyle Kranen

11 min read

Has Summary

NVIDIA

Intermediate

Powerful Shader Insights: Using Shader Debug Info with NVIDIA Nsight Graphics

The article discusses the importance of using shader debugging information with NVIDIA Nsight Graphics for optimizing shader performance in ray tracing applications.

Louis Bavoil

6 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Calculating Video Quality Using NVIDIA GPUs and VMAF-CUDA

This article discusses the implementation of Video Multi-Method Assessment Fusion (VMAF) using NVIDIA GPUs and CUDA, highlighting the performance improvements and advantages of VMAF-CUDA over tradi...

Docker, V

Cem Moluluo

13 min read

Includes Code

Has Summary

NVIDIA

Advanced

Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network

This article discusses the emulation of the attention mechanism in transformer models using a fully convolutional network, specifically targeting improvements in computer vision tasks.

Attention Mechanism, ResNet, Self-Attention, Transformer, Transformers, V

John Yang

12 min read

Has Summary

Advanced

Evolution of Ads Conversion Optimization Models at Pinterest

The article discusses the evolution of ads conversion optimization models at Pinterest, highlighting the transition from Gradient Boosted Decision Trees (GBDT) to advanced Deep Neural Networks (DNN...

AutoML, Machine Learning, Neural Networks, PyTorch, Transformer, V

Pinterest Engineering

12 min read

Has Summary

ClickHouse

Beginner

ClickHouse Release 23.11

ClickHouse Release 23.11 introduces a wealth of new features, performance optimizations, and bug fixes, enhancing its capabilities for data processing and analytics.

Drizzle, Grafana, PostgreSQL, SQL, V

The ClickHouse Team

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Mastering LLM Techniques: Inference Optimization

This article discusses inference optimization techniques for large language models (LLMs), highlighting the challenges and solutions associated with memory and compute efficiency.

Autoregressive Models, BERT, GPT, Self-Attention, Transformer, V

Shashank Verma

24 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Mastering LLM Techniques: Training

The article discusses the intricacies of training Large Language Models (LLMs) using transformer networks, focusing on model architectures, attention mechanisms, and embedding techniques.

Attention Mechanism, BERT, Embedding, GPT, Large Language Models, Neural Networks, Recurrent Neural Networks, Self-Attention, Transformer, Transformers, V

Anjali Shah

14 min read

Has Summary

NVIDIA

Advanced

Differentiable Slang: A Shading Language for Renderers That Learn

NVIDIA's Differentiable Slang is a new shading language designed to unify real-time, inverse, and differentiable rendering, enabling seamless integration of machine learning with graphics programmi...

NumPy, Python, PyTorch, Remix, TensorFlow, V

Sai Bangaru

11 min read

Includes Code

Has Summary

ClickHouse

Intermediate

Analyzing Hugging Face datasets with ClickHouse

This article explores how to analyze Hugging Face datasets using ClickHouse, specifically through the clickhouse-local tool.

CDN, Hugging Face, JSON, Large Language Models, Machine Learning, REST API, SQL, V

Dale McDiarmid

32 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Designing Deep Networks to Process Other Deep Networks

The article discusses the design of deep neural networks (DNNs) that can process the weights of other DNNs, focusing on architectures that leverage the symmetries of weight spaces.

Deep Learning, Graph Neural Networks, Neural Networks, Transformer, Transformers, V

Haggai Maron

14 min read

Has Summary

NVIDIA

Intermediate

Flexible and Powerful Ray Tracing with NVIDIA OptiX 8

NVIDIA OptiX 8 is a powerful ray tracing framework that leverages GPU acceleration to create photorealistic visuals efficiently.

Zach Lo

4 min read

Has Summary

NVIDIA

Advanced

New Video Creation and Streaming Features Accelerated by the NVIDIA Video Codec SDK

The article discusses the new features introduced in NVIDIA's Video Codec SDK 12.1, focusing on GPU-accelerated video processing through NVENC and NVDEC.

CDN, V

Prathap Muthana

7 min read

Has Summary

NVIDIA

Intermediate

Microsoft and TempoQuest Accelerate Wind Energy Forecasts with AceCast

Microsoft and TempoQuest have collaborated to enhance wind energy forecasting using AceCAST, a GPU-accelerated version of the Weather Research and Forecasting (WRF) model.

Azure, V

Gene Pache

6 min read

Has Summary