# PyTorch Programming Tutorials & Engineering Articles

716 PyTorch tutorials, guides, and engineering insights from NVIDIA, Meta, Uber, and more

PyTorch Articles & Tutorials

NVIDIA
Advanced
The article discusses the use of NVFP4 low-precision model training to achieve higher throughput without sacrificing accuracy in AI model training.
Aditya Vavre
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how the NVIDIA cuda.compute library enables Python developers to write high-performance GPU code without needing to resort to C++.
Daniel Rodriguez
5 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA's hardware-software co-design significantly enhanced the inference performance of Sarvam AI's Sovereign 30B model, achieving a 4x speedup on NVIDIA Blackwell archit...
Utkarsh Uppal
14 min read
Has Summary
--
NVIDIA
Advanced
The article discusses NVIDIA TensorRT LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).
Lucas Liebenwein
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
Kimi K2.5 is an advanced multimodal vision language model (VLM) developed by Kimi, optimized for various AI tasks.
Anu Srivastava
4 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the challenges of Expert Parallel communication in training Mixture-of-Experts (MoE) models and introduces Hybrid-EP, an efficient communication solution that leverages NVIDIA...
Fan Yu
10 min read
Has Summary
--
Pinterest
Advanced
This article discusses the re-architecture of the serving stack for next-generation ads lightweight ranking models at Pinterest, moving from a traditional Two-Tower architecture to a more complex G...
Pinterest Engineering
11 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the Universal Sparse Tensor (UST), a framework designed to efficiently handle sparse tensors across various applications, including scientific computing and deep learning.
Aart J.C. Bik
13 min read
Includes Code
Has Summary
--
Google
Advanced
LiteRT has evolved from its TensorFlow Lite foundation into a universal on-device AI inference framework, now offering production-ready GPU acceleration across six platforms and streamlined NPU int...
Lu Wang, Chintan Parikh, Jingjiang Li, Terry Heo
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the transition from the traditional two-phase API of the CUB library to a new single-call API introduced in CUDA 13.1.
Giannis Gonidelis
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article provides a detailed guide on implementing high-performance matrix multiplication using NVIDIA's cuTile framework in CUDA.
Jinman Xie
13 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses NVIDIA's advancements in AI model inference performance through the Blackwell architecture, emphasizing improvements in token throughput per watt and the enhancements made to ...
Ashraf Eassa
5 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how recent upgrades to open source AI tools enhance the performance of small language models (SLMs) and diffusion models on NVIDIA RTX PCs.
Annamalai Chockalingam
7 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the latest software and model optimizations for NVIDIA DGX Spark, highlighting significant performance improvements in AI workflows.
Allen Bourgoyne
5 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.
--
NVIDIA
Advanced
NVIDIA introduces the Jetson T4000, enhancing AI and real-time reasoning for robotics and edge AI applications with up to 1200 FP4 TFLOPs of AI compute and 64 GB of memory.
Shashank Maheshwari
9 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the NVIDIA ALCHEMI Toolkit-Ops, a specialized toolkit designed to accelerate AI-powered atomistic simulations in chemistry and materials science.
Justin S. Smith
10 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article discusses how to rapidly simulate robotic environments using NVIDIA Isaac Sim and World Labs Marble.
Wonsik Han
10 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the integration of AI Physics into Technology Computer-Aided Design (TCAD) simulations, highlighting its significance in semiconductor manufacturing.
--
OpenAI
Intermediate
OpenAI has co-founded the Agentic AI Foundation (AAIF) under the Linux Foundation to promote open-source agentic AI.
OpenAI
5 min read
Has Summary
--
Uber
Advanced
The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...
Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen
11 min read
Has Summary
--
Meta
Advanced
The article introduces Zoomer, Meta's automated debugging and optimization platform designed to enhance AI performance across its extensive infrastructure.
Prashant Gupta
10 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses how CuTe DSL, a new Python API for CUTLASS 4, simplifies GPU kernel development by reducing compilation times and maintaining performance efficiency similar to CUTLASS C++.
Brandon Sun
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the NVIDIA Collective Communications Library (NCCL) and its capabilities for building scalable and fault-tolerant applications.
Luke Robison
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA's CorrDiff model leverages generative AI for downscaling weather predictions, significantly improving efficiency and reducing computational costs.
Alicia Sui
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses how to achieve 4x faster inference for math problem solving using large language models by optimizing the serving stack, quantization strategy, and decoding methods.
Igor Gitman
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the introduction of a new Kubernetes abstraction called ComputeDomains, designed to facilitate secure GPU-to-GPU memory operations across node boundaries in multi-node NVLink ...
Kevin Klues
13 min read
Includes Code
Has Summary
--
Meta
Intermediate
Meta's Generative Ads Recommendation Model (GEM) is a cutting-edge foundation model designed to enhance ad performance and advertiser ROI by improving the relevance of ad recommendations.
Huayu Li
12 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA cuVS enhances GPU-accelerated vector search in the Faiss library, providing significant performance improvements for similarity search and clustering of dense vecto...
Tarang Jain
10 min read
Includes Code
Has Summary
--
ClickHouse
Intermediate
The article discusses the creation of a website for tracking team activity across GitHub repositories, initially intended as a single report but evolved into a comprehensive tool for comparing vari...
Alexey Milovidov
4 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA's NeMo Automodel simplifies the training of large-scale mixture-of-experts (MoE) models in PyTorch, making it accessible to a broader audience.
Hemil Desai
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...
Kyle Tretina
6 min read
Includes Code
Has Summary
--
Pinterest
Advanced
The article reflects on a decade of AI platform development at Pinterest, detailing the evolution from fragmented machine learning stacks to a unified AI platform that supports various models.
--
Meta
Advanced
The article discusses Meta's implementation of invisible watermarking technology for video content, focusing on its applications for content provenance, AI detection, and source identification.
Wes Castro
10 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses how NVIDIA Run:ai enhances AI infrastructure management on Microsoft Azure by optimizing GPU utilization and simplifying workload orchestration.
Julie Adrounie
8 min read
Has Summary
--
Cursor
Intermediate
The article discusses Composer, a new agent model designed for software engineering that achieves coding results four times faster than similar models.
4 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses how the NVIDIA DGX Spark supercomputer enhances performance for intensive AI tasks, providing a local alternative to cloud computing.
Allen Bourgoyne
5 min read
Has Summary
--
Uber
Advanced
This article discusses how Uber has integrated explainability into its machine learning platform, Michelangelo, using Integrated Gradients (IG) to provide interpretable attributions for deep learni...
Hugh Chen, Eric Wang, Gaoyuan Huang, Howard Yu, Jia Li, Sally Lee
14 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the integration of machine learning interatomic potentials (MLIPs) into molecular dynamics (MD) simulations using the ML-IAP-Kokkos interface within the LAMMPS MD package.
Justin S. Smith
14 min read
Includes Code
Has Summary
--
Google
Intermediate
The article introduces Coral NPU, a full-stack, open-source platform designed to enhance Edge AI capabilities on low-power devices.
Billy Rutledge
8 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses the integration of the Newton physics engine with NVIDIA Isaac Lab for training quadruped locomotion policies and simulating cloth manipulation.
Mohammad Mohajerani
13 min read
Includes Code
Has Summary
--
Meta
Intermediate
The article discusses Meta's evolution in infrastructure over 21 years, highlighting the significant changes brought about by AI.
Yee Jiun Song
20 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the challenges of cold start latency in deploying large language models (LLMs) and introduces the NVIDIA Run:ai Model Streamer, an open-source Python SDK designed to optimize ...
Omer Dayan
12 min read
Has Summary
--
NVIDIA
Advanced
The article discusses Autodesk Research's development of the Accelerated Lattice Boltzmann (XLB) library, which enhances computational fluid dynamics (CFD) performance using NVIDIA's Warp and GH200...
Mehdi Ataei
7 min read
Has Summary
--
NVIDIA
Advanced
This article discusses the optimization of vision AI workloads using NVIDIA's CUDA-accelerated implementation of SMPTE VC-6, a codec designed for efficient interaction with modern compute architect...
Andreas Kieslinger
12 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how Quantization Aware Training (QAT) and Quantization Aware Distillation (QAD) can enhance low-precision model accuracy recovery beyond traditional Post-Training Quantization...
Eduardo Alvarez
9 min read
Includes Code
Has Summary
--
Cursor
Intermediate
The article discusses how Cursor enhances its Tab model for predicting developer actions using online reinforcement learning.
Jacob Jackson, Phillip Kravtsov, Shomil Jain
6 min read
Has Summary
--
Pinterest
Advanced
The article discusses Pinterest's transition to Moka, a next-generation data processing platform built on AWS Elastic Kubernetes Service (EKS).
--
NVIDIA
Beginner
NVIDIA is simplifying the deployment of its CUDA software stack by collaborating with various third-party platforms, enabling developers to access CUDA directly through their preferred package mana...
Jonathan Bentz
3 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how to enhance the efficiency of Large Language Models (LLMs) during inference by utilizing CPU-GPU memory sharing through NVIDIA's NVLink C2C technology.
Afroze Syed
6 min read
Includes Code
Has Summary
--