
How NVIDIA Uses JAX

29 articles about JAX from NVIDIA's engineering team

Articles

NVIDIA
Advanced
The article discusses the integration of the NVSHMEM communication library into the Accelerated Linear Algebra (XLA) compiler to optimize long-context model training in JAX.
Sevin Fide Varoglu
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.
NVIDIA
Advanced
The article discusses the NVIDIA ALCHEMI Toolkit-Ops, a specialized toolkit designed to accelerate AI-powered atomistic simulations in chemistry and materials science.
Justin S. Smith
10 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses Autodesk Research's development of the Accelerated Lattice Boltzmann (XLB) library, which enhances computational fluid dynamics (CFD) performance using NVIDIA's Warp and GH200...
Mehdi Ataei
7 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA's hardware innovations, particularly the Blackwell architecture and NVFP4 precision, along with their open source contributions, are driving advancements in AI.
George Chellapa
8 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the introduction of Wheel Variants, a new Python packaging standard aimed at improving the installation and packaging workflows for CUDA-accelerated Python packages.
Jonathan Dekhtiar
15 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses techniques for optimizing low-latency communication in inference workloads using JAX and XLA, particularly focusing on the decode phase of large language models (LLMs).
Jaya Shankar
6 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
NVIDIA cuQuantum is an SDK designed to accelerate quantum computing emulations significantly. The latest update, cuQuantum 25.
Tom Lubowe
4 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
NVIDIA has announced world-record inference performance for the DeepSeek-R1 model using the Blackwell architecture, achieving over 250 tokens per second per user and a maximum throughput of over 30...
NVIDIA
Intermediate
The article discusses the introduction of Gemma 3, a range of lightweight, multimodal, and multilingual models optimized for performance in AI applications.
Anu Srivastava
3 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
New research from the University of Washington demonstrates how deep learning can enhance AI weather models, allowing for more accurate predictions and extending forecast capabilities up to 23 days...
Michelle Horton
3 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses how to build a zero-copy AI sensor processing pipeline using OpenCV within the NVIDIA Holoscan SDK.
Meiran Peng
7 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the enhancements made in NVIDIA's cuDNN 9 library, focusing on the acceleration of Transformers through the implementation of Scaled Dot Product Attention (SDPA).
Matthew Nicely
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses NVIDIA cuQuantum 23.10, an SDK designed to accelerate quantum circuit simulations using NVIDIA Tensor Core GPUs.
Tom Lubowe
3 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the latest features of the NVIDIA NeMo framework and the performance enhancements brought by the NVIDIA H200 GPUs, which significantly improve the training of large language m...
Ashraf Eassa
9 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA Holoscan is being utilized to accelerate ptychography workflows at the Diamond Light Source, a leading synchrotron facility.
Harry Petty
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article provides a comprehensive guide on deploying AI models in Python using the PyTriton interface with NVIDIA Triton Inference Server.
Shankar Chandrasekaran
6 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how to efficiently scale large language model (LLM) training across a large GPU cluster using the open-source frameworks Alpa and Ray.
NVIDIA
Advanced
The article discusses RAPIDS RAFT, a library designed to optimize machine learning and data analytics on GPUs by providing reusable computational patterns.
Corey Nolet
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the use of NVIDIA BioNeMo Service for building generative AI pipelines aimed at drug discovery.
NVIDIA
Intermediate
The article discusses the increasing computational demands for AI processing at the edge and introduces the NVIDIA Holoscan SDK v0.
Julien Jomier
5 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses NVIDIA's BioNeMo service, a framework for training and serving biomolecular large language models (LLMs) designed for predicting protein structures and properties.
Vanessa Braunstein
3 min read
Has Summary
--
NVIDIA
Advanced
NVIDIA has announced significant updates to its AI software suite, including JAX, NVIDIA CV-CUDA, and NVIDIA RAPIDS, aimed at accelerating AI research, computer vision, and data science.
NVIDIA
Intermediate
The article discusses the improved interoperability between NVIDIA Vision Programming Interface (VPI) and PyTorch, focusing on how VPI can enhance object detection and tracking in computer vision a...
Sandeep Hiremath
10 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article introduces NVIDIA Warp, a Python framework designed for writing differentiable graphics and physics simulations on the GPU.
Miles Macklin
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article discusses the implementation of an end-to-end pipeline utilizing zero-copy techniques for efficient data transfer across various machine learning frameworks.
Christian Hundt
7 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the significance of tensor methods in modern machine learning, particularly their application in NVIDIA's AI algorithms.
Jean Kossaifi
4 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses the importance of efficient memory layouts and memory pools in machine learning frameworks to enhance interoperability and performance.
NVIDIA
Advanced
The article discusses cuCIM, a new RAPIDS library designed for accelerated n-dimensional image processing and image I/O on GPUs.
