How NVIDIA Uses JAX

29 articles about JAX from NVIDIA's engineering team

Other NVIDIA Technologies

Python (740), PyTorch (566), Deep Learning (505), TensorFlow (444), Docker (292), Kubernetes (251)

Other Companies Using JAX

Google (102)

Articles

NVIDIA

Advanced

Accelerating Long-Context Model Training in JAX and XLA

The article discusses the integration of the NVSHMEM communication library into the Accelerated Linear Algebra (XLA) compiler to optimize long-context model training in JAX.

Docker, JAX, Python

Sevin Fide Varoglu

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.

Assembly, Hugging Face, JAX, Kubernetes, Less, PyTorch, RLHF, Transformer

Kyle Aubrey

59 min read

Has Summary

NVIDIA

Advanced

Accelerating AI-Powered Chemistry and Materials Science Simulations with NVIDIA ALCHEMI Toolkit-Ops

The article discusses the NVIDIA ALCHEMI Toolkit-Ops, a specialized toolkit designed to accelerate AI-powered atomistic simulations in chemistry and materials science.

JAX, Python, PyTorch, Warp

Justin S. Smith

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Autodesk Research Brings Warp Speed to Computational Fluid Dynamics on NVIDIA GH200

The article discusses Autodesk Research's development of the Accelerated Lattice Boltzmann (XLB) library, which enhances computational fluid dynamics (CFD) performance using NVIDIA's Warp and GH200...

Fortran, JAX, Numba, NumPy, Python, PyTorch, Warp

Mehdi Ataei

7 min read

Has Summary

NVIDIA

Advanced

NVIDIA Hardware Innovations and Open Source Contributions Are Shaping AI

The article discusses how NVIDIA's hardware innovations, particularly the Blackwell architecture and NVFP4 precision, along with their open source contributions, are driving advancements in AI.

GPT, Hugging Face, JAX, Kubernetes, Python, PyTorch, Transformer

George Chellapa

8 min read

Has Summary

NVIDIA

Intermediate

Streamline CUDA-Accelerated Python Install and Packaging Workflows with Wheel Variants

The article discusses the introduction of Wheel Variants, a new Python packaging standard aimed at improving the installation and packaging workflows for CUDA-accelerated Python packages.

Docker, JAX, Python, PyTorch, SciPy

Jonathan Dekhtiar

15 min read

Includes Code

Has Summary

NVIDIA

Advanced

Optimizing for Low-Latency Communication in Inference Workloads with JAX and XLA

The article discusses techniques for optimizing low-latency communication in inference workloads using JAX and XLA, particularly focusing on the decode phase of large language models (LLMs).

JAX, Python

Jaya Shankar

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

NVIDIA cuQuantum Adds Dynamics Gradients, DMRG, and Simulation Speedup

NVIDIA cuQuantum is an SDK designed to accelerate quantum computing emulations significantly. The latest update, cuQuantum 25.

JAX

Tom Lubowe

4 min read

Includes Code

Has Summary

NVIDIA

Advanced

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA has announced world-record inference performance for the DeepSeek-R1 model using the Blackwell architecture, achieving over 250 tokens per second per user and a maximum throughput of over 30...

CLIP, Hugging Face, JAX, Ollama, Python, PyTorch, T5, TensorFlow, Transformer

Ashraf Eassa

13 min read

Has Summary

NVIDIA

Intermediate

Lightweight, Multimodal, Multilingual Gemma 3 Models Are Streamlined for Performance

The article discusses the introduction of Gemma 3, a range of lightweight, multimodal, and multilingual models optimized for performance in AI applications.

JAX, LangChain, Python

Anu Srivastava

3 min read

Includes Code

Has Summary

NVIDIA

Intermediate

AI Accurately Forecasts Extreme Weather Up to 23 Days Ahead

New research from the University of Washington demonstrates how deep learning can enhance AI weather models, allowing for more accurate predictions and extending forecast capabilities up to 23 days...

JAX

Michelle Horton

3 min read

Has Summary

NVIDIA

Intermediate

Build a Zero-Copy AI Sensor Processing Pipeline with OpenCV in NVIDIA Holoscan SDK

The article discusses how to build a zero-copy AI sensor processing pipeline using OpenCV within the NVIDIA Holoscan SDK.

Computer Vision, Deep Learning, JAX, Numba, NumPy, OpenCV, Python, PyTorch, TensorFlow

Meiran Peng

7 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Accelerating Transformers with NVIDIA cuDNN 9

The article discusses the enhancements made in NVIDIA's cuDNN 9 library, focusing on the acceleration of Transformers through the implementation of Scaled Dot Product Attention (SDPA).

JAX, Python, PyTorch, TensorFlow, Transformer, Transformers

Matthew Nicely

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerate Quantum Circuit Simulation with NVIDIA cuQuantum 23.10

The article discusses NVIDIA cuQuantum 23.10, an SDK designed to accelerate quantum circuit simulations using NVIDIA Tensor Core GPUs.

JAX, PyTorch

Tom Lubowe

3 min read

Has Summary

NVIDIA

Advanced

New NVIDIA NeMo Framework Features and NVIDIA H200 Supercharge LLM Training Performance and Versatility

The article discusses the latest features of the NVIDIA NeMo framework and the performance enhancements brought by the NVIDIA H200 GPUs, which significantly improve the training of large language m...

GPT, JAX, PyTorch, RLHF

Ashraf Eassa

9 min read

Has Summary

NVIDIA

Advanced

Accelerating Ptychography Workflows with NVIDIA Holoscan at Diamond Light Source

The article discusses how NVIDIA Holoscan is being utilized to accelerate ptychography workflows at the Diamond Light Source, a leading synchrotron facility.

JAX, NumPy, Python

Harry Petty

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How to Deploy an AI Model in Python with PyTriton

This article provides a comprehensive guide on deploying AI models in Python using the PyTriton interface with NVIDIA Triton Inference Server.

BERT, FastAPI, Flask, GPT, Hugging Face, JAX, Kubernetes, Python, PyTorch, Stable Diffusion

Shankar Chandrasekaran

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Efficiently Scale LLM Training Across a Large GPU Cluster with Alpa and Ray

The article discusses how to efficiently scale large language model (LLM) training across a large GPU cluster using the open-source frameworks Alpa and Ray.

AWS, BERT, ChatGPT, DALL-E, Generative AI, GPT, JAX, Python, RoBERTa, Stable Diffusion, T5, TensorFlow

Jiao Dong

14 min read

Includes Code

Has Summary

NVIDIA

Advanced

Reusable Computational Patterns for Machine Learning and Information Retrieval with RAPIDS RAFT

The article discusses RAPIDS RAFT, a library designed to optimize machine learning and data analytics on GPUs by providing reusable computational patterns.

JAX, Machine Learning, Numba, NumPy, Python, PyTorch, TensorFlow

Corey Nolet

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Build Generative AI Pipelines for Drug Discovery with NVIDIA BioNeMo Service

The article discusses the use of NVIDIA BioNeMo Service for building generative AI pipelines aimed at drug discovery.

Artificial Intelligence, BERT, Deep Learning, Generative AI, JAX, PyTorch, YAML

Vanessa Braunstein

8 min read

Has Summary

NVIDIA

Intermediate

Rapidly Build AI-Streaming Apps with Python and C++

The article discusses the increasing computational demands for AI processing at the edge and introduces the NVIDIA Holoscan SDK v0.

Apache, JAX, Numba, NumPy, Python, PyTorch, TensorFlow

Julien Jomier

5 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Predict Protein Structures and Properties with Biomolecular Large Language Models

The article discusses NVIDIA's BioNeMo service, a framework for training and serving biomolecular large language models (LLMs) designed for predicting protein structures and properties.

BERT, JAX, Large Language Models, PyTorch

Vanessa Braunstein

3 min read

Has Summary

NVIDIA

Advanced

New SDKs Accelerating AI Research, Computer Vision, Data Science, and More

NVIDIA has announced significant updates to its AI software suite, including JAX, NVIDIA CV-CUDA, and NVIDIA RAPIDS, aimed at accelerating AI research, computer vision, and data science.

Apache, Apache Spark, Computer Vision, Dask, Deep Learning, DGL, Google Cloud, GPT, JAX, Kubernetes, Neural Networks, NumPy, PyTorch, PyTorch Geometric, SQL

Siddharth Sharma

7 min read

Has Summary

NVIDIA

Intermediate

Improved Interoperability between VPI and PyTorch

The article discusses the improved interoperability between NVIDIA Vision Programming Interface (VPI) and PyTorch, focusing on how VPI can enhance object detection and tracking in computer vision a...

JAX, Numba, NumPy, OpenCV, Python, PyTorch, torchvision

Sandeep Hiremath

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Creating Differentiable Graphics and Physics Simulation in Python with NVIDIA Warp

The article introduces NVIDIA Warp, a Python framework designed for writing differentiable graphics and physics simulations on the GPU.

JAX, NumPy, Python, PyTorch, Warp

Miles Macklin

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Machine Learning Frameworks Interoperability, Part 3: Zero-Copy in Action using an E2E Pipeline

This article discusses the implementation of an end-to-end pipeline utilizing zero-copy techniques for efficient data transfer across various machine learning frameworks.

JAX, Machine Learning, Numba, Python, PyTorch

Christian Hundt

7 min read

Has Summary

NVIDIA

Advanced

NVIDIA Research: Tensors Are the Future of Deep Learning

The article discusses the significance of tensor methods in modern machine learning, particularly their application in NVIDIA's AI algorithms.

Deep Learning, JAX, NumPy, PyTorch, TensorFlow

Jean Kossaifi

4 min read

Has Summary

NVIDIA

Intermediate

Machine Learning Frameworks Interoperability, Part 1: Memory Layouts and Memory Pools

This article discusses the importance of efficient memory layouts and memory pools in machine learning frameworks to enhance interoperability and performance.

Apache, Apache Arrow, Cassandra, JAX, Machine Learning, Numba, NumPy, Pandas, Python, PyTorch, TensorFlow

Christian Hundt

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerating Scikit-Image API with cuCIM: n-Dimensional Image Processing and I/O on GPUs

The article discusses cuCIM, a new RAPIDS library designed for accelerated n-dimensional image processing and image I/O on GPUs.

Albumentations, Apache, Dask, Deep Learning, ITK, Java, JAX, Numba, NumPy, OpenCV, Python, PyTorch, scikit-image, SciPy, SimpleITK

Gigon Bae

6 min read

Includes Code

Has Summary
