How NVIDIA Uses Hugging Face

188 engineering articles about Hugging Face from NVIDIA's engineering team

Other NVIDIA Technologies

Python (740), PyTorch (566), Deep Learning (505), TensorFlow (444), Docker (292), Kubernetes (251)

Articles

NVIDIA

Advanced

Using NVFP4 Low-Precision Model Training for Higher Throughput Without Losing Accuracy

The article discusses using NVFP4 low-precision training to achieve higher throughput in AI model training without sacrificing accuracy.

Hugging Face, PyTorch

Aditya Vavre

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI’s

The article discusses how NVIDIA's hardware-software co-design significantly enhanced the inference performance of Sarvam AI's Sovereign 30B model, achieving a 4x speedup on NVIDIA Blackwell archit...

Hugging Face, PyTorch, Transformer

Utkarsh Uppal

14 min read

Has Summary

NVIDIA

Advanced

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

The article discusses NVIDIA TensorRT LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).

Hugging Face, PyTorch, Transformers, V

Lucas Liebenwein

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints

Kimi K2.5 is an advanced multimodal vision language model (VLM) developed by Moonshot AI, optimized for various AI tasks.

Embedding, Fine-tuning, Hugging Face, PyTorch

Anu Srivastava

4 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Build a Document Processing Pipeline for RAG with Nemotron

The article provides a comprehensive guide on building a document processing pipeline using NVIDIA Nemotron RAG, focusing on the extraction of structured data from complex documents like PDFs.

Docker, Embedding, Hugging Face, JSON, Python, Redis, torchvision

Chia-Chih Chen

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Unlock Local Detail in Coarse Climate Projections with NVIDIA Earth-2

The article discusses how to utilize NVIDIA Earth-2 to downscale coarse climate projections into high-resolution, bias-corrected fields, enabling better assessment of local climate extremes.

Deep Learning, Hugging Face, Python, YAML

Georg Ertl

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning

This article explores how to train an AI agent to operate a new Command Line Interface (CLI) using synthetic data generation and reinforcement learning.

Hugging Face, JSON, Python, Reinforcement Learning, RLHF, Shell

Chris Alexiuk

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Building Generalist Humanoid Capabilities with NVIDIA Isaac GR00T N1.6 Using a Sim-to-Real Workflow

The article discusses the development of generalist humanoid capabilities using NVIDIA Isaac GR00T N1.6 through a sim-to-real workflow.

Hugging Face

Edith Llontop

7 min read

Has Summary

NVIDIA

Intermediate

Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM

The article discusses the introduction of NVIDIA TensorRT Edge-LLM, an open-source C++ framework designed for high-performance inference of Large Language Models (LLMs) and Vision Language Models (...

Chi, Hugging Face, Python, Transformers

Lin Chai

5 min read

Includes Code

Has Summary

NVIDIA

Intermediate

New Software and Model Optimizations Supercharge NVIDIA DGX Spark

The article discusses the latest software and model optimizations for NVIDIA DGX Spark, highlighting significant performance improvements in AI workflows.

GPT, Hugging Face, PyTorch

Allen Bourgoyne

5 min read

Has Summary

NVIDIA

Advanced

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

The article discusses the NVIDIA Rubin platform, which introduces six new chips designed to create a powerful AI supercomputer.

Assembly, Hugging Face, JAX, Kubernetes, Less, PyTorch, RLHF, Transformer

Kyle Aubrey

59 min read

Has Summary

NVIDIA

Advanced

Simplify Generalist Robot Policy Evaluation in Simulation with NVIDIA Isaac Lab-Arena

The article introduces NVIDIA Isaac Lab-Arena, an open-source framework designed for efficient and scalable evaluation of generalist robot policies in simulation.

Docker, Hugging Face

Sangeeta Subramanian

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Build a Voice Agent with RAG and Safety Guardrails

This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.

Embedding, Hugging Face, Python, Transformer, Transformers

Chris Alexiuk

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Building Autonomous Vehicles That Reason with NVIDIA Alpamayo

The article discusses NVIDIA's Alpamayo, a comprehensive ecosystem designed for developing reasoning-based autonomous vehicle (AV) systems.

gRPC, Hugging Face, Python

Marco Pavone

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Using AI Physics for Technology Computer-Aided Design Simulations

The article discusses the integration of AI Physics into Technology Computer-Aided Design (TCAD) simulations, highlighting its significance in semiconductor manufacturing.

Graph Neural Networks, Hugging Face, Neural Networks, Python, PyTorch

Ram Cherukuri

7 min read

Has Summary

NVIDIA

Intermediate

Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate

The article discusses NVIDIA Nemotron 3, a family of open models designed for agentic AI systems, emphasizing its efficiency and accuracy through innovative architectures and techniques.

Hugging Face, Large Language Models, Reinforcement Learning, Transformer

Chris Alexiuk

9 min read

Has Summary

NVIDIA

Intermediate

How to Build Privacy-Preserving Evaluation Benchmarks with Synthetic Data

The article discusses the creation of privacy-preserving evaluation benchmarks using synthetic data, particularly in regulated domains like healthcare.

Hugging Face, Microservices

Isabel Hulseman

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Getting Started with Edge AI on NVIDIA Jetson: LLMs, VLMs, and Foundation Models for Robotics

The article discusses the implementation of Edge AI on the NVIDIA Jetson platform, focusing on the use of Large Language Models (LLMs), Vision Language Models (VLMs), and Foundation Models in robot...

Hugging Face, Ollama, WebRTC

Chitoku Yato

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How to Enhance 3D Gaussian Reconstruction Quality for Simulation

The article discusses enhancing the quality of 3D Gaussian reconstruction for simulation, focusing on the use of NVIDIA's Fixer model to eliminate rendering artifacts.

Diffusion Models, Docker, Hugging Face

Wonsik Han

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

NVIDIA-Accelerated Mistral 3 Open Models Deliver Efficiency, Accuracy at Any Scale

The NVIDIA-accelerated Mistral 3 open model family offers developers and enterprises industry-leading accuracy, efficiency, and customization capabilities.

Docker, Hugging Face, Mistral, Ollama

Anu Srivastava

6 min read

Has Summary

NVIDIA

Intermediate

Breaking Through Reinforcement Learning Training Limits with Scaling Rollouts in BroRL

The article introduces Broadened Reinforcement Learning (BroRL), a new paradigm that enhances the training of large language models (LLMs) by focusing on rollout scaling rather than just increasing...

Hugging Face, Reinforcement Learning

Jian Hu

6 min read

Has Summary

NVIDIA

Advanced

How to Achieve 4x Faster Inference for Math Problem Solving

This article discusses how to achieve 4x faster inference for math problem solving using large language models by optimizing the serving stack, quantization strategy, and decoding methods.

Hugging Face, Python, PyTorch

Igor Gitman

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

Streamline Complex AI Inference on Kubernetes with NVIDIA Grove

The article discusses NVIDIA Grove, a Kubernetes API designed to streamline complex AI inference workloads by managing multicomponent systems.

Helm, Hugging Face, Kubernetes, YAML

Sanjay Chatterjee

9 min read

Includes Code

Has Summary

NVIDIA

Beginner

Benchmarking LLMs on AI-Generated CUDA Code with ComputeEval 2025.2

The article discusses the benchmarking of AI coding assistants in writing efficient CUDA code using the ComputeEval framework.

Claude, GPT, Hugging Face

Daniel Rodriguez

2 min read

Has Summary

NVIDIA

Advanced

Democratizing Large-Scale Mixture-of-Experts Training with NVIDIA PyTorch Parallelism

The article discusses how NVIDIA's NeMo Automodel simplifies the training of large-scale mixture-of-experts (MoE) models in PyTorch, making it accessible to a broader audience.

GPT, Hugging Face, PyTorch, Transformer

Hemil Desai

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes

The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...

Hugging Face, PyTorch, Transformer, Transformers

Kyle Tretina

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Advancing Explainable AI in Radiology Research with NVIDIA Clara Reason

The article discusses the advancements in Explainable AI for radiology through NVIDIA Clara Reason, focusing on the NV-Reason-CXR-3B model that enhances diagnostic transparency and mimics radiologi...

GPT, Hugging Face, PIL

Andriy Myronenko

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Streamline AI Infrastructure with NVIDIA Run:ai on Microsoft Azure

The article discusses how NVIDIA Run:ai enhances AI infrastructure management on Microsoft Azure by optimizing GPU utilization and simplifying workload orchestration.

Azure, Azure Blob Storage, Hugging Face, Kubernetes, PyTorch

Julie Adrounie

8 min read

Has Summary

NVIDIA

Advanced

Introducing the CodonFM Open Model for RNA Design and Analysis

The article introduces CodonFM, a new state-of-the-art RNA foundation model developed by NVIDIA as part of the Clara open model family.

BERT, Fine-tuning, Hugging Face, Transformer

Kyle Gion

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Develop Specialized AI Agents with New NVIDIA Nemotron Vision, RAG, and Guardrail Models

The article discusses the launch of NVIDIA's new Nemotron models designed for developing specialized AI agents that integrate language and vision capabilities.

Hugging Face, LangChain, Replicate, Semantic Kernel

Chris Alexiuk

8 min read

Has Summary

NVIDIA

Intermediate

How NVIDIA DGX Spark’s Performance Enables Intensive AI Tasks

The article discusses how the NVIDIA DGX Spark supercomputer enhances performance for intensive AI tasks, providing a local alternative to cloud computing.

Fine-tuning, GPT, Hugging Face, PyTorch, scikit-learn

Allen Bourgoyne

5 min read

Has Summary

NVIDIA

Intermediate

Reconstruct a Scene in NVIDIA Isaac Sim Using Only a Smartphone

This article outlines a streamlined process for reconstructing 3D environments for robotics simulation using only a smartphone, specifically an iPhone.

Hugging Face

Wonsik Han

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Train an LLM on NVIDIA Blackwell with Unsloth—and Scale for Production

The article discusses how to fine-tune and scale large language models (LLMs) using the open-source Unsloth framework on NVIDIA Blackwell GPUs.

Docker, Fine-tuning, Hugging Face, Python

Paul Abruzzo

5 min read

Includes Code

Has Summary

NVIDIA

Advanced

Create Your Own Bash Computer Use Agent with NVIDIA Nemotron in One Hour

This article guides readers through the process of creating a Bash computer use agent using the NVIDIA Nemotron Nano v2 model.

Hugging Face, JSON, Python

Mehran Maghoumi

14 min read

Includes Code

Has Summary

NVIDIA

Advanced

Build an AI Agent to Analyze IT Tickets with NVIDIA Nemotron

The article discusses building an AI agent using NVIDIA Nemotron to analyze IT tickets, focusing on extracting insights from unstructured data through advanced AI reasoning and graph databases.

Grafana, Hugging Face, JSON

Bhaskar Bhowmik

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Build a Log Analysis Multi-Agent Self-Corrective RAG System with NVIDIA Nemotron

The article discusses the development of an AI-powered log analysis solution using NVIDIA's Generative AI reference workflows.

Embedding, Fine-tuning, Generative AI, Hugging Face

Prashant Bhende

5 min read

Includes Code

Has Summary

NVIDIA

Advanced

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer

The article discusses the optimization of large language models (LLMs) through pruning and knowledge distillation using NVIDIA TensorRT Model Optimizer.

Embedding, Hugging Face, Transformer

Max Xu

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Enable Gang Scheduling and Workload Prioritization in Ray with NVIDIA KAI Scheduler

The article discusses the integration of the NVIDIA KAI Scheduler with Ray, enabling advanced scheduling features like gang scheduling, workload prioritization, and autoscaling in Ray clusters.

Helm, Hugging Face, Kubernetes, YAML

Ekin Karabulut

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Smart Multi-Node Scheduling for Fast and Efficient LLM Inference with NVIDIA Run:ai and NVIDIA Dynamo

The article discusses the integration of NVIDIA Run:ai v2.23 with NVIDIA Dynamo to address the challenges of large language model (LLM) inference across distributed environments.

Helm, Hugging Face, JSON, Kubernetes, YAML

Ekin Karabulut

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

3 Easy Ways to Supercharge Your Robotics Development Using OpenUSD

The article discusses how OpenUSD can enhance robotics development through improved data ingestion, aggregation, and the use of SimReady assets.

Hugging Face, Python

Matias Codesal

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Integrate Computer Vision Pipelines with Generative AI and Reasoning

The article discusses the integration of computer vision pipelines with Generative AI and reasoning, highlighting the advancements in video analytics through NVIDIA's Blueprint for Video Search and...

Computer Vision, Generative AI, Hugging Face, REST API

Samuel Ochoa

11 min read

Has Summary

NVIDIA

Advanced

Build a Retrieval-Augmented Generation (RAG) Agent with NVIDIA Nemotron

The article provides a comprehensive guide on building a Retrieval-Augmented Generation (RAG) agent using NVIDIA Nemotron, emphasizing the integration of external information to enhance text genera...

Docker, Embedding, Hugging Face, LangChain, Python, Streamlit, Vector Database

Edward Li

16 min read

Includes Code

Has Summary

NVIDIA

Advanced

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

The article introduces speculative decoding as a technique to reduce latency in AI inference, particularly for large language models (LLMs).

Hugging Face, Transformer

Jamie Li

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer

The article discusses the challenges of cold start latency in deploying large language models (LLMs) and introduces the NVIDIA Run:ai Model Streamer, an open-source Python SDK designed to optimize ...

AWS, AWS S3, HTTPS, Hugging Face, Python, PyTorch, Transformers

Omer Dayan

12 min read

Has Summary

NVIDIA

Intermediate

Build a Report Generator AI Agent with NVIDIA Nemotron on OpenRouter

This article provides a comprehensive guide on building a report generation AI agent using NVIDIA Nemotron on OpenRouter.

Hugging Face, React

Edward Li

13 min read

Includes Code

Has Summary

NVIDIA

Intermediate

New Open Source Qwen3-Next Models Preview Hybrid MoE Architecture Delivering Improved Accuracy and

The article discusses the release of two new open-source models, Qwen3-Next 80B-A3B-Thinking and Qwen3-Next 80B-A3B-Instruct, which utilize a hybrid Mixture of Experts (MoE) architecture to enhance...

Hugging Face, Less, Transformer

Anu Srivastava

4 min read

Includes Code

Has Summary

NVIDIA

Advanced

How Quantization Aware Training Enables Low-Precision Accuracy Recovery

The article discusses how Quantization Aware Training (QAT) and Quantization Aware Distillation (QAD) can enhance low-precision model accuracy recovery beyond traditional Post-Training Quantization...

Hugging Face, PyTorch

Eduardo Alvarez

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Deploy Scalable AI Inference with NVIDIA NIM Operator 3.0.0

The article discusses the deployment of scalable AI inference using NVIDIA NIM Operator 3.0.0, highlighting its capabilities in managing AI inference pipelines across Kubernetes environments.

Generative AI, Hugging Face, Kubernetes, Microservices, Serverless

Meenakshi Kaushik

6 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How to Build AI Systems In House with Outerbounds and DGX Cloud Lepton

The article discusses how to build in-house AI systems using Outerbounds and NVIDIA DGX Cloud Lepton, emphasizing the importance of orchestrating multiple models and dynamic data.

Hugging Face, Python

Ville Tuulos

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerate Large-Scale LLM Inference and KV Cache Offload with CPU-GPU Memory Sharing

The article discusses how to enhance the efficiency of Large Language Models (LLMs) during inference by utilizing CPU-GPU memory sharing through NVIDIA's NVLink C2C technology.

Hugging Face, Large Language Models, Python, PyTorch

Afroze Syed

6 min read

Includes Code

Has Summary