How NVIDIA Uses Grafana

30 articles about Grafana from NVIDIA's engineering team

Articles

NVIDIA

Intermediate

Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

The article discusses the NVIDIA Multi-Agent Intelligent Warehouse (MAIW), an AI command layer designed to enhance operational efficiency and supply chain intelligence in automated warehouses.

Docker, Embedding, FastAPI, Grafana, Helm, JSON, JWT, Optuna, PostgreSQL, Prometheus, React, Redis, SQL, TimescaleDB

Tarik Hammadou

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Enabling Horizontal Autoscaling of Enterprise RAG Components on Kubernetes

This article discusses the implementation of horizontal autoscaling for Retrieval-Augmented Generation (RAG) components on Kubernetes, focusing on NVIDIA's microservices architecture.

Docker, Grafana, Helm, Kubernetes, Microservices, Prometheus

Juana Nakfour

23 min read

Includes Code

Has Summary

NVIDIA

Advanced

Next-Generation AI Factory Telemetry with NVIDIA Spectrum-X Ethernet

The article discusses the evolution of AI data centers into AI factories and the necessity for advanced telemetry solutions like NVIDIA Spectrum-X Ethernet to optimize AI workloads.

Grafana, gRPC

Berkin Kartal

7 min read

Has Summary

NVIDIA

Advanced

Build and Run Secure, Data-Driven AI Agents

The article discusses the deployment of secure, data-driven AI agents using NVIDIA's AI-Q Research Assistant and Enterprise RAG Blueprints on AWS.

AWS, Docker, Git, Grafana, Helm, Kubernetes, Prometheus, Serverless, Terraform

Abdullahi Olaoye

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Build an AI Agent to Analyze IT Tickets with NVIDIA Nemotron

The article discusses building an AI agent using NVIDIA Nemotron to analyze IT tickets, focusing on extracting insights from unstructured data through advanced AI reasoning and graph databases.

Grafana, Hugging Face, JSON

Bhaskar Bhowmik

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Reduce KV Cache Bottlenecks with NVIDIA Dynamo

The article discusses how NVIDIA Dynamo can help reduce Key-Value (KV) Cache bottlenecks in large language model (LLM) inference by offloading cache data to more cost-effective storage solutions.

GPT, Grafana, Prometheus, Redis

Amr Elmeleegy

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Dynamo 0.4 Delivers 4x Faster Performance, SLO-Based Autoscaling, and Real-Time Observability

Dynamo 0.4 introduces significant enhancements for deploying large language models (LLMs) with a focus on performance, observability, and autoscaling based on service-level objectives (SLO).

Grafana, Kubernetes, Prometheus

Amr Elmeleegy

8 min read

Has Summary

NVIDIA

Intermediate

Run Multimodal Extraction for More Efficient AI Pipelines Using One GPU

This article discusses the challenges of extracting insights from multimodal documents and presents a solution using the NVIDIA NeMo Retriever extraction pipeline.

AWS, Docker, Grafana, Prometheus, Python

Lior Cohen

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Real-Time IT Incident Detection and Intelligence with NVIDIA NIM Inference Microservices and ITMonitron

The article discusses NVIDIA's ITMonitron, a tool designed to enhance real-time IT incident detection by integrating various monitoring signals into actionable intelligence.

Grafana, JSON, Microservices, Prompt Engineering, REST API

Carol Dmello

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Compiler Explorer: An Essential Kernel Playground for CUDA Developers

Compiler Explorer is a web-based tool that allows CUDA developers to write, compile, and run GPU kernels directly in their browser without needing a local setup.

Assembly, Grafana, Python, Rust

Jake Hemstad

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Power Real-Time AI Media Effects with New AI Reference Apps on NVIDIA Holoscan for Media

The article discusses the introduction of new AI reference applications by NVIDIA for enhancing real-time media workflows using AI microservices.

Grafana, Helm, Kubernetes, PyTorch

Guillaume Polaillon

3 min read

Has Summary

NVIDIA

Intermediate

Connect Simulations with the Real World Using NVIDIA Air Services

The article discusses how NVIDIA Air Services can connect simulations with real-world data center infrastructure, enhancing capabilities and performance.

Ansible, Elasticsearch, Grafana, HTTPS, Python

Sophia Schuur

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Deploying the NVIDIA AI Blueprint for Cost-Efficient LLM Routing

The article discusses the NVIDIA AI Blueprint for an LLM router, which provides a cost-efficient framework for dynamically routing prompts to the most suitable large language models (LLMs).

ChatGPT, Docker, Fine-tuning, Grafana, OpenAI API, Python, Rust

Arun Raman

7 min read

Has Summary

NVIDIA

Advanced

Horizontal Autoscaling of NVIDIA NIM Microservices on Kubernetes

This article discusses the horizontal autoscaling of NVIDIA NIM microservices on Kubernetes, focusing on how to set up Kubernetes Horizontal Pod Autoscaling (HPA) based on custom metrics like GPU c...

Grafana, Helm, Hugging Face, JSON, Kubernetes, Microservices, Prometheus

Juana Nakfour

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

Scaling LLMs with NVIDIA Triton and NVIDIA TensorRT-LLM Using Kubernetes

The article discusses how to scale Large Language Models (LLMs) using NVIDIA Triton and NVIDIA TensorRT-LLM in a Kubernetes environment.

AWS, Azure, Docker, Generative AI, GPT, Grafana, Helm, Hugging Face, Kubernetes, NGINX, Prometheus, Python, PyTorch, TensorFlow, Traefik

Maggie Zhang

16 min read

Includes Code

Has Summary

NVIDIA

Advanced

Optimize Processes for Large Spaces with the Multi-Camera Tracking Workflow

This article introduces the multi-camera tracking workflow developed by NVIDIA, aimed at optimizing processes in large spaces such as warehouses and airports.

AWS, Azure, Elasticsearch, Google Cloud, Grafana, Helm, Kubernetes, Microservices

Monika Jhuria

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Simplifying Network Operations for AI with NVIDIA Quantum InfiniBand

The article discusses how NVIDIA Quantum InfiniBand simplifies network operations for AI infrastructure, debunking the myth that high performance equates to complexity.

Grafana

Taylor Allison

4 min read

Has Summary

NVIDIA

Advanced

Energy Efficiency in High-Performance Computing: Balancing Speed and Sustainability

The article discusses the critical balance between speed and energy efficiency in high-performance computing (HPC).

Grafana

Chris Porter

15 min read

Includes Code

Has Summary

NVIDIA

Intermediate

NVIDIA Jetson Project of the Month: Recognizing Birds by Sound

The article discusses the Bird@Edge project, an innovative system developed by researchers at the University of Marburg to identify bird species by sound using the NVIDIA Jetson Nano Developer Kit.

Grafana, InfluxDB, TensorFlow

Jason Black

6 min read

Has Summary

NVIDIA

Advanced

A Guide to Monitoring Machine Learning Models in Production

This article provides a comprehensive guide on monitoring machine learning models in production, emphasizing the importance of continuous monitoring to ensure model performance and reliability.

AWS, Grafana, Machine Learning, Prometheus, Python

Kurtis Pykes

14 min read

Has Summary

NVIDIA

Intermediate

Reducing Development Time for Intelligent Virtual Assistants in Contact Centers

The article discusses the growing demand for intelligent virtual assistants in contact centers, highlighting how they can enhance customer experience and operational efficiency.

BERT, Grafana, Haystack, Helm, Kubernetes, Prometheus, Rasa

Sven Chilton

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Designing an Optimal AI Inference Pipeline for Autonomous Driving

The article discusses the design of an optimal AI inference pipeline for autonomous driving, focusing on the integration of NVIDIA Triton Inference Server by NIO to enhance the efficiency and speed...

Docker, Grafana, gRPC, Istio, Kubernetes, Prometheus

Shankar Chandrasekaran

8 min read

Has Summary

NVIDIA

Intermediate

Troubleshooting Networks with NetQ

The article discusses troubleshooting networks using NetQ, focusing on the complexities of EVPN configurations and the importance of observability in modern data center fabrics.

Grafana, PagerDuty

Michael Kashin

7 min read

Has Summary

NVIDIA

Advanced

The Future of Edge AI is Cloud-Native

The article discusses the significance of cloud-native technology in managing edge AI data centers, emphasizing its benefits in performance, resilience, and operational management.

Grafana, Kubernetes, Prometheus

Jacob Liberman

6 min read

Has Summary

NVIDIA

Intermediate

Identifying Network and Storage Issues with NVIDIA Advanced Streaming Telemetry

This article discusses the importance of network streaming telemetry, particularly through NVIDIA's What Just Happened (WJH) technology, which enhances visibility into network performance issues.

Grafana, gRPC

David Iles

11 min read

Has Summary

NVIDIA

Intermediate

GPU Operator 1.8 Adds Support for HGX and Upgrades

The article discusses the new features and improvements introduced in GPU Operator 1.8, including support for NVIDIA HGX A100 servers, GPU Operator upgrades, and enhanced monitoring capabilities.

Grafana, Helm, Kubernetes, Prometheus

Troy Estes

4 min read

Has Summary

NVIDIA

Intermediate

Continuously Improving Recommender Systems for Competitive Advantage Using NVIDIA Merlin and MLOps

The article discusses the development and operationalization of recommender systems using NVIDIA Merlin and MLOps practices, emphasizing the importance of continuous improvement for maintaining com...

Docker, Google Cloud, Google Cloud Storage, Grafana, gRPC, Kubernetes, Prometheus, TensorFlow

Shashank Verma

11 min read

Has Summary

NVIDIA

Advanced

Scaling Inference in High Energy Particle Physics at Fermilab Using NVIDIA Triton Inference Server

The article discusses the application of NVIDIA Triton Inference Server to scale inference processes in high-energy particle physics experiments at Fermilab, specifically focusing on the ProtoDUNE-...

Azure, Google Cloud, Grafana, gRPC, Kubernetes, Prometheus, PyTorch, TensorFlow

Shankar Chandrasekaran

8 min read

Has Summary

NVIDIA

Advanced

Monitoring GPUs in Kubernetes with DCGM

This article discusses the importance of monitoring GPUs in Kubernetes environments using NVIDIA Data Center GPU Manager (DCGM).

Docker, Grafana, Helm, JSON, Kubernetes, Prometheus, Python, REST API

Pramod Ramarao

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Setting Up GPU Telemetry with NVIDIA Data Center GPU Manager

The article provides a detailed guide on setting up GPU telemetry using NVIDIA Data Center GPU Manager (DCGM) and integrating it with the collectd telemetry framework.

Grafana, Prometheus, Python

Scott McMillan

5 min read

Includes Code

Has Summary
