# Grafana Programming Tutorials & Engineering Articles
123 Grafana tutorials, guides, and engineering insights from ClickHouse, NVIDIA, Uber, and more
Grafana Articles & Tutorials
The article discusses the pg_stat_ch extension for PostgreSQL, which facilitates the export of metrics to ClickHouse.
1 min read
Has Summary
--
The article discusses the NVIDIA Multi-Agent Intelligent Warehouse (MAIW), an AI command layer designed to enhance operational efficiency and supply chain intelligence in automated warehouses.
Tarik Hammadou
10 min read
Includes Code
Has Summary
--
Uber Engineering details their migration from a legacy monolithic monitoring system to a modern, cloud-native observability platform for their corporate network infrastructure.
Razvan Cicu, Giovanni Pepe
9 min read
Has Summary
--
This article discusses the implementation of horizontal autoscaling for Retrieval-Augmented Generation (RAG) components on Kubernetes, focusing on NVIDIA's microservices architecture.
Juana Nakfour
23 min read
Includes Code
Has Summary
--
The article discusses the evolution of AI data centers into AI factories and the necessity for advanced telemetry solutions like NVIDIA Spectrum-X Ethernet to optimize AI workloads.
Shopify's 2025 Black Friday Cyber Monday (BFCM) live globe was reimagined as an interactive pinball machine running at 120fps in a browser. Built with Three.
The article discusses the deployment of secure, data-driven AI agents using NVIDIA's AI-Q Research Assistant and Enterprise RAG Blueprints on AWS.
Abdullahi Olaoye
8 min read
Includes Code
Has Summary
--
The article introduces advanced tool use features on the Claude Developer Platform, focusing on enabling AI agents to utilize tools more efficiently.
The article discusses Uber's implementation of I/O observability for its massive petabyte-scale data lake, focusing on the challenges and solutions in monitoring data access patterns across its hyb...
Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma
10 min read
Has Summary
--
The article discusses the challenges of identifying the root cause of configuration management failures using Salt at Cloudflare, particularly when dealing with a high volume of changes across nume...
Opeyemi Onikute
17 min read
Includes Code
Has Summary
--
This article discusses the implementation of zone failure resilience in Apache Pinot at Uber, detailing strategies to ensure uninterrupted service during zone failures.
Si Lao, Christina Li, Xuanyi Li, Yang Yang, Ujwala Tulshigiri
10 min read
Has Summary
--
The article discusses building an AI agent using NVIDIA Nemotron to analyze IT tickets, focusing on extracting insights from unstructured data through advanced AI reasoning and graph databases.
Bhaskar Bhowmik
10 min read
Includes Code
Has Summary
--
This article discusses the rebuilding of Uber's Apache Pinot™ query architecture, focusing on the transition from Neutrino to a new query system that utilizes Pinot's Multi-Stage Engine Lite Mode.
The article discusses how NVIDIA Dynamo can help reduce Key-Value (KV) Cache bottlenecks in large language model (LLM) inference by offloading cache data to more cost-effective storage solutions.
Amr Elmeleegy
11 min read
Includes Code
Has Summary
--
The article discusses the rising costs associated with observability in software engineering and proposes a shift towards open, cost-efficient architectures.
Mike Shi
13 min read
Has Summary
--
This article discusses how to instrument a Next.js application using OpenTelemetry and ClickStack, focusing on the integration of observability and analytics through ClickHouse.
Dynamo 0.4 introduces significant enhancements for deploying large language models (LLMs) with a focus on performance, observability, and autoscaling based on service-level objectives (SLO).
Amr Elmeleegy
8 min read
Has Summary
--
The article discusses the evolution of ClickHouse's observability platform, LogHouse, as it scales beyond 100 petabytes of data.
Rory Crispin, Dale McDiarmid
30 min read
Includes Code
Has Summary
--
This article discusses the challenges of extracting insights from multimodal documents and presents a solution using the NVIDIA NeMo Retriever extraction pipeline.
Lior Cohen
8 min read
Includes Code
Has Summary
--
The article discusses NVIDIA's ITMonitron, a tool designed to enhance real-time IT incident detection by integrating various monitoring signals into actionable intelligence.
Carol Dmello
11 min read
Includes Code
Has Summary
--
Compiler Explorer is a web-based tool that allows CUDA developers to write, compile, and run GPU kernels directly in their browser without needing a local setup.
The article discusses the introduction of new AI reference applications by NVIDIA for enhancing real-time media workflows using AI microservices.
Guillaume Polaillon
3 min read
Has Summary
--
The article discusses how NVIDIA Air Services can connect simulations with real-world data center infrastructure, enhancing capabilities and performance.
Sophia Schuur
6 min read
Includes Code
Has Summary
--
This article discusses Uber's implementation of elastic resource management on Kubernetes, focusing on enhancements made to support Ray-based job management.
Bharat Joshi, Anant Vyas, Ben Wang, Axansh Sheth, Abhinav Dixit
10 min read
Has Summary
--
Uber's blog post discusses their migration of machine learning workloads to Kubernetes using Ray, detailing the challenges faced with their previous setup and the improvements achieved with the new...
Bharat Joshi, Anant Vyas, Ben Wang, Min Cai, Axansh Sheth, Abhinav Dixit
18 min read
Has Summary
--
The article discusses the NVIDIA AI Blueprint for an LLM router, which provides a cost-efficient framework for dynamically routing prompts to the most suitable large language models (LLMs).
Arun Raman
7 min read
Has Summary
--
This article discusses the horizontal autoscaling of NVIDIA NIM microservices on Kubernetes, focusing on how to set up Kubernetes Horizontal Pod Autoscaling (HPA) based on custom metrics like GPU c...
Juana Nakfour
7 min read
Includes Code
Has Summary
--
This article discusses the implementation of the Medallion architecture using ClickHouse, a powerful database management system.
The article discusses the strategic partnership between Palantir and Grafana Labs aimed at enhancing IT innovation within federal agencies.
Palantir
6 min read
Has Summary
--
The article discusses the evolution of SQL-based observability, focusing on ClickHouse's advancements over the past year.
Dale McDiarmid & Ryadh Dahimene
25 min read
Includes Code
Has Summary
--
The article discusses the launch of Workers Builds, an integrated CI/CD workflow on the Workers platform, enabling developers to build and deploy applications seamlessly from GitHub or GitLab.
Serena Shah-Simpson
15 min read
Includes Code
Has Summary
--
The article discusses how to scale Large Language Models (LLMs) using NVIDIA Triton and NVIDIA TensorRT-LLM in a Kubernetes environment.
AWS, Azure, Docker, Generative AI, GPT, Grafana, Helm, Hugging Face, Kubernetes, NGINX, Prometheus, Python, PyTorch, TensorFlow, Traefik
Maggie Zhang
16 min read
Includes Code
Has Summary
--
This article discusses building single page applications (SPAs) using ClickHouse with a focus on a 'client only' architecture.
The article discusses Uber's migration of its batch data platform to the cloud, focusing on the implementation of DataMesh principles.
Arun Mahadeva Iyer, Abhi Khune, Sahana Bhat
11 min read
Has Summary
--
The article discusses the integration of Ray infrastructure at Pinterest, detailing the journey, challenges, and solutions implemented to optimize machine learning workflows.
Pinterest Engineering
16 min read
Includes Code
Has Summary
--
This article introduces the multi-camera tracking workflow developed by NVIDIA, aimed at optimizing processes in large spaces such as warehouses and airports.
Monika Jhuria
11 min read
Includes Code
Has Summary
--
The article discusses strategies to minimize on-call burnout through effective alert observability, emphasizing the importance of actionable alerts and the analysis of alert data.
Monika Singh
12 min read
Includes Code
Has Summary
--
GrafLI is a cloud-native monitoring and visualization platform developed by LinkedIn to enhance the developer experience and increase engineering velocity in Azure environments.
Prateek Singh
16 min read
Includes Code
Has Summary
--
This article details the development of a ClickHouse-powered logging platform, named LogHouse, which efficiently manages over 19 PiB of log data while significantly reducing costs compared to tradi...
The article discusses Fly.io's innovative approach to enhancing WireGuard's performance and scalability by implementing Just-In-Time (JIT) peer configuration.
The article discusses Uber's efforts to improve load balancing across heterogeneous hardware, focusing on enhancing efficiency and CPU utilization for stateless services.
Pawel Krolikowski, Chien-Chih Liao, Ying Jiang
32 min read
Has Summary
--
ClickHouse version 24.1 introduces 26 new features, 22 performance optimizations, and 47 bug fixes, enhancing its capabilities for data processing and analytics.
The article discusses how NVIDIA Quantum InfiniBand simplifies network operations for AI infrastructure, debunking the myth that high performance equates to complexity.
Taylor Allison
4 min read
Has Summary
--
ClickHouse Release 23.11 introduces a wealth of new features, performance optimizations, and bug fixes, enhancing its capabilities for data processing and analytics.
The ClickHouse Team
11 min read
Includes Code
Has Summary
--
The article discusses how to enhance Google Analytics data using ClickHouse, focusing on the limitations of GA4 and the advantages of ClickHouse for flexible, fast analytics with infinite data rete...
Dale McDiarmid
21 min read
Includes Code
Has Summary
--
The article discusses the critical balance between speed and energy efficiency in high-performance computing (HPC).
Chris Porter
15 min read
Includes Code
Has Summary
--
This article discusses the CGW Stack, which combines ClickHouse, Grafana, and WarpStream to provide a cost-effective and efficient logging solution at scale.
Dale McDiarmid & Ryadh Dahimene
25 min read
Includes Code
Has Summary
--
The article discusses how Prisma successfully reduced its engine distribution costs by 98% by migrating from AWS S3 and CloudFront to Cloudflare R2.
Pierre-Antoine Mills (Guest Author)
9 min read
Has Summary
--
The article discusses ClickHouse's journey to enhance MySQL compatibility, enabling users to connect BI tools like Looker Studio and Tableau online through the MySQL protocol.
ClickHouse Cloud has introduced compatibility with the MySQL protocol, enabling users to connect various third-party business intelligence tools like Looker Studio and Tableau.