Identifying Network and Storage Issues with NVIDIA Advanced Streaming Telemetry

NVIDIA What Just Happened is a hardware-accelerated telemetry technology where the switch ASIC holds onto important parts of dropped packets. Learn how it helps…

David Iles

Overview

This article discusses the importance of network streaming telemetry, particularly NVIDIA's What Just Happened (WJH) technology, which improves visibility into network performance issues. It emphasizes the shift from traditional protocol-based monitoring to advanced streaming telemetry for better diagnostics and faster problem resolution in data centers.

What You'll Learn

1. How to utilize NVIDIA What Just Happened (WJH) for enhanced network visibility
2. Why advanced streaming telemetry is crucial for diagnosing network issues
3. When to implement WJH in your network for optimal performance monitoring

Prerequisites & Requirements

  • Basic understanding of network management concepts (optional)
  • Familiarity with data center operations (optional)

Key Questions Answered

What is the purpose of NVIDIA What Just Happened (WJH)?
NVIDIA What Just Happened (WJH) is a telemetry technology that provides detailed insights into network performance by capturing essential data about dropped packets and other issues. It helps network administrators quickly identify the root causes of problems without needing to reproduce issues, thus improving operational efficiency.
How does WJH improve network diagnostics compared to traditional methods?
WJH enhances network diagnostics by retaining critical information about dropped packets, such as source and destination IP addresses, and the reasons for drops. Unlike traditional methods that often rely on vague counters or excessive data sampling, WJH provides actionable insights that help pinpoint issues quickly.
When should network administrators consider using streaming telemetry?
Network administrators should consider using streaming telemetry when they face challenges in diagnosing performance issues, such as packet loss or congestion. As networks grow larger and faster, having real-time visibility into network conditions becomes essential for maintaining optimal performance.
What are the steps to deploy WJH in a network?
To deploy WJH, network administrators should first enable it on a switch connected to the production network to perform a scan. Next, they should address the issues identified during the scan and finally customize WJH settings to suit their specific network management needs.
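
The answers above describe WJH as retaining the source and destination IP addresses of dropped packets along with the reason for each drop. As a minimal sketch of how such records could be summarized after an initial scan, the Python below groups drops by reason and by flow. The record layout, the field names (`src_ip`, `dst_ip`, `reason`), and the reason strings are all hypothetical placeholders for illustration, not the actual WJH export schema.

```python
from collections import Counter

# Hypothetical drop records. Field names and reason strings are
# illustrative assumptions, not the real WJH output format.
SAMPLE_DROPS = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.5", "reason": "acl_deny"},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.5", "reason": "acl_deny"},
    {"src_ip": "10.0.0.7", "dst_ip": "10.0.2.9", "reason": "ttl_expired"},
]


def summarize_drops(records):
    """Count dropped packets per drop reason and per (src, dst) flow."""
    by_reason = Counter(r["reason"] for r in records)
    by_flow = Counter((r["src_ip"], r["dst_ip"]) for r in records)
    return by_reason, by_flow


if __name__ == "__main__":
    by_reason, by_flow = summarize_drops(SAMPLE_DROPS)
    # Most frequent reasons first, so the likeliest root cause is on top.
    for reason, count in by_reason.most_common():
        print(f"{count:>4}  {reason}")
    for (src, dst), count in by_flow.most_common():
        print(f"{count:>4}  {src} -> {dst}")
```

Sorting by frequency is only one possible triage order; an operator might instead want to rank by severity or by how recently the drops occurred.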

Technologies & Tools

Network Monitoring
NVIDIA What Just Happened (WJH)
Used for advanced telemetry to monitor and diagnose network performance issues.

Key Actionable Insights

1. Implement NVIDIA What Just Happened (WJH) to gain immediate insights into network performance issues.
   By enabling WJH on your switches, you can quickly identify and resolve problems, reducing downtime and improving overall network efficiency.
2. Shift from traditional protocol-based monitoring to advanced streaming telemetry for better diagnostics.
   This transition allows for a more streamlined approach to network management, focusing on critical data that directly impacts performance rather than sifting through excessive logs.
3. Regularly review and adjust WJH settings to align with your network's evolving needs.
   As network conditions change, customizing alert thresholds and logging preferences can help maintain optimal visibility and responsiveness to issues.
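
To make the idea of customized alert thresholds concrete, here is a small sketch that flags drop reasons whose counts in a monitoring window meet or exceed a per-reason threshold. It is not a WJH configuration mechanism: the function name, the reason strings, and the default threshold are assumptions chosen for illustration, and the check is assumed to run in whatever collector receives the telemetry.

```python
# Hypothetical fallback threshold for reasons without an explicit entry.
DEFAULT_THRESHOLD = 100


def reasons_over_threshold(counts, thresholds, default=DEFAULT_THRESHOLD):
    """Return, sorted by name, the drop reasons whose count in the current
    window is at or above their threshold (or the default if none is set)."""
    return sorted(
        reason
        for reason, n in counts.items()
        if n >= thresholds.get(reason, default)
    )


if __name__ == "__main__":
    # Illustrative per-window counts and per-reason overrides.
    window_counts = {"acl_deny": 50, "buffer_congestion": 10, "ttl_expired": 3}
    custom_thresholds = {"buffer_congestion": 5}
    print(reasons_over_threshold(window_counts, custom_thresholds))
```

Keeping the thresholds in a plain mapping makes them easy to revisit as traffic patterns change, which is the kind of periodic adjustment the third insight recommends.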

Common Pitfalls

1. Over-reliance on traditional monitoring tools that collect excessive data without providing actionable insights.
   This can lead to confusion and longer resolution times, as administrators may struggle to identify the root cause of issues amidst a sea of irrelevant data.
2. Failing to customize WJH settings after initial deployment.
   Neglecting to tailor alert settings and filters can result in missed critical notifications or overwhelming amounts of data that hinder effective monitoring.

Related Concepts

Network Performance Monitoring
Streaming Telemetry
Data Center Management
Packet Analysis