Smart alerts in ThirdEye, LinkedIn’s real-time monitoring platform

Xiaohui Sun
10 min read · intermediate

Overview

The article discusses the implementation of Smart Alerts in ThirdEye, LinkedIn's real-time monitoring platform, focusing on the challenges of setting up effective alerts for diverse metrics. It details the redesigned alert pipeline, emphasizing anomaly detection and notification flows to enhance user experience and reduce noise in alerts.

What You'll Learn

1. How to configure anomaly detection rules in ThirdEye
2. Why customizing notification channels is essential for effective alerting
3. How to filter out irrelevant data to improve alert precision

Prerequisites & Requirements

  • Understanding of anomaly detection concepts
  • Familiarity with YAML configuration (optional)

Key Questions Answered

How does ThirdEye handle anomaly detection for diverse metrics?
ThirdEye employs a flexible anomaly detection system that can adapt to various metrics with different time granularities and patterns. It uses a combination of filtering, dynamic exploration, and multiple detection algorithms to ensure that only relevant anomalies are detected, thus improving alert accuracy.
What are the components of the Smart Alert pipeline in ThirdEye?
The Smart Alert pipeline consists of two main components: the anomaly detection flow, which identifies relevant anomalies from time series data, and the notification flow, which sends alerts to users based on customizable configurations, ensuring timely and relevant notifications.
What strategies are used to reduce noise in alerting?
To reduce noise, ThirdEye merges short-duration anomalies into single alerts and allows users to set thresholds for alert sensitivity. This helps avoid over-alerting and ensures that users receive only significant notifications.
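The merging strategy described above can be sketched as follows. This is a minimal illustration, not ThirdEye's actual implementation: the `Anomaly` class, the `merge_anomalies` function, and the `max_gap` parameter are hypothetical names chosen for this example, and timestamps are assumed to be epoch seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Anomaly:
    """Hypothetical anomaly window; not a ThirdEye type."""
    start: int  # epoch seconds, inclusive
    end: int    # epoch seconds, exclusive


def merge_anomalies(anomalies: List[Anomaly], max_gap: int) -> List[Anomaly]:
    """Merge anomalies whose gap to the previous one is at most max_gap seconds."""
    merged: List[Anomaly] = []
    for a in sorted(anomalies, key=lambda x: x.start):
        if merged and a.start - merged[-1].end <= max_gap:
            # Close enough to the previous window: extend it instead of
            # emitting a second alert.
            merged[-1].end = max(merged[-1].end, a.end)
        else:
            # Copy so the caller's objects are never mutated.
            merged.append(Anomaly(a.start, a.end))
    return merged


if __name__ == "__main__":
    raw = [Anomaly(0, 60), Anomaly(90, 120), Anomaly(1000, 1060)]
    for window in merge_anomalies(raw, max_gap=60):
        print(window)
```

With a 60-second gap, the first two windows collapse into one alert while the third stays separate, so a user would be notified twice rather than three times.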

Technologies & Tools

Monitoring Platform
ThirdEye
Used for real-time monitoring and anomaly detection across various metrics.
Configuration Language
YAML
Utilized for configuring alert settings and anomaly detection rules.
Database
Pinot
Serves as a real-time OLAP datastore for ThirdEye.
Database
inGraphs
Acts as a time series database for anomaly detection.

Key Actionable Insights

1. Implement a dynamic filtering system to enhance anomaly detection accuracy.
By using dynamic filters that adapt to changing data patterns, you can ensure that only the most relevant metrics are monitored, reducing noise and improving the quality of alerts.
2. Customize notification settings based on user roles and alert types.
Tailoring notifications to specific teams or individuals based on the type of anomaly detected ensures that the right people are informed at the right time, improving response efficiency.
3. Utilize the preview feature to test alert configurations before full deployment.
This allows users to validate the effectiveness of their anomaly detection settings in real-time, ensuring that the configurations meet their monitoring needs before going live.
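The idea behind previewing a configuration can be illustrated with a small replay over historical data. This is only a sketch under stated assumptions: `percent_change_rule` and `preview` are hypothetical helpers written for this example, the baseline values are supplied by the caller, and nothing here reflects ThirdEye's real preview API.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, float]  # (timestamp, observed value)
Rule = Callable[[float, float], bool]


def percent_change_rule(threshold: float) -> Rule:
    """Build a rule that fires when |current - baseline| / |baseline| exceeds threshold."""
    def rule(current: float, baseline: float) -> bool:
        if baseline == 0:
            return False  # relative change is undefined; treat as no anomaly
        return abs(current - baseline) / abs(baseline) > threshold
    return rule


def preview(history: Sequence[Point], baseline: Sequence[float], rule: Rule) -> List[int]:
    """Return the timestamps at which the rule would have fired on past data."""
    return [ts for (ts, value), base in zip(history, baseline) if rule(value, base)]


if __name__ == "__main__":
    history = [(1, 100.0), (2, 150.0), (3, 98.0)]
    baseline = [100.0, 100.0, 100.0]
    # Compare how many alerts two candidate sensitivities would have produced.
    for threshold in (0.01, 0.2):
        fired = preview(history, baseline, percent_change_rule(threshold))
        print(threshold, fired)
```

Replaying several candidate thresholds against the same history makes it easier to see which setting would have over-alerted before anything goes live.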

Common Pitfalls

1. Failing to customize alert thresholds can lead to over-alerting.
Without proper customization, users may receive excessive notifications for minor fluctuations, which can desensitize them to important alerts.
2. Neglecting to filter out irrelevant data can skew anomaly detection results.
Monitoring irrelevant metrics can lead to false positives, making it difficult to identify genuine issues and wasting valuable time.
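One way to picture the filtering step is to drop low-volume series before any detection runs. The sketch below assumes a hypothetical criterion (a minimum mean value per dimension) and a hypothetical `filter_series` helper; the source does not specify how ThirdEye decides which data is irrelevant.

```python
from statistics import mean
from typing import Dict, List


def filter_series(
    series_by_dimension: Dict[str, List[float]],
    min_mean: float,
) -> Dict[str, List[float]]:
    """Keep only dimensions whose average value is at least min_mean.

    Empty series are dropped as well, since there is nothing to monitor.
    """
    return {
        dim: values
        for dim, values in series_by_dimension.items()
        if values and mean(values) >= min_mean
    }


if __name__ == "__main__":
    page_views = {
        "US": [1000.0, 1200.0],
        "AQ": [1.0, 0.0],   # too little traffic to alert on meaningfully
        "empty": [],
    }
    print(filter_series(page_views, min_mean=100.0))
```

Small series tend to swing by large percentages on tiny absolute changes, so excluding them up front is one plausible way to cut false positives.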

Related Concepts

Anomaly Detection Techniques
Real-time Monitoring Strategies
Alert Configuration Best Practices