Fly's Prometheus Metrics

We should talk a bit about metrics and measurement and stuff, because they’re how we all know what’s going on. There’s two reasons we’ve written this post. The first is just that we think this stuff is interesting, and that the world can always use

Thomas Ptacek
12 min read · advanced

Overview

The article discusses Fly.io's implementation of Prometheus metrics for monitoring applications running on their platform. It details how metrics are collected, stored, and utilized to provide insights into application performance and health.

What You'll Learn

1. How to collect and expose metrics using Prometheus in your applications
2. Why using metrics is more effective than traditional checks for monitoring systems
3. How to integrate your application's metrics with Fly.io's infrastructure

Prerequisites & Requirements

  • Understanding of metrics and monitoring concepts
  • Familiarity with Prometheus and its exporters (optional)

Key Questions Answered

How does Fly.io collect and expose application metrics?
Fly.io collects metrics through its Rust request router, fly-proxy, which records measurements such as incoming bytes and TLS handshake times. These metrics are exposed via HTTP endpoints that Prometheus can scrape, so users can monitor their applications with standard Prometheus tooling.
What are the advantages of using metrics over checks for monitoring?
Metrics allow for more extensive tracking of application performance without the overhead of running scripts. They enable anomaly detection through historical data analysis, making it easier to identify issues and trends over time compared to traditional checks.
What technologies are used in Fly.io's metrics stack?
Fly.io's metrics stack includes Victoria Metrics for storage, Telegraf for collecting metrics, and Prometheus exporters for exposing application metrics. This combination allows for efficient metrics collection and management across their infrastructure.
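
To make the scrape model above concrete, here is a minimal sketch of an application exposing a counter over HTTP in the Prometheus text exposition format. It uses only the Python standard library, where a real application would more likely use an official Prometheus client library. The metric name `app_requests_total`, the `/metrics` path, and port 9091 are assumptions chosen for illustration, not details from the article.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A monotonically increasing counter, safe to bump from several threads."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters can only go up")
        with self._lock:
            self._value += amount

    def render(self):
        # Text exposition format: a HELP line, a TYPE line, then the sample.
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical metric name, used for this example only.
REQUESTS = Counter("app_requests_total", "Requests handled by this process.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the (assumed) /metrics path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9091):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Application code would call `REQUESTS.inc()` wherever a request is handled and run `serve()` in a background thread. The scraper then only ever sees the running total, and it is the metrics database's job to turn successive totals into rates.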

Key Statistics & Figures

fly_proxy_service_egress_http_responses_count
1586
This metric indicates the number of HTTP responses sent by the fly-proxy service with a status of 200 for a specific application.
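
A figure like this reaches a scraper as one line of Prometheus text exposition format: a metric name, an optional set of labels, and a value. The sketch below parses such a line. The label names `app` and `status` and the value `example` are assumptions for illustration, since the summary only says the count is for status 200 and a specific application. The parser is deliberately simplified: it does not handle optional timestamps and does not unescape label values.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {
        m.group("key"): m.group("val")
        for m in LABEL_RE.finditer(match.group("labels") or "")
    }
    return match.group("name"), labels, float(match.group("value"))


# Label names and the app value are hypothetical.
line = 'fly_proxy_service_egress_http_responses_count{app="example",status="200"} 1586'
name, labels, value = parse_sample(line)
```

Here `name` is the metric name, `labels` is `{"app": "example", "status": "200"}`, and `value` is `1586.0`.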

Technologies & Tools


  • Prometheus (Monitoring): Used for collecting and querying metrics from applications.
  • Victoria Metrics (Database): Serves as the metrics database for storing collected metrics.
  • Telegraf (Data Collection): Collects and forwards metrics from various sources to Victoria Metrics.
  • Fly-proxy (Backend): Rust-based request router that handles incoming HTTP requests and collects metrics.

Key Actionable Insights

1. Implementing Prometheus metrics in your application can significantly enhance your monitoring capabilities. By exposing metrics, you can leverage tools like Grafana for visualization and alerting, which can lead to quicker identification of performance issues.
2. Consider using a time-series database like Victoria Metrics for storing your application's metrics. Victoria Metrics is optimized for handling high write loads and can efficiently manage the unique characteristics of time-series data, making it a suitable choice for applications with extensive metrics.
3. Utilize the metrics collected to create alerts based on historical trends. This proactive approach allows you to detect anomalies before they impact users, improving overall system reliability.
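
One simple way to alert on historical trends is to compare the latest reading with the mean and standard deviation of recent readings. The sketch below uses a plain z-score test, which is an illustrative technique and not something the article prescribes. The function name, the threshold of 3.0, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` is more than `threshold` standard deviations
    away from the mean of `history` (for example, recent per-minute rates)."""
    if len(history) < 2:
        # Not enough data to estimate spread, so never alert.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all counts as a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical requests-per-minute readings.
recent = [100, 102, 98, 101, 99]
normal = is_anomalous(recent, 101)   # within the usual range
spike = is_anomalous(recent, 150)    # far outside it
```

With these readings, `normal` is `False` and `spike` is `True`. A check script could not make this comparison, because it only sees the current state. The stored history is what makes the baseline possible.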

Common Pitfalls

1. Failing to properly expose application metrics can lead to gaps in monitoring. Without exposing metrics, you miss critical insights into application performance, making it difficult to troubleshoot issues effectively.
2. Overloading your metrics database with too many metrics can degrade performance. It's important to focus on key metrics that provide actionable insights rather than collecting excessive data that may not be useful.
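
In Prometheus-style systems, "too many metrics" often means too many distinct label combinations, because each unique combination is stored as its own time series. The sketch below counts series per metric name and flags those over a budget. The function names, the budget, and the sample data are hypothetical.

```python
from collections import defaultdict


def series_per_metric(samples):
    """Count distinct label sets per metric name.

    `samples` is an iterable of (metric_name, labels_dict) pairs.
    """
    seen = defaultdict(set)
    for name, labels in samples:
        # A frozenset of items makes the label set hashable and order-independent.
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def over_budget(samples, budget):
    """Return the sorted names of metrics with more than `budget` series."""
    counts = series_per_metric(samples)
    return sorted(name for name, count in counts.items() if count > budget)


# Hypothetical samples: a per-user label makes one metric grow without bound.
samples = [
    ("http_requests_total", {"status": "200"}),
    ("http_requests_total", {"status": "500"}),
    ("logins_total", {"user": "alice"}),
    ("logins_total", {"user": "bob"}),
    ("logins_total", {"user": "carol"}),
]
noisy = over_budget(samples, budget=2)
```

Here `noisy` is `["logins_total"]`. A label whose values are user IDs, request IDs, or similar unbounded sets is usually the first thing to drop or aggregate.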

Related Concepts

Prometheus Metrics
Time-series Databases
Monitoring Best Practices