Fly Behind The Scenes: Fresh Logging

Since Fly launched, we’ve been collecting and managing logs for all the applications running on the Fly platform. It’s a critical but rarely noticed function of the platform. When you type flyctl logs, behind the scenes, there is a lot of comput…

Dj Walker-Morgan
7 min read · advanced

Overview

The article discusses the improvements made to the logging system on the Fly platform, highlighting the transition from a centralized Graylog server setup to a distributed logging architecture using Vector and Elasticsearch. This change has resulted in faster and more reliable log processing for users.

What You'll Learn

1. How to implement a distributed logging system using Vector and Elasticsearch
2. Why moving to the Elastic Common Schema improves log searching
3. How to configure Vector for log processing and transformation

Prerequisites & Requirements

  • Understanding of logging systems and log processing
  • Familiarity with Elasticsearch and Vector (optional)

Key Questions Answered

What improvements were made to the Fly logging system?
The Fly logging system transitioned from a centralized Graylog setup to a distributed architecture using Vector and Elasticsearch, allowing for faster log processing and reduced log loss. This change enables logs to be processed directly on each server and indexed in Elasticsearch, improving reliability and speed.
How does the new logging architecture benefit users?
With the new architecture, logs become available sooner and are less likely to be dropped. Users can retrieve logs more reliably, especially during deployment failures, which improves the overall experience on the Fly platform.
What is the volume of logs processed by the Fly platform?
The Fly platform processes between 20,000 and 30,000 logs per second. This high volume necessitated a more efficient logging solution than the previous centralized Graylog servers could provide.
What is the role of Vector in the new logging system?
Vector runs on each server, capturing, processing, and transforming logs before sending them directly to Elasticsearch. This eliminates the need for a middleman like Graylog, streamlining the logging process and improving performance.
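To make the capture, transform, and ship flow concrete, here is a minimal Python sketch of the per-host work described above. Vector does this through its own configuration; the Python only mimics the shape of that work and is not Fly's implementation. The raw record's keys (ts, app, host, level, line) and the index name are hypothetical. The output field names follow the Elastic Common Schema (@timestamp, message, log.level, host.name, service.name), and the output body follows Elasticsearch's newline-delimited _bulk request format.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw record as a per-host agent might capture it.
# These keys are illustrative assumptions, not Fly's actual log format.
RAW = {
    "ts": 1700000000.0,  # Unix timestamp in seconds
    "app": "example-app",
    "host": "host-1",
    "level": "info",
    "line": "listening on :8080",
}


def transform(raw: dict) -> dict:
    """Rename raw keys to ECS field names, nested as JSON objects."""
    ts = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return {
        "@timestamp": ts.isoformat().replace("+00:00", "Z"),
        "message": raw["line"],
        "log": {"level": raw["level"]},
        "host": {"name": raw["host"]},
        "service": {"name": raw["app"]},
    }


def bulk_body(docs: list, index: str) -> str:
    """Build an Elasticsearch _bulk body: an action line followed by a source
    line for each document, newline-delimited and ending with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Usage: the per-host step would send this body straight to Elasticsearch
# (POST /_bulk) with no central middleman; no network call is made here.
body = bulk_body([transform(RAW)], index="logs-example")
print(body)
```

Keeping the transform on the host means each server does its own share of the parsing and renaming, so nothing has to queue up at a central log server before indexing.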

Key Statistics & Figures

Log processing volume: 20,000 to 30,000 logs per second
This volume highlights the need for a robust logging pipeline that can handle peak loads without dropping logs.

Implementation duration: about two weeks
This timeframe included working through various issues and configuring Elasticsearch.
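A quick back-of-the-envelope calculation shows what 20,000 to 30,000 logs per second adds up to over a day, and how spreading that work across hosts shrinks the load any single processor sees. The host count of 100 is an arbitrary illustrative number, not Fly's actual fleet size, and the even split across hosts is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume(events_per_second: int) -> int:
    """Total log events in one day at a constant rate."""
    return events_per_second * SECONDS_PER_DAY


def per_host_rate(total_per_second: int, hosts: int) -> float:
    """Average rate per host, assuming load is spread evenly across hosts."""
    return total_per_second / hosts


low, high = daily_volume(20_000), daily_volume(30_000)
print(f"{low:,} to {high:,} log events per day")
# 1,728,000,000 to 2,592,000,000 log events per day

# Hypothetical fleet size, for illustration only.
HOSTS = 100
print(f"peak per host: {per_host_rate(30_000, HOSTS):,.0f} events/s")
# peak per host: 300 events/s
```

Roughly two billion or more events a day is the load a central server must absorb on its own, while in a distributed layout each host only handles its own slice.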

Technologies & Tools


  • Vector (logging tool): captures, processes, and transforms logs on each server.
  • Elasticsearch (search engine): indexes logs and retrieves them efficiently.
  • Graylog (logging tool): the previous centralized logging solution, replaced by the new architecture.

Key Actionable Insights

1. A distributed logging architecture can significantly improve log processing speed and reliability.
   By moving log processing onto individual servers and using tools like Vector and Elasticsearch, organizations can reduce latency and make logs available sooner.
2. Adopting a common schema for logs simplifies searching and querying across different applications.
   The Elastic Common Schema gives fields consistent names, making it easier to search for specific log entries across multiple applications.
3. Managing log processing configuration through a configuration management system streamlines updates.
   With Vector, changes to log processing configuration can be rolled out to all servers, allowing quick adjustments to log handling without a central bottleneck.
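The common-schema insight can be illustrated with a small Python sketch: two hypothetical applications log the same information under different keys, a per-app mapping renames those keys to Elastic Common Schema field names, and a single query then works across both. The app names and their original field names are invented for illustration. Fields are kept as flat dotted keys for simplicity; in Elasticsearch, dotted names such as log.level correspond to nested object fields.

```python
# Hypothetical per-application field names: each app records the same
# information (severity and text) under its own keys.
APP_FIELD_MAPS = {
    "app-a": {"lvl": "log.level", "msg": "message"},
    "app-b": {"severity": "log.level", "text": "message"},
}


def normalize(app: str, record: dict) -> dict:
    """Rename an app's fields to ECS-style dotted names; unknown keys pass through."""
    mapping = APP_FIELD_MAPS[app]
    doc = {"service.name": app}
    for key, value in record.items():
        doc[mapping.get(key, key)] = value
    return doc


def search(docs: list, field: str, value: str) -> list:
    """Return documents whose field equals value, regardless of source app."""
    return [d for d in docs if d.get(field) == value]


docs = [
    normalize("app-a", {"lvl": "error", "msg": "db timeout"}),
    normalize("app-b", {"severity": "error", "text": "upstream 502"}),
    normalize("app-b", {"severity": "info", "text": "healthy"}),
]
errors = search(docs, "log.level", "error")
print([d["message"] for d in errors])  # ['db timeout', 'upstream 502']
```

Without the shared names, the same question ("show me errors") would need a different query per application; after normalization one field name covers them all.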

Common Pitfalls

1. Relying on centralized logging systems can create bottlenecks and lead to dropped logs during high-volume periods.
   Centralized systems like Graylog may struggle to keep up with high log volumes, necessitating a shift to distributed architectures for better performance.
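The pitfall can be made concrete with a toy model, a sketch rather than a measurement of Graylog or Vector: a single processor with a bounded buffer starts dropping events once arrivals outpace its capacity, whereas splitting the same load across hosts keeps each processor under its limit. Every rate, capacity, buffer size, and the host count below are arbitrary assumptions chosen for illustration; one tick stands in for one second.

```python
def dropped_events(arrivals: int, capacity: int, buffer_size: int, ticks: int) -> int:
    """Toy model of one log processor with a bounded buffer.

    Each tick, `arrivals` events arrive; whatever does not fit in the buffer
    is dropped, then up to `capacity` buffered events are processed.
    Returns the total number of dropped events.
    """
    queued = 0
    dropped = 0
    for _ in range(ticks):
        accepted = min(arrivals, buffer_size - queued)
        dropped += arrivals - accepted
        queued += accepted
        queued -= min(queued, capacity)
    return dropped


# Centralized: every host's logs funnel into one processor.
# All capacities and buffer sizes are arbitrary illustrative numbers.
central = dropped_events(arrivals=30_000, capacity=25_000, buffer_size=50_000, ticks=60)

# Distributed: 100 hypothetical hosts each process only their own share.
HOSTS = 100
per_host = dropped_events(arrivals=30_000 // HOSTS, capacity=1_000, buffer_size=500, ticks=60)

print(f"centralized dropped: {central:,}")           # centralized dropped: 275,000
print(f"distributed dropped: {per_host * HOSTS:,}")  # distributed dropped: 0
```

In the centralized case the buffer absorbs the surplus for a few seconds and then overflows every tick; in the distributed case total processing capacity grows with the number of hosts, so no single processor becomes the ceiling.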

Related Concepts

Distributed Logging Systems
Log Processing And Transformation
Elastic Common Schema
Configuration Management For Logging