Fly Behind The Scenes: Fresh Logging

Dj Walker-Morgan

Since Fly launched, we’ve been collecting and managing logs for all the applications running on the Fly platform. It’s a critical but rarely noted function of the platform. When you type flyctl logs, behind the scenes, there is a lot of comput

Fly.io

Overview

The article discusses the improvements made to the logging system on the Fly platform, highlighting the transition from a centralized Graylog server setup to a distributed logging architecture using Vector and Elasticsearch. This change has resulted in faster and more reliable log processing for users.

What You'll Learn

1. How to implement a distributed logging system using Vector and Elasticsearch
2. Why moving to the Elastic Common Schema (ECS) improves log searching
3. How to configure Vector for log processing and transformation

Prerequisites & Requirements

Understanding of logging systems and log processing
Familiarity with Elasticsearch and Vector (optional)

Key Questions Answered

What improvements were made to the Fly logging system?

The Fly logging system transitioned from a centralized Graylog setup to a distributed architecture using Vector and Elasticsearch, allowing for faster log processing and reduced log loss. This change enables logs to be processed directly on each server and indexed in Elasticsearch, improving reliability and speed.

How does the new logging architecture benefit users?

The new architecture allows for faster log availability and processing, reducing the chances of dropped logs. Users can now retrieve logs more reliably, especially during deployment failures, enhancing the overall user experience on the Fly platform.

What is the volume of logs processed by the Fly platform?

The Fly platform processes between 20,000 and 30,000 logs per second. This high volume necessitated a more efficient logging solution than the previous centralized Graylog servers could provide.
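To put that rate in perspective, the short calculation below scales it to a full day. It assumes the quoted rate is sustained around the clock, which the article does not state, so treat the result as a rough upper-level estimate rather than a measured figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def logs_per_day(logs_per_second: int) -> int:
    """Scale a per-second rate to a day, assuming the rate never changes."""
    return logs_per_second * SECONDS_PER_DAY


low = logs_per_day(20_000)
high = logs_per_day(30_000)
print(f"{low:,} to {high:,} logs per day")
# prints "1,728,000,000 to 2,592,000,000 logs per day"
```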

What is the role of Vector in the new logging system?

Vector runs on each server, capturing, processing, and transforming logs before sending them directly to Elasticsearch. This eliminates the need for a middleman like Graylog, streamlining the logging process and improving performance.
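The capture, transform, and ship steps described above can be sketched in plain Python. This is not Vector's configuration or API, only an illustration of the per-server flow. The raw line format, the host name `server-1`, the application name `example-app`, and the index name `logs-example` are all hypothetical. The output is shaped like an Elasticsearch bulk request body (one action line followed by one document line per event), but nothing is sent over the network.

```python
import json
from typing import Iterable


def parse_line(raw: str, host: str, app: str) -> dict:
    """Turn one raw log line into a structured event.

    Assumed (hypothetical) raw format: "<ISO timestamp> <LEVEL> <message>".
    """
    timestamp, level, message = raw.split(" ", 2)
    return {
        "@timestamp": timestamp,
        "message": message,
        "log": {"level": level.lower()},
        "host": {"name": host},
        "service": {"name": app},
    }


def to_bulk_body(events: Iterable[dict], index: str) -> str:
    """Serialize events as newline-delimited JSON for a bulk index request."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                          # document line
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


raw_lines = [
    "2020-07-01T12:00:00Z INFO listening on port 8080",
    "2020-07-01T12:00:01Z ERROR upstream connection refused",
]
events = [parse_line(line, host="server-1", app="example-app") for line in raw_lines]
body = to_bulk_body(events, index="logs-example")
print(body)
```

Because parsing and batching happen where the logs are produced, the only central component in this sketch is the index itself, which mirrors the "no middleman" point in the answer above.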

Key Statistics & Figures

Log processing volume

20,000 to 30,000 logs per second

This volume highlights the need for a robust logging solution to handle peak loads without dropping logs.

Implementation duration

About two weeks

This timeframe includes handling various issues and configuring Elasticsearch, demonstrating the efficiency of the new system.

Technologies & Tools

Logging Tool

Vector

Used for capturing, processing, and transforming logs on each server.

Search Engine

Elasticsearch

Used for indexing and retrieving logs efficiently.

Logging Tool

Graylog

Previous logging solution that has been replaced by the new architecture.

Key Actionable Insights

1. Implementing a distributed logging architecture can significantly enhance log processing speed and reliability.
   By moving log processing to individual servers and utilizing tools like Vector and Elasticsearch, organizations can reduce latency and improve log availability.

2. Adopting a common schema for logs simplifies searching and querying across different applications.
   Using the Elastic Common Schema (ECS) allows for consistent field naming, making it easier to search for specific log entries across multiple applications.

3. Utilizing configuration management systems for log processing configurations can streamline updates and changes.
   With Vector, changes to log processing configurations can be deployed across all servers efficiently, allowing for quick adjustments to log handling without centralized bottlenecks.
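The common-schema insight (item 2) can be illustrated with a small sketch. The two applications, their original field names, and the sample records are hypothetical. The target names `@timestamp`, `message`, `log.level`, and `service.name` follow ECS naming, written here as flat dotted keys for brevity rather than nested objects.

```python
# Hypothetical per-application field names mapped onto shared, ECS-style names.
FIELD_MAPS = {
    "app-a": {"ts": "@timestamp", "lvl": "log.level", "msg": "message"},
    "app-b": {"time": "@timestamp", "severity": "log.level", "text": "message"},
}


def normalize(app: str, record: dict) -> dict:
    """Rename an application's own field names to the shared schema."""
    mapping = FIELD_MAPS[app]
    normalized = {"service.name": app}
    for key, value in record.items():
        # Fields without a mapping keep their original name.
        normalized[mapping.get(key, key)] = value
    return normalized


def search(records: list, field: str, value: str) -> list:
    """Return every record whose field equals value."""
    return [r for r in records if r.get(field) == value]


records = [
    normalize("app-a", {"ts": "2020-07-01T12:00:00Z", "lvl": "error", "msg": "disk full"}),
    normalize("app-b", {"time": "2020-07-01T12:00:05Z", "severity": "error", "text": "timeout"}),
    normalize("app-b", {"time": "2020-07-01T12:00:09Z", "severity": "info", "text": "started"}),
]

# One query on one field name now covers both applications.
errors = search(records, "log.level", "error")
print([e["service.name"] for e in errors])
# prints "['app-a', 'app-b']"
```

Without the renaming step, the same search would need to know that one application calls the field `lvl` and the other `severity`.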

Common Pitfalls

1. Relying on centralized logging systems can create bottlenecks and lead to dropped logs during high volume periods.
   Centralized systems like Graylog may struggle to keep up with high log volumes, necessitating a shift to distributed architectures for better performance.
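A toy model makes the bottleneck concrete. Every number below is invented for illustration: four servers, a burst on the third tick, a central consumer that handles 30 logs per tick with a 40-log buffer, and per-server processors that each handle 10 logs per tick with a 20-log buffer. The per-server setup comes out ahead here largely because its total capacity and buffering grow with the number of servers, which is the assumption this sketch is meant to show, not a measurement of Graylog or Vector.

```python
def simulate(arrivals_per_tick, capacity_per_tick, buffer_size):
    """Return (processed, dropped) for one consumer with a bounded buffer."""
    buffered = processed = dropped = 0
    for arriving in arrivals_per_tick:
        # Accept what fits in the buffer; the rest is lost.
        accepted = min(arriving, buffer_size - buffered)
        dropped += arriving - accepted
        buffered += accepted
        # Process up to the per-tick capacity.
        done = min(buffered, capacity_per_tick)
        buffered -= done
        processed += done
    return processed, dropped


per_server_arrivals = [5, 5, 20, 5]  # hypothetical burst on the third tick
servers = 4

# Centralized: one consumer receives the combined stream from all servers.
combined = [n * servers for n in per_server_arrivals]
central = simulate(combined, capacity_per_tick=30, buffer_size=40)

# Distributed: each server processes only its own stream.
local = simulate(per_server_arrivals, capacity_per_tick=10, buffer_size=20)

print("central (processed, dropped):", central)
print("per server (processed, dropped):", local)
# prints "central (processed, dropped): (100, 40)"
# prints "per server (processed, dropped): (30, 0)"
```

In the distributed run each server still has 5 logs buffered at the end of the last tick, so those are delayed rather than dropped.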

Related Concepts

Distributed Logging Systems

Log Processing And Transformation

Elastic Common Schema (ECS)

Configuration Management For Logging
