Building an Observability Solution with ClickHouse - Part 1 - Logs

Overview

This article discusses building an observability solution using ClickHouse, focusing specifically on log data collection and querying. It covers various agents such as OpenTelemetry, Vector, and Fluent Bit, detailing their architectures, deployment strategies, and integration with ClickHouse.

What You'll Learn

1. How to collect logs from a Kubernetes cluster and store them in ClickHouse
2. Why asynchronous inserts can optimize log data ingestion in ClickHouse
3. How to structure log data schemas for efficient querying in ClickHouse

Prerequisites & Requirements

  • Basic understanding of Kubernetes and log management
  • Familiarity with ClickHouse and its querying capabilities (optional)

Key Questions Answered

What are the best agents for collecting logs in ClickHouse?
The article discusses four principal agents for log collection in ClickHouse: OpenTelemetry Collector, Vector, Fluent Bit, and Fluentd. Each agent has its strengths, with OpenTelemetry being more versatile and Vector being feature-rich, while Fluent Bit is lightweight and efficient for Kubernetes environments.
How does ClickHouse handle log data compression?
ClickHouse achieves significant compression ratios for log data, ranging from 14x to 30x depending on the aggregator used. This is due to its column-oriented design and configurable codecs, which optimize storage and retrieval of log data.
How can logs be queried effectively in ClickHouse?
Logs in ClickHouse can be queried using various time-series functions, allowing for aggregation over time and filtering by pod names or error codes. The article provides examples of common queries, such as logs over time by pod name and logs within specific time windows.
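
As a rough illustration of the "logs over time by pod name" query mentioned above, the following sketch builds a ClickHouse SQL string in Python. The table name (logs) and column names (timestamp, pod_name) are assumptions for illustration only; the article's actual schema may differ. The query itself relies on the ClickHouse function toStartOfInterval to bucket rows by time.

```python
from datetime import datetime


def logs_over_time_query(table: str, start: datetime, end: datetime,
                         interval_minutes: int = 5) -> str:
    """Build a SQL string that counts log lines per pod per time bucket.

    `table` is interpolated directly, so it must come from trusted code,
    not from user input. Column names are hypothetical.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "SELECT\n"
        f"    toStartOfInterval(timestamp, INTERVAL {int(interval_minutes)} MINUTE) AS bucket,\n"
        "    pod_name,\n"
        "    count() AS log_lines\n"
        f"FROM {table}\n"
        f"WHERE timestamp >= '{start.strftime(fmt)}'\n"
        f"  AND timestamp < '{end.strftime(fmt)}'\n"
        "GROUP BY bucket, pod_name\n"
        "ORDER BY bucket, pod_name"
    )


# Example: one day of logs in 5-minute buckets.
query = logs_over_time_query("logs", datetime(2023, 1, 1), datetime(2023, 1, 2))
print(query)
```

Restricting the time window in the WHERE clause keeps the scan small, which matters most when the table is ordered by a time-related key.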

Key Statistics & Figures

Daily log generation
100 GB
This volume was produced from approximately 20 nodes in a development cloud environment.
Compression ratio for Fluent Bit logs
33.04
This indicates the ratio of uncompressed to compressed size for logs collected using Fluent Bit.

Technologies & Tools

Database
ClickHouse
Used for storing and querying observability data, particularly logs.
Observability
OpenTelemetry
Provides a standardized way to collect and export observability data.
Data Pipeline
Vector
An open-source tool for collecting, transforming, and routing observability data.
Logs Processor
Fluent Bit
A lightweight log processor and forwarder used for collecting logs in Kubernetes.

Key Actionable Insights

1. Use asynchronous inserts in your log ingestion pipeline to ClickHouse to improve performance and reduce the risk of data loss.
Asynchronous inserts let ClickHouse buffer incoming log data on the server and write it in larger batches, which is crucial in high-throughput environments, especially with agents like Fluent Bit that may generate many small inserts.
2. Consider the architecture of your observability stack carefully, especially the roles of agents and aggregators, to optimize data collection and processing.
Choosing the right architecture can minimize load on critical services and ensure that data is processed efficiently, which is essential for maintaining high availability and performance in production environments.
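
To make the first insight concrete, here is a minimal sketch of how an insert request with asynchronous inserts enabled could be assembled for the ClickHouse HTTP interface. It only builds the URL and body and does not send anything. The host, table name, and row fields are assumptions for illustration; async_insert and wait_for_async_insert are ClickHouse settings that can be passed as URL parameters.

```python
import json
from urllib.parse import urlencode


def build_async_insert(base_url: str, table: str, rows: list) -> tuple:
    """Return (url, body) for an HTTP insert with asynchronous inserts enabled.

    `table` and the row fields are hypothetical; adapt them to your schema.
    """
    params = {
        "query": f"INSERT INTO {table} FORMAT JSONEachRow",
        "async_insert": 1,           # server buffers and batches small inserts
        "wait_for_async_insert": 1,  # acknowledge only once the buffer is flushed
    }
    url = f"{base_url}/?{urlencode(params)}"
    # JSONEachRow expects one JSON object per line.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return url, body


rows = [
    {"timestamp": "2023-01-01 00:00:00", "pod_name": "api-0", "message": "started"},
    {"timestamp": "2023-01-01 00:00:01", "pod_name": "api-0", "message": "ready"},
]
url, body = build_async_insert("http://localhost:8123", "logs", rows)
print(url)
```

Setting wait_for_async_insert to 1 trades some latency for an acknowledgement that the data was actually written; setting it to 0 returns sooner but the client no longer learns about flush failures.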

Common Pitfalls

1. Failing to configure agents for asynchronous inserts can lead to performance issues and data loss.
If agents flush data too frequently without batching, the result is many small inserts. ClickHouse handles these inefficiently because each insert creates a new data part that must later be merged.
2. Not pre-creating tables with the correct schema can cause issues with data ingestion.
Agents like Fluent Bit and Vector require users to define the table schema beforehand; failing to do so can lead to mismatches and inefficient queries.
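
The second pitfall can be sketched as follows: define the schema once, generate the CREATE TABLE statement from it, and check incoming rows against the same definition before sending them. The column names, types, and ordering key below are hypothetical and are not the schema used in the article; they only show the shape of the approach.

```python
# Hypothetical log schema: column name -> ClickHouse type.
LOG_COLUMNS = {
    "timestamp": "DateTime64(3)",
    "pod_name": "LowCardinality(String)",
    "namespace": "LowCardinality(String)",
    "level": "LowCardinality(String)",
    "message": "String",
}


def create_table_ddl(table: str, columns: dict, order_by: list) -> str:
    """Build a CREATE TABLE statement for a MergeTree table."""
    cols = ",\n".join(f"    {name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table}\n(\n{cols}\n)\n"
        "ENGINE = MergeTree\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


def unexpected_fields(row: dict, columns: dict) -> set:
    """Return keys in a log row that the table schema does not define."""
    return set(row) - set(columns)


ddl = create_table_ddl("logs", LOG_COLUMNS, ["pod_name", "timestamp"])
print(ddl)

# A row carrying a field the table does not know about.
extra = unexpected_fields({"pod_name": "api-0", "host": "node-1"}, LOG_COLUMNS)
print(extra)
```

Keeping the schema in one place makes it easier to spot a mismatch between what the agent emits and what the table expects before it surfaces as an ingestion error.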

Related Concepts

Observability Architectures
Log Data Management
Data Ingestion Strategies
Performance Optimization Techniques