ClickHouse vs. Elasticsearch: The Mechanics of Count Aggregations

Tom Schreiber
17 min read · beginner

Overview

This article compares how ClickHouse and Elasticsearch execute count aggregations. It argues that ClickHouse delivers lower query latencies, lower hardware costs, and better scalability than Elasticsearch for large-scale data analytics and observability use cases.

What You'll Learn

1. How to leverage ClickHouse for efficient count aggregations
2. Why ClickHouse is more cost-effective than Elasticsearch for large datasets
3. When to use materialized views for continuous data summarization in ClickHouse

Key Questions Answered

How does ClickHouse achieve lower latencies for count aggregations compared to Elasticsearch?
ClickHouse parallelizes aggregation at three levels: SIMD instructions, multiple CPU cores, and multiple nodes. This results in at least 5 times lower latencies for count aggregations than Elasticsearch, which processes each shard with a single thread.
What are the advantages of ClickHouse's materialized views over Elasticsearch's transforms?
ClickHouse's materialized views provide incremental data transformation without dependency on raw data, allowing for high scalability and real-time updates. In contrast, Elasticsearch's transforms require retaining old raw data and can lead to poor scalability and high computing costs.
What precision issues are associated with Elasticsearch's terms aggregation?
Elasticsearch's terms aggregation can yield approximate results when data is split across multiple shards, because each shard returns only its own top terms and the merged counts can miss occurrences. Users can mitigate this by increasing the `shard_size` parameter (the number of candidate terms each shard returns), but this increases memory requirements and runtimes.
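
The precision issue can be illustrated with a small simulation. This is a minimal sketch, not Elasticsearch's actual implementation: the function names and the toy data are hypothetical, and the only thing borrowed from Elasticsearch is the idea that each shard reports a limited number of its locally most frequent terms (`shard_size`) before the results are merged.

```python
from collections import Counter


def shard_top_terms(docs, shard_size):
    """Return the shard_size most frequent terms on one shard, with local counts."""
    return Counter(docs).most_common(shard_size)


def approximate_top_terms(shards, size, shard_size):
    """Merge only the per-shard top lists, as a sharded top-N aggregation would."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, shard_size):
            merged[term] += count
    return merged.most_common(size)


def exact_top_terms(shards, size):
    """Count every term on every shard before taking the top `size` terms."""
    merged = Counter()
    for docs in shards:
        merged.update(docs)
    return merged.most_common(size)


# Hypothetical data: two shards with different local rankings for "b" and "c".
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["a"] * 5 + ["c"] * 4 + ["b"] * 3,
]

# With shard_size=2 each shard drops its third term, so "b" and "c" are
# each reported with a count of 4 although their true count is 7.
print(dict(approximate_top_terms(shards, size=3, shard_size=2)))
print(dict(exact_top_terms(shards, size=3)))

# A larger shard_size makes the merged result exact for this data set,
# at the price of every shard returning more candidates.
print(dict(approximate_top_terms(shards, size=3, shard_size=3)))
```

In this toy case the undercount disappears once `shard_size` covers every distinct term, which mirrors the trade-off described above: more candidates per shard means more memory and longer runtimes.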

Key Statistics & Figures

Cost reduction in observability hardware: over 30%. Reported by Didi Tech after migrating from Elasticsearch to ClickHouse.
Improvement in average read latencies: 100x. Reported by The Guild after switching to ClickHouse.
Lower latencies for count aggregations: at least 5 times. ClickHouse compared to Elasticsearch.
Cheaper hardware requirement: 4 times cheaper. For latencies comparable to Elasticsearch.

Key Actionable Insights

1. Consider migrating from Elasticsearch to ClickHouse for large-scale data analytics to significantly reduce costs and improve performance.
   Organizations like Didi Tech have reported over 30% cost savings in observability hardware after migrating to ClickHouse, making it a compelling choice for data-intensive applications.
2. Utilize ClickHouse's materialized views for real-time data summarization to enhance query performance.
   Materialized views in ClickHouse allow for continuous aggregation, ensuring that data is always up-to-date without the need for complex re-aggregation processes.
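
The idea of continuous, incremental aggregation can be sketched in a few lines. This is only a conceptual model under stated assumptions, not ClickHouse code: the class `IncrementalCountView`, its methods, and the `service` field are hypothetical names chosen for illustration. The point it shows is that each newly inserted block is aggregated on its own and merged into a running summary, so the summary never has to re-read older raw rows.

```python
from collections import Counter


class IncrementalCountView:
    """Hypothetical stand-in for an incrementally maintained count summary."""

    def __init__(self):
        # Running summary: one count per group key.
        self.counts = Counter()

    def on_insert(self, block):
        """Aggregate only the new block, then merge its partial counts."""
        partial = Counter(row["service"] for row in block)
        self.counts.update(partial)

    def query(self, top_n):
        """Answer a top-N count query from the summary alone."""
        return self.counts.most_common(top_n)


view = IncrementalCountView()

# Each insert is summarized as it arrives; the raw rows could be
# discarded afterwards without affecting later queries.
view.on_insert([{"service": "api"}, {"service": "db"}])
view.on_insert([{"service": "api"}, {"service": "api"}])

print(view.query(top_n=2))
```

Because counts are additive, merging per-block partial results gives the same answer as recounting all rows, which is why this kind of summary stays current without a separate re-aggregation job.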

Common Pitfalls

1. Relying on Elasticsearch's terms aggregation without understanding its precision limitations can lead to inaccurate results.
   Users may assume that Elasticsearch provides exact counts, but due to its shard-based processing, results can be approximate unless additional configurations are applied.

Related Concepts

Data Analytics
Observability
Materialized Views
Incremental Aggregation