ClickHouse Release 25.2

The ClickHouse Team
10 min read, intermediate

Overview

ClickHouse 25.2 ships 12 new features, 15 performance optimizations, and 72 bug fixes. Key improvements include faster parallel hash joins, Bloom filter writing for Parquet files, transitive condition inference, and a new Backup database engine.

What You'll Learn

1. How to improve join performance in ClickHouse using parallel hash joins
2. Why writing Bloom filters into Parquet files speeds up later queries on those files
3. How to utilize transitive conditions for better query optimization
4. How to use the Backup database engine in ClickHouse

Key Questions Answered

What are the new features introduced in ClickHouse version 25.2?
ClickHouse version 25.2 introduces 12 new features including improved parallel hash join performance, Bloom filter writing for Parquet files, transitive condition inference for queries, and a backup database engine. These enhancements aim to optimize performance and usability.
How does the new Bloom filter feature improve Parquet file performance?
The Bloom filter feature allows ClickHouse to filter out row groups in Parquet files that do not match query conditions, leading to faster query execution. In tests, queries using Bloom filters showed a 30-40% speed improvement compared to those without.
What optimizations were made to parallel hash joins in ClickHouse 25.2?
In version 25.2, optimizations were made to the build phase of parallel hash joins, eliminating unnecessary CPU thread contention. This change allowed for better utilization of CPU resources, resulting in a query execution time reduction from 12.275 seconds in 25.1 to 6.345 seconds in 25.2.
What is the purpose of the Backup database engine in ClickHouse?
The Backup database engine in ClickHouse allows users to attach tables or databases from backups in read-only mode. This feature enhances data recovery and management capabilities, making it easier to work with backups directly within ClickHouse.
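
To make the row-group skipping described in the Bloom filter answer above more concrete, here is a minimal Python sketch of the idea. It is not ClickHouse's or Parquet's implementation: the `BloomFilter` class, its size, its hash scheme, and the helper names are all assumptions chosen for illustration. Each row group gets its own filter at write time, and a reader running an equality lookup only scans the groups whose filter says the value might be present.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_row_group_filters(row_groups):
    """Writer side: build one Bloom filter per row group."""
    filters = []
    for rows in row_groups:
        bloom = BloomFilter()
        for value in rows:
            bloom.add(value)
        filters.append(bloom)
    return filters


def scan_equals(row_groups, filters, needle):
    """Reader side: skip row groups whose filter rules out the needle."""
    matches, scanned = [], 0
    for rows, bloom in zip(row_groups, filters):
        if not bloom.might_contain(needle):
            continue  # the whole row group is skipped without reading its rows
        scanned += 1
        matches.extend(value for value in rows if value == needle)
    return matches, scanned


row_groups = [["alice", "bob"], ["carol", "dave"], ["erin", "frank"]]
filters = build_row_group_filters(row_groups)
matches, scanned = scan_equals(row_groups, filters, "carol")
print(matches, f"scanned {scanned} of {len(row_groups)} row groups")
```

Because a Bloom filter can return false positives, the reader may still scan a group that does not contain the value, but it never skips a group that does.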

Key Statistics & Figures

Query execution time reduction
From 12.275 seconds in ClickHouse 25.1 to 6.345 seconds in ClickHouse 25.2
This improvement was observed during tests of the parallel hash join performance.
Speed improvement with Bloom filters
30-40% faster query execution
This improvement was noted when querying Parquet files with Bloom filters compared to those without.
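
For readers who prefer the join figures as ratios, the short Python calculation below converts the two timings quoted above into a speedup factor and a percentage reduction. The function names are illustrative; only the two timings come from the text.

```python
def speedup(before_s, after_s):
    """How many times faster the new run is."""
    return before_s / after_s


def reduction_pct(before_s, after_s):
    """Percentage of the original runtime that was saved."""
    return (before_s - after_s) / before_s * 100


before, after = 12.275, 6.345  # seconds in 25.1 and 25.2, as quoted above
print(f"speedup: {speedup(before, after):.2f}x")                   # speedup: 1.93x
print(f"runtime reduction: {reduction_pct(before, after):.1f}%")   # runtime reduction: 48.3%
```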

Key Actionable Insights

1. Enable Bloom filter writing when exporting data to Parquet files to speed up later queries.
   Bloom filters let the reader skip row groups that cannot contain the requested value, reducing the amount of data scanned. They help most with equality filters on high-cardinality columns, where each row group holds only a small share of the distinct values.
2. Take advantage of transitive condition inference to optimize complex queries.
   This feature automatically deduces additional conditions from the ones you write, leading to more efficient query execution and reduced resource consumption.
3. Use the Backup database engine to streamline data recovery.
   It gives quick read-only access to backup data without a full restore, improving operational efficiency.
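
As a rough illustration of what inferring a transitive condition means, the Python sketch below derives constant lower bounds from a chain of "greater than" conditions. It is only a toy model of the idea, not ClickHouse's optimizer, and the function name and input format are assumptions. Given `a > b`, `b > c`, and `c > 5`, it concludes that `a` and `b` must also exceed 5, which is the kind of extra condition a query engine could then use to filter earlier.

```python
def infer_lower_bounds(conditions):
    """Infer constant lower bounds from 'left > right' conditions.

    conditions: list of (left, right) pairs meaning left > right, where
    strings are column names and numbers are constants.
    Returns {column: largest constant the column is known to exceed}.
    """
    bounds = {}
    # Seed with the direct "column > constant" conditions.
    for left, right in conditions:
        if isinstance(left, str) and isinstance(right, (int, float)):
            bounds[left] = max(bounds.get(left, right), right)

    # Propagate through "column > column" conditions until nothing changes.
    changed = True
    while changed:
        changed = False
        for left, right in conditions:
            if isinstance(left, str) and isinstance(right, str) and right in bounds:
                if left not in bounds or bounds[left] < bounds[right]:
                    bounds[left] = bounds[right]
                    changed = True
    return bounds


# WHERE a > b AND b > c AND c > 5
print(infer_lower_bounds([("a", "b"), ("b", "c"), ("c", 5)]))
```

The loop stops once a full pass makes no change. Bounds only ever increase and can only take values from the constants in the input, so it always terminates.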

Common Pitfalls

1. Failing to configure hash table size limits can lead to excessive memory usage during joins.
   If limits are not set, ClickHouse may allocate too much memory, potentially causing performance degradation. Users should configure `max_rows_in_join` and `max_bytes_in_join` to manage resource usage effectively.
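
One way to apply these limits per query is a trailing `SETTINGS` clause. The Python helper below is a minimal sketch that appends such a clause to a query string. The helper name, the table names `orders` and `customers`, and the limit values are hypothetical, and suitable limits depend on your data and available memory.

```python
def with_join_limits(query, max_rows=None, max_bytes=None):
    """Append a SETTINGS clause that caps the join hash table size.

    Naive by design: it assumes the query has no SETTINGS clause yet.
    """
    settings = []
    if max_rows is not None:
        settings.append(f"max_rows_in_join = {int(max_rows)}")
    if max_bytes is not None:
        settings.append(f"max_bytes_in_join = {int(max_bytes)}")
    if not settings:
        return query
    base = query.rstrip().rstrip(";")
    return f"{base} SETTINGS {', '.join(settings)}"


# Hypothetical tables and illustrative limits.
sql = with_join_limits(
    "SELECT o.id, c.name FROM orders AS o JOIN customers AS c ON o.customer_id = c.id",
    max_rows=100_000_000,
    max_bytes=8 * 1024**3,
)
print(sql)
```

The same settings can also be set at the session or profile level, so a helper like this is only one option.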

Related Concepts

Data Optimization Techniques
Backup And Recovery Strategies
Performance Tuning In ClickHouse