ClickHouse Release 24.7

The ClickHouse Team

Overview

ClickHouse 24.7 ships 18 new features, 12 performance optimizations, and 76 bug fixes. Highlights include faster reading of data in order, a faster parallel hash join, and ASOF JOIN support in the full sorting merge join algorithm, which together improve query performance and memory efficiency.

What You'll Learn

1. How to optimize data reading in ClickHouse using the new buffering feature
2. Why the parallel hash join algorithm improves JOIN performance in ClickHouse
3. How to utilize the full sorting merge join for ASOF JOINs in ClickHouse
4. How to calculate percent ranks using window functions in ClickHouse
5. How to create and use automatic named tuples in ClickHouse
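To make the percent rank item above concrete, here is a minimal Python sketch of the standard SQL definition, (rank - 1) / (rows - 1). It is not ClickHouse code: the function name `percent_rank` and the sample values are chosen here for illustration, and it treats the whole input as a single partition.

```python
def percent_rank(values):
    """Return the percent rank of each value, in input order.

    Uses the standard SQL definition: (rank - 1) / (rows - 1), where rank is
    the 1-based position of the first row with that value in sorted order,
    so tied values share a rank. A single-row input gets 0.0 (assumption
    taken from the SQL standard's behaviour).
    """
    n = len(values)
    if n <= 1:
        return [0.0] * n
    ordered = sorted(values)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        # Remember only the first position at which each value appears.
        first_pos.setdefault(v, pos)
    return [(first_pos[v] - 1) / (n - 1) for v in values]


# Hypothetical sample data with one tie.
salaries = [100, 200, 200, 300, 500]
ranks = percent_rank(salaries)
print(ranks)  # [0.0, 0.25, 0.25, 0.75, 1.0]
```

The two tied values share a rank of 2, so both get (2 - 1) / 4 = 0.25, and the next distinct value jumps to 0.75.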

Key Questions Answered

What are the new features introduced in ClickHouse version 24.7?
ClickHouse version 24.7 introduces 18 new features, including optimizations for reading data in order, a faster parallel hash join algorithm, and support for ASOF JOINs using the full sorting merge join. These enhancements aim to improve query performance and reduce memory usage.
How does the new buffering feature affect query performance in ClickHouse?
With a high-selectivity filter, the new buffering feature can make queries that rely on the optimize_read_in_order optimization up to 10x faster. Data streams are read concurrently into buffers before being merged, which shortens query execution time at the cost of higher memory use (from 17.82 MiB to 48.37 MiB in the example query).
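The idea in this answer, filtering several sorted streams concurrently into buffers and merging them afterwards, can be sketched in Python. This is a conceptual toy, not ClickHouse's implementation: the names `filter_part` and `read_in_order_buffered`, the worker count, and the data are all assumptions made for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def filter_part(part, predicate):
    """Scan one sorted part and buffer only the rows that pass the filter."""
    return [row for row in part if predicate(row)]


def read_in_order_buffered(parts, predicate, max_workers=4):
    """Filter all parts concurrently, then merge the buffers in sorted order.

    Each part is assumed to be sorted already, so filtering keeps it sorted
    and heapq.merge can produce the global order without a full re-sort.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        buffers = list(pool.map(lambda p: filter_part(p, predicate), parts))
    return list(heapq.merge(*buffers))


# Three hypothetical parts, each sorted by key.
parts = [
    list(range(0, 3000, 3)),
    list(range(1, 3000, 3)),
    list(range(2, 3000, 3)),
]


def selective(row):
    """A selective filter: only a small fraction of rows pass."""
    return row % 500 == 0


result = read_in_order_buffered(parts, selective)
print(result)  # [0, 500, 1000, 1500, 2000, 2500]
```

The buffers hold the filtered rows in memory until the merge runs, which is consistent with the higher memory usage reported for this feature.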
What improvements were made to the parallel hash join algorithm in ClickHouse 24.7?
In version 24.7, the parallel hash join algorithm has been optimized to cache the sizes of hash tables from previous executions. This allows for pre-allocation based on remembered sizes, speeding up subsequent JOIN operations and reducing memory overhead.
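The size-caching idea can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: `GrowableTable`, `size_cache`, `build_side`, the load factor of 0.5, and the query key `"q1"` are hypothetical, and the capacity is simulated on top of a Python dict rather than being a real pre-allocated hash table.

```python
class GrowableTable:
    """Toy table that tracks a simulated capacity and counts its resizes."""

    def __init__(self, capacity=8):
        self.capacity = max(capacity, 8)
        self.data = {}      # the actual storage; capacity is only simulated
        self.resizes = 0

    def insert(self, key, value):
        self.data.setdefault(key, []).append(value)
        # Double the simulated capacity whenever the load factor exceeds 0.5.
        while len(self.data) > self.capacity // 2:
            self.capacity *= 2
            self.resizes += 1


# Hypothetical cache: query identifier -> capacity observed on the last run.
size_cache = {}


def build_side(query_id, rows):
    """Build the right-hand table, pre-sizing it from the cache if possible."""
    table = GrowableTable(size_cache.get(query_id, 8))
    for key, value in rows:
        table.insert(key, value)
    size_cache[query_id] = table.capacity  # remember for the next execution
    return table


rows = [(i, f"v{i}") for i in range(1000)]
first = build_side("q1", rows)   # cold run: grows step by step
second = build_side("q1", rows)  # warm run: starts at the remembered size
print(first.resizes, second.resizes)
```

In this toy model the first build has to grow repeatedly while the second build, starting from the remembered capacity, does not resize at all.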
How does the full sorting merge join algorithm enhance ASOF JOINs in ClickHouse?
The full sorting merge join algorithm allows ASOF JOINs to benefit from the physical row order of tables, potentially skipping sorting and improving performance. This method can be more memory-efficient compared to traditional hash joins, especially for large datasets.
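To show why sorted inputs help, here is a minimal Python sketch of an ASOF join done as a single forward merge over two lists that are already sorted by timestamp. It is an illustration, not ClickHouse's algorithm: the function name `asof_join_sorted` and the sample trades and quotes are made up, and the equality keys that a real ASOF JOIN also matches on are omitted for brevity.

```python
def asof_join_sorted(left, right):
    """ASOF-join two lists of (timestamp, value) sorted by timestamp.

    For each left row, pick the latest right row with right_ts <= left_ts.
    Left rows without a match are dropped (inner-join semantics).
    """
    result = []
    j = 0           # cursor into right; it only ever moves forward
    latest = None   # most recent right row seen at or before left_ts
    for left_ts, left_val in left:
        while j < len(right) and right[j][0] <= left_ts:
            latest = right[j]
            j += 1
        if latest is not None:
            result.append((left_ts, left_val, latest[0], latest[1]))
    return result


# Hypothetical time-series data, both sides sorted by timestamp.
trades = [(1, "t0"), (3, "t1"), (7, "t2"), (12, "t3")]
quotes = [(2, 100.0), (5, 101.5), (6, 101.0), (10, 99.0)]
matched = asof_join_sorted(trades, quotes)
print(matched)
# [(3, 't1', 2, 100.0), (7, 't2', 6, 101.0), (12, 't3', 10, 99.0)]
```

Because both cursors only move forward, the join needs one pass over each input and no hash table, which is where the memory savings for large inputs come from.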

Key Statistics & Figures

Performance improvement with buffering: up to 10x, when using a high-selectivity filter with the optimize_read_in_order optimization.
Query execution time reduction: from 0.590 sec to 0.097 sec, when enabling the buffering feature for a specific query.
Memory usage increase with buffering: from 17.82 MiB to 48.37 MiB, while the new buffering feature is in use during query execution.
Performance improvement with parallel hash join: 10 times faster, compared to the default hash join algorithm for large datasets.
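As a quick sanity check on the buffering figures above, the following Python snippet computes the speedup and memory ratios implied by the reported numbers. The variable names are chosen here for illustration; only the four measurements come from the text.

```python
# Reported measurements for the example query with buffering off and on.
time_before, time_after = 0.590, 0.097   # seconds
mem_before, mem_after = 17.82, 48.37     # MiB

speedup = time_before / time_after
memory_ratio = mem_after / mem_before

print(f"speedup: {speedup:.1f}x, memory: {memory_ratio:.1f}x")
# speedup: 6.1x, memory: 2.7x
```

For this particular query the gain is roughly 6x in time for about 2.7x the memory, so the "up to 10x" figure is best read as an upper bound rather than a typical result.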

Key Actionable Insights

1. Enable the new buffering feature to speed up queries that read data in order, especially on large datasets with high-selectivity filters. This can significantly reduce query execution times, at the cost of additional memory.
2. Use the full sorting merge join for ASOF JOINs to improve performance and reduce memory usage in time-series analytics. This is particularly beneficial with large datasets where memory is constrained.
3. Rely on the cached hash table sizes in parallel hash joins to speed up repeated queries. This can save substantial time in environments where similar JOIN operations run frequently.

Common Pitfalls

1. Leaving the new buffering feature disabled can lead to suboptimal performance for queries that read data in order. Without buffering, such queries may take significantly longer, especially on large datasets with high-selectivity filters (0.590 sec versus 0.097 sec in the example query).

Related Concepts

ASOF JOINs
Window Functions
Data Optimization Techniques