ClickHouse Release 24.11

The ClickHouse Team

Overview

ClickHouse 24.11 ships 9 new features, 15 performance optimizations, and 68 bug fixes. Highlights include parallel hash join becoming the default join strategy, a new STALENESS modifier for the WITH FILL clause, and a new BFloat16 data type aimed at vector search workloads.

What You'll Learn

1. How to use the STALENESS modifier in ClickHouse queries
2. Why parallel hash join improves performance in ClickHouse
3. How to pre-warm the mark cache for better query performance
4. When to use the BFloat16 data type for vector searches

Key Questions Answered

What are the new features in ClickHouse version 24.11?
ClickHouse version 24.11 introduces 9 new features, including the parallel hash join as the default join strategy, the STALENESS modifier for the WITH FILL clause, and the BFloat16 data type for enhanced vector search capabilities.
How does the STALENESS modifier work in ClickHouse?
With STALENESS set, WITH FILL generates rows only until the difference from the previous row in the original data exceeds the specified constant numeric expression. In time-series queries this keeps short gaps filled while leaving long gaps visible instead of padding them indefinitely.
What performance optimizations were made in ClickHouse 24.11?
This release includes 15 performance optimizations, notably improving the parallel hash join algorithm by implementing zero-copy processing for blocks scattered between threads, reducing memory overhead.
How can the mark cache be pre-warmed in ClickHouse?
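Enable the MergeTree table setting 'prewarm_mark_cache' so that marks are saved to the mark cache on inserts, merges, fetches, and server startup. Pre-warming fills the cache up to the fraction given by 'mark_cache_prewarm_ratio', which is 95% by default. To load a table's marks into the cache immediately, run 'SYSTEM PREWARM MARK CACHE' on that table.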
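
To make the idea of blocks being "scattered between threads" more concrete, here is a minimal Python sketch of a partitioned hash join. It is not ClickHouse's implementation: the function name `parallel_hash_join`, the `num_buckets` parameter, and the `(key, payload)` row layout are assumptions made for illustration. Right-side rows are scattered into buckets by key hash, one hash table per bucket is built concurrently, and each left-side row probes only the bucket its key hashes to.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, str]  # (join key, payload); hypothetical row layout


def _build_table(bucket: List[Row]) -> Dict[Hashable, List[str]]:
    """Build one hash table from the rows that landed in one bucket."""
    table: Dict[Hashable, List[str]] = defaultdict(list)
    for key, payload in bucket:
        table[key].append(payload)
    return table


def parallel_hash_join(left: List[Row], right: List[Row],
                       num_buckets: int = 4) -> List[Tuple[Hashable, str, str]]:
    """Inner join of left and right on the key, using per-bucket hash tables."""
    # Scatter phase: route every right-side row to a bucket by key hash.
    # This sketch copies rows into buckets; the 24.11 optimization described
    # above is about avoiding that copy.
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for key, payload in right:
        buckets[hash(key) % num_buckets].append((key, payload))

    # Build phase: one hash table per bucket, built concurrently.
    with ThreadPoolExecutor(max_workers=num_buckets) as pool:
        tables = list(pool.map(_build_table, buckets))

    # Probe phase (sequential here for brevity): look only in the bucket
    # that the left row's key hashes to.
    joined = []
    for key, payload in left:
        for match in tables[hash(key) % num_buckets].get(key, ()):
            joined.append((key, payload, match))
    return joined


if __name__ == "__main__":
    left_rows = [(1, "a"), (2, "b"), (3, "c")]
    right_rows = [(1, "x"), (1, "y"), (3, "z"), (4, "w")]
    print(parallel_hash_join(left_rows, right_rows))
```

Python threads only illustrate the structure here; they do not give the speedup that a native multi-threaded engine gets from building the per-bucket tables in parallel.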

Key Statistics & Figures

New features: 9 (total new features introduced in ClickHouse version 24.11)
Performance optimizations: 15 (total performance optimizations included in this release)
Bug fixes: 68 (total bug fixes addressed in ClickHouse version 24.11)

Key Actionable Insights

1. Use the STALENESS modifier with WITH FILL when querying time-series data that contains gaps.
   It bounds how far each gap is filled, so queries return meaningful results without unnecessary zero-fill.
2. Use the new BFloat16 data type for machine learning applications to improve performance in vector searches.
   BFloat16 stores each value in 16 bits instead of the 32 bits of Float32, trading mantissa precision for size and speed, which suits AI/ML workloads that process large datasets.
3. Take advantage of parallel hash join now being the default join strategy.
   It builds and probes hash tables concurrently across threads, which speeds up join operations, especially on large datasets.
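
The precision trade-off behind BFloat16 can be seen with a few lines of Python. This is a sketch under stated assumptions, not ClickHouse's conversion code: the helper names are hypothetical, and the conversion shown is plain truncation of a Float32 to its upper 16 bits (real implementations may round instead). BFloat16 keeps the sign bit and the 8 exponent bits of Float32 and retains only 7 mantissa bits.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern for x by truncating its Float32 form."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # Float32 as a 32-bit integer
    return bits >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def bfloat16_bits_to_float(b: int) -> float:
    """Expand a 16-bit BFloat16 pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


if __name__ == "__main__":
    for value in (1.0, -2.0, 3.14159):
        bits = float32_to_bfloat16_bits(value)
        print(f"{value!r} -> 0x{bits:04X} -> {bfloat16_bits_to_float(bits)!r}")
```

Values such as 1.0 survive the round trip exactly, while 3.14159 comes back as 3.140625. For vector search that loss is often acceptable in exchange for half the storage per vector element, but it is worth checking against your own recall requirements.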

Common Pitfalls

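1. Choosing an inappropriate STALENESS value can lead to unexpected query results.
   A value that is too small stops filling early and leaves rows missing from short gaps, while a value that is too large fills across periods where data is genuinely absent. Pick it based on how long a previous reading can reasonably be treated as current.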
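
The effect of the STALENESS value can be explored with a small Python model of the fill logic. This is a simplified sketch, not ClickHouse code: the function name `fill_with_staleness` and the `(key, value)` row layout are hypothetical, and it assumes that a filled row is emitted only while its distance from the previous original row is strictly less than the staleness bound, and never at or beyond the next original row. Filled rows get a default value of 0, mirroring how WITH FILL uses column defaults when no INTERPOLATE is given.

```python
from typing import Iterator, List, Tuple

FilledRow = Tuple[int, int, str]  # (key, value, source); hypothetical row layout


def fill_with_staleness(rows: List[Tuple[int, int]], step: int,
                        staleness: int) -> Iterator[FilledRow]:
    """Yield original rows plus filled rows, bounded by a staleness distance.

    rows must be sorted by key. Assumption: filling after an original row
    stops once the distance from that row reaches `staleness`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    for i, (key, value) in enumerate(rows):
        yield (key, value, "original")
        next_key = rows[i + 1][0] if i + 1 < len(rows) else None
        filled = key + step
        while filled - key < staleness and (next_key is None or filled < next_key):
            yield (filled, 0, "filled")  # non-key columns take their default
            filled += step


if __name__ == "__main__":
    data = [(0, 0), (5, 25), (10, 50), (15, 75)]
    for row in fill_with_staleness(data, step=1, staleness=3):
        print(row)
```

With a staleness of 3 and original keys 0, 5, 10, and 15, the model emits two filled rows after each original row and leaves the rest of each gap empty. Raising the staleness to 10 fills every gap completely, which is the behavior the pitfall above warns about when data is genuinely missing.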

Related Concepts

Time-series Data Handling
Vector Search Optimization
Query Performance Tuning