Alexey's favorite features of 2025

Looking back at Alexey's favorite ClickHouse features of the year.

12 min read, intermediate

Overview

This article covers Alexey's favorite features introduced in ClickHouse during 2025, including lightweight updates, data lake support, and advances in text and vector indexing. It also reports the year's performance improvements and community contributions.

What You'll Learn

1. How to implement lightweight updates in ClickHouse
2. Why data lake support is crucial for modern analytics
3. How to leverage text and vector indexing for improved query performance
4. When to use query condition cache for optimizing repeated queries
5. How to utilize join reordering for complex queries

Key Questions Answered

What are the key features introduced in ClickHouse in 2025?
In 2025, ClickHouse introduced 277 new features, 319 performance optimizations, and 1,051 bug fixes. Key features include lightweight updates, enhanced data lake support, and advancements in text and vector indexing, which improve query performance and usability.
How do lightweight updates work in ClickHouse?
Lightweight updates, introduced in version 25.7, let ClickHouse run standard SQL UPDATE statements at scale. They use a patch-part mechanism that writes only the changed data, which keeps the impact on query performance small.
What improvements were made to data lake support in ClickHouse?
Throughout 2025, ClickHouse expanded its data lake capabilities, adding support for various catalog systems such as REST, Polaris, Unity, and Glue catalogs. This enhancement allows for better integration with open table formats like Iceberg and Delta Lake, facilitating more efficient data management.
What is the significance of the query condition cache in ClickHouse?
The query condition cache, introduced in ClickHouse 25.3, remembers which ranges of granules satisfy the WHERE clause, significantly improving query performance for repeated filters. In tests, it demonstrated over a 10x speedup in query execution time.
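
To make the idea of remembering which granules satisfy a WHERE clause more concrete, here is a minimal Python sketch of a toy cache. It is not ClickHouse's implementation: the `ToyConditionCache` class, the `GRANULE_SIZE` constant, and the use of the predicate text as a cache key are all assumptions made for illustration, and the toy ignores cache invalidation when data changes.

```python
from typing import Callable, Dict, Iterable, List, Tuple

GRANULE_SIZE = 4  # rows per granule in this toy model (hypothetical value)


class ToyConditionCache:
    """Remembers, per (table, predicate text), which granules held matching rows."""

    def __init__(self) -> None:
        self._matching: Dict[Tuple[str, str], List[int]] = {}
        self.granules_scanned = 0  # counter that makes the saved work visible

    def select(self, table: str, rows: List[int], predicate_text: str,
               predicate: Callable[[int], bool]) -> List[int]:
        # Split the column into fixed-size granules.
        granules = [rows[i:i + GRANULE_SIZE]
                    for i in range(0, len(rows), GRANULE_SIZE)]
        key = (table, predicate_text)
        # Cache hit: visit only granules known to match. Cache miss: visit all.
        candidates: Iterable[int] = self._matching.get(key, range(len(granules)))
        result: List[int] = []
        matched: List[int] = []
        for g in candidates:
            self.granules_scanned += 1
            hits = [r for r in granules[g] if predicate(r)]
            if hits:
                matched.append(g)
                result.extend(hits)
        self._matching[key] = matched
        return result


rows = list(range(40))  # 10 granules of 4 rows each
cache = ToyConditionCache()
first = cache.select("t", rows, "x >= 36", lambda x: x >= 36)
scanned_cold = cache.granules_scanned
second = cache.select("t", rows, "x >= 36", lambda x: x >= 36)
scanned_warm = cache.granules_scanned - scanned_cold
print(scanned_cold, scanned_warm)  # 10 granules on the first run, 1 on the repeat
```

The repeated filter returns the same rows while touching only the one granule that matched before, which is the kind of saving that benefits dashboards re-running the same filter.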

Key Statistics & Figures

New features introduced: 277
Total new features added in ClickHouse throughout 2025.
Performance optimizations: 319
Total performance optimizations implemented in ClickHouse in 2025.
Bug fixes: 1,051
Total bug fixes addressed in ClickHouse during 2025.
Speedup from query condition cache: over 10x
Improvement in query execution time when using the query condition cache.
Performance improvement from join reordering: 1,450 times faster
Speed improvement observed in a query joining six tables after implementing join reordering.
Memory usage reduction from join reordering: 25 times less
Reduction in memory usage for a complex query after enabling join reordering.
Speedup from lazy materialization: 1,576 times
Speed improvement for a query finding Amazon reviews with the highest helpful votes.
I/O reduction from lazy materialization: 40 times less
Decrease in I/O operations when using lazy materialization for queries.

Key Actionable Insights

1. Implement lightweight updates to enhance data management efficiency.
By utilizing lightweight updates, you can minimize the performance impact of data modifications, allowing for more agile data handling in analytical workloads.
2. Leverage data lake support to integrate diverse data sources.
Using ClickHouse's enhanced data lake capabilities enables seamless querying of various data formats, improving data accessibility and analysis.
3. Utilize the query condition cache for repeated queries.
Enabling the query condition cache can drastically reduce execution times for frequently run queries, making it a valuable feature for dashboards and analytics.
4. Explore the new text and vector indexing capabilities for enhanced search performance.
These indexing features allow for faster and more efficient querying of large datasets, particularly beneficial for applications requiring full-text search or similarity searches.
5. Take advantage of join reordering to optimize complex queries.
Automatic global join reordering can significantly enhance query performance, especially in scenarios involving multiple tables and complex relationships.
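
The patch-part idea behind lightweight updates (store only the changed data and apply it when reading) can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not ClickHouse's storage engine: the `ToyPatchedTable` class and its `update` and `read` methods are hypothetical names, and real patch parts live on disk and are merged in the background.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyPatchedTable:
    """Immutable base rows plus small patches holding only the changed columns."""

    def __init__(self, rows: List[Row]) -> None:
        self.base: List[Row] = [dict(r) for r in rows]  # never rewritten
        self.patches: List[Dict[int, Row]] = []         # row index -> changed columns

    def update(self, changes: Row, where: Callable[[Row], bool]) -> None:
        # Evaluate the condition against the current (patched) view of the data,
        # then record only the changed columns for the matching rows.
        current = self.read()
        patch = {i: dict(changes) for i, row in enumerate(current) if where(row)}
        if patch:
            self.patches.append(patch)

    def read(self) -> List[Row]:
        # Apply patches on top of the base rows, oldest first, so later updates win.
        rows = [dict(r) for r in self.base]
        for patch in self.patches:
            for i, changed in patch.items():
                rows[i].update(changed)
        return rows


table = ToyPatchedTable([
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
    {"id": 3, "status": "done"},
])
table.update({"status": "shipped"}, lambda r: r["id"] == 2)
print(table.read()[1])  # the reader sees the updated row
print(table.base[1])    # the base data was not rewritten
print(table.patches)    # the patch holds one row and one column
```

The point of the design is that an update writes an amount of data proportional to what changed, rather than rewriting whole columns of the base data.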

Common Pitfalls

1. Failing to utilize the query condition cache can lead to significantly slower query performance.
Without enabling this feature, users may experience delays in executing repeated queries, especially in analytics scenarios where the same filters are applied multiple times.
2. Not leveraging join reordering can result in inefficient query execution plans.
Queries that involve multiple joins may perform poorly if the join order is not optimized, leading to excessive resource consumption and longer execution times.
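
To show why join order matters, the sketch below scores every left-deep join order with a simple cost model and picks the cheapest. It is a brute-force illustration, not the algorithm ClickHouse uses: the table names, row counts, selectivities, and the `plan_cost` and `best_order` helpers are all hypothetical, and cost is taken to be the sum of estimated intermediate result sizes.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Sequence, Tuple


def plan_cost(order: Sequence[str], rows: Dict[str, int],
              selectivity: Dict[FrozenSet[str], float]) -> float:
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    size = float(rows[order[0]])
    cost = 0.0
    for table in order[1:]:
        size *= rows[table]
        # Apply every join predicate linking the new table to those already joined.
        # A missing predicate means selectivity 1.0, i.e. a cross product.
        for other in joined:
            size *= selectivity.get(frozenset((table, other)), 1.0)
        joined.append(table)
        cost += size
    return cost


def best_order(rows: Dict[str, int],
               selectivity: Dict[FrozenSet[str], float]) -> Tuple[str, ...]:
    """Try every permutation; only feasible for a handful of tables."""
    return min(permutations(rows), key=lambda o: plan_cost(o, rows, selectivity))


# Hypothetical statistics for three tables.
rows = {"orders": 1_000_000, "customers": 10_000, "countries": 100}
selectivity = {
    frozenset(("orders", "customers")): 1 / 10_000,  # each order has one customer
    frozenset(("customers", "countries")): 1 / 100,  # each customer has one country
}

as_written = ("orders", "countries", "customers")    # joins unrelated tables first
chosen = best_order(rows, selectivity)
print(chosen, plan_cost(chosen, rows, selectivity))
print(as_written, plan_cost(as_written, rows, selectivity))
```

In this toy model the order that joins `orders` with `countries` first builds a large cross product before any filtering predicate applies, while the chosen order joins the two small tables first and brings in the largest table last.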

Related Concepts

Data Lake Architecture
SQL Optimization Techniques
Full-text Search Strategies