ClickHouse Release 25.8

Overview

ClickHouse version 25.8 introduces 45 new features, 47 performance optimizations, and 119 bug fixes, enhancing its capabilities as a high-performance analytical database. Key improvements include a new native Parquet reader, support for Hive-style partitioning, and initial PromQL support, making it a robust choice for data lake applications.

What You'll Learn

1. How to utilize the new native Parquet reader in ClickHouse for improved performance
2. How to implement Hive-style partitioning in ClickHouse for data organization
3. How to leverage PromQL support in ClickHouse for time-series data analysis

Key Questions Answered

What are the key features introduced in ClickHouse version 25.8?
ClickHouse version 25.8 introduces 45 new features, including a new native Parquet reader, support for Hive-style partitioning, and initial PromQL support. Additionally, it includes 47 performance optimizations and 119 bug fixes, enhancing overall performance and usability.
How does the new Parquet reader improve performance in ClickHouse?
The new Parquet reader reads Parquet files directly into ClickHouse's in-memory format, which improves parallelism and I/O efficiency. On average, ClickBench queries run 1.81 times faster than with the previous reader.
What is Hive-style partitioning and how is it implemented in ClickHouse?
Hive-style partitioning organizes data into directories named after partition key values, typically in key=value form. Queries that filter on a partition key can then skip directories that cannot match, which improves both data management and query performance. The release demonstrates this with an S3 table engine table partitioned by pickup and drop-off locations.
What improvements were made to Data Lake support in ClickHouse 25.8?
ClickHouse 25.8 enhances Data Lake support by allowing the creation of Apache Iceberg tables, enabling data insertion, deletion, updates, and schema adjustments. This makes ClickHouse a more versatile option for managing large datasets in data lake architectures.
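The directory layout behind Hive-style partitioning can be sketched in a few lines of Python. This is only an illustration of the key=value path convention, not ClickHouse code: the bucket name, the column names `pickup_location_id` and `dropoff_location_id`, and both helper functions are hypothetical.

```python
from urllib.parse import quote, unquote


def hive_partition_path(root, partitions, filename):
    """Build root/key1=value1/key2=value2/filename (hypothetical helper)."""
    # Percent-encode values so characters such as "/" cannot break the layout.
    parts = [f"{key}={quote(str(value), safe='')}" for key, value in partitions.items()]
    return "/".join([root.rstrip("/"), *parts, filename])


def parse_hive_partitions(path):
    """Recover the partition key/value pairs encoded in a Hive-style path."""
    result = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in path.split("/")[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            result[key] = unquote(value)
    return result


# Assumed example values; the real table in the release may use other names.
path = hive_partition_path(
    "s3://example-bucket/trips",
    {"pickup_location_id": 132, "dropoff_location_id": 48},
    "data.parquet",
)
partitions = parse_hive_partitions(path)
```

Because the partition values live in the path itself, a reader can learn them without opening the file. Note that values parsed back from a path are strings, so a consumer has to cast them to the column type.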

Key Statistics & Figures

Performance improvement factor of new Parquet reader
1.81×
This factor is the average speedup for ClickBench queries using the new reader compared to the previous reader.
Number of new features in ClickHouse 25.8
45
These features include enhancements to performance and usability for analytical workloads.
Number of bug fixes in ClickHouse 25.8
119
These fixes contribute to the stability and reliability of the ClickHouse platform.

Technologies & Tools

Database
ClickHouse
An analytical database optimized for high-performance querying of large datasets.
Data Format
Apache Parquet
A columnar storage file format optimized for use with data processing frameworks.
Data Lake
Apache Iceberg
An open table format for large analytic datasets.
Query Language
PromQL
The Prometheus query language for selecting and aggregating time-series data; ClickHouse 25.8 adds initial support for it.
Storage
S3
Amazon's object storage service, which ClickHouse can read from and write to through the S3 table engine.
Data Exchange Protocol
Arrow Flight
A high-performance protocol for data exchange built on Apache Arrow.

Key Actionable Insights

1. Utilize the new native Parquet reader to enhance query performance when working with large datasets.
   By implementing the native Parquet reader, users can significantly reduce query execution time, making ClickHouse a more efficient tool for analytics, especially in data lake environments.
2. Adopt Hive-style partitioning for better data organization and query performance.
   Implementing Hive-style partitioning can streamline data management and improve query efficiency, particularly for large datasets stored in cloud environments.
3. Explore the initial PromQL support for time-series data analysis in ClickHouse.
   This feature allows users to leverage familiar Prometheus query capabilities, making ClickHouse a competitive option for time-series data analytics.

Common Pitfalls

1. Failing to optimize queries when using the new Parquet reader can lead to suboptimal performance.
   Users should ensure they leverage the new filtering capabilities and parallel processing features to fully benefit from the performance improvements.
2. Not understanding the implications of Hive-style partitioning can result in inefficient data organization.
   It's important to carefully plan partition keys to align with query patterns to maximize performance benefits.
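To see why partition keys should match query patterns, consider a toy model of partition pruning. It is a simplified sketch, not ClickHouse's actual planner: the file paths, the `pickup` and `dropoff` keys, and the `prune_files` function are all hypothetical. A filter on a partition key lets the reader skip whole directories, while a filter on any other column leaves every file to be scanned.

```python
def prune_files(paths, filters):
    """Keep only paths whose Hive-style directory values match every filter."""
    kept = []
    for path in paths:
        # Collect key=value pairs from the directory part of the path.
        values = dict(
            segment.split("=", 1)
            for segment in path.split("/")[:-1]
            if "=" in segment
        )
        if all(values.get(key) == str(wanted) for key, wanted in filters.items()):
            kept.append(path)
    return kept


# Hypothetical layout: two partition keys with two values each.
files = [
    "trips/pickup=1/dropoff=2/part-0.parquet",
    "trips/pickup=1/dropoff=3/part-0.parquet",
    "trips/pickup=2/dropoff=2/part-0.parquet",
    "trips/pickup=2/dropoff=3/part-0.parquet",
]

# A filter on a partition key narrows the set of files before any are opened.
by_partition_key = prune_files(files, {"pickup": 1})

# A filter on a non-partition column gives the path layout nothing to prune on,
# modelled here as an empty filter set: every file must still be read.
by_other_column = prune_files(files, {})
```

In this model the partition-key filter halves the files to read, and adding a second key to the filter would halve them again. The flip side is that very fine-grained keys multiply the number of directories and small files, which is the trade-off to weigh when planning a layout.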

Related Concepts

Data Lake Architectures
Columnar Storage Formats
Performance Optimization Techniques In Databases