ClickHouse 22.11 Release

The ClickHouse team

Overview

The ClickHouse 22.11 release adds 15 new features, 5 performance optimizations, and 32 bug fixes to the open-source columnar database. Notable additions include composite time intervals, glob patterns with recursive directory traversal, and data lake support for Apache Hudi and Delta Lake.

What You'll Learn

1. How to utilize retries on INSERT for large data migrations in ClickHouse
2. Why Glob patterns enhance data retrieval from local storage and S3 buckets
3. How to implement data lake support using Apache Hudi and Delta Lake

Prerequisites & Requirements

  • Understanding of ClickHouse and data migration concepts
  • Familiarity with AWS S3 and data storage solutions (optional)

Key Questions Answered

What new features are introduced in ClickHouse 22.11?
ClickHouse 22.11 introduces 15 new features, including composite time intervals, glob patterns, and functions for Spark compatibility. It also adds data lake support for Apache Hudi and Delta Lake, and the release announcement covers improvements to several clients.
How does the new retry feature on INSERT work?
The new retry feature allows large INSERT operations to survive connection interruptions. When 'insert_keeper_max_retries' is set to a non-zero value, blocks that fail during an INSERT are retried up to that many times, so a data migration can continue without restarting the entire operation.
What are Glob patterns and how do they enhance data queries?
Glob patterns let a single path expression match many files, including files in nested directories through recursive directory traversal. Users can run ad-hoc analysis over, or selectively insert data from, a whole directory tree in local storage or an S3 bucket without listing each file.
What performance improvements were made in ClickHouse 22.11?
The release includes 5 performance optimizations for data handling and processing. Alongside these, the Python client received substantial changes, and the Go and JavaScript clients were updated.
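
The retry setting described above can be enabled for a session before a migration starts. The following is a minimal Python sketch that only renders the SQL text. The table names and the retry count of 10 are hypothetical, and sending the statements to a server is left to whichever client you use.

```python
def migration_statements(target_table: str, source_select: str, max_retries: int = 10) -> list[str]:
    """Render the statements for an INSERT ... SELECT migration with retries enabled.

    The table names passed in and the default of 10 retries are illustrative
    assumptions, not values taken from the release notes.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return [
        # Session-level setting named in the release notes.
        f"SET insert_keeper_max_retries = {max_retries}",
        # The migration itself: copy rows from the source query into the target.
        f"INSERT INTO {target_table} {source_select}",
    ]


# Hypothetical source and target tables.
for statement in migration_statements("db.events_new", "SELECT * FROM db.events_old"):
    print(statement + ";")
```

Rendering the setting as its own SET statement keeps it separate from the INSERT, so the same migration query can be rerun with a different retry count.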

Key Statistics & Figures

New features introduced: 15
The ClickHouse 22.11 release includes a total of 15 new features aimed at enhancing functionality.
Performance optimizations: 5
This release includes 5 specific performance optimizations to improve overall system efficiency.
Bug fixes: 32
A total of 32 bug fixes were implemented to enhance stability and performance.

Key Actionable Insights

1. Enable retries on INSERT so that large migrations survive transient failures.
   This is particularly useful for large datasets, where a connection issue can otherwise fail the whole operation. Setting 'insert_keeper_max_retries' lets failed blocks be retried, which reduces disruptions and improves migration success rates.
2. Use glob patterns for efficient data retrieval from S3 buckets.
   A glob pattern restricts a query to the files it actually needs, which reduces the amount of data processed and matters most for large datasets.
3. Explore the new data lake support to integrate with Apache Hudi and Delta Lake.
   This integration lets users query tables in an existing data lake from ClickHouse instead of copying the data first.
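
As a minimal sketch of the data lake integration, the Python helper below renders a CREATE TABLE statement that points ClickHouse at an existing Hudi or Delta Lake table on S3. It assumes the read-only Hudi and DeltaLake table engines, which take the table's S3 URL as their first argument, and a bucket that needs no credentials. The bucket URL and table name are hypothetical.

```python
# Engine names assumed for this sketch.
SUPPORTED_ENGINES = ("Hudi", "DeltaLake")


def data_lake_ddl(table: str, engine: str, url: str) -> str:
    """Render a CREATE TABLE statement over an existing data lake table on S3."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"engine must be one of {SUPPORTED_ENGINES}")
    if "'" in url:
        # Keep the sketch simple: refuse URLs that would break the SQL string literal.
        raise ValueError("url must not contain single quotes")
    return f"CREATE TABLE {table} ENGINE = {engine}('{url}')"


# Hypothetical table name and bucket URL.
print(data_lake_ddl("trips_delta", "DeltaLake", "https://example-bucket.s3.amazonaws.com/trips/"))
```

Once such a table exists, it can be queried with ordinary SELECT statements. Private buckets would need credentials, which this sketch deliberately leaves out.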

Common Pitfalls

1. Leaving 'insert_keeper_max_retries' unset can lead to data migration failures.
   Without this setting, a large INSERT operation may fail on a transient connection issue, and the migration then has to be restarted.
2. Overlooking glob patterns can result in inefficient data queries.
   Queries that do not narrow their input with a glob pattern may read files they do not need, leading to unnecessary data processing and longer query times on large datasets.
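
To make the difference between a single-level and a recursive glob concrete, the following Python sketch translates a pattern into a regular expression and filters a list of object keys. It is an approximation for illustration only, not ClickHouse's implementation: it handles just `*` (any characters except `/`) and `**` (any characters, including `/`), and the keys are made up.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # recursive: may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # single level: stays inside one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def matching_keys(pattern: str, keys: list[str]) -> list[str]:
    """Return the keys that the whole pattern matches, in their original order."""
    regex = glob_to_regex(pattern)
    return [key for key in keys if regex.fullmatch(key)]


# Made-up object keys for illustration.
keys = ["logs/2022/a.csv", "logs/2022/11/b.csv", "logs/2022/11/notes.txt"]
print(matching_keys("logs/2022/*.csv", keys))  # only the file directly under logs/2022/
print(matching_keys("logs/**/*.csv", keys))    # CSV files at any depth below logs/
```

The narrower the pattern, the fewer files a query has to open, which is the practical benefit behind the pitfall above.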

Related Concepts

Data Migration Strategies
Data Lake Architecture
Efficient Querying Techniques