Introducing OpenZL: An Open Source Format-Aware Compression Framework

OpenZL is a new open source data compression framework that offers lossless compression for structured data. OpenZL is designed to offer the performance of a format-specific compressor with the eas…


Overview

OpenZL is a new open-source, format-aware framework for lossless compression of structured data. It combines the performance of format-specific compressors with the simplicity of a single executable binary, so one tool can serve many data formats.

What You'll Learn

  1. How to use OpenZL for lossless compression of structured data
  2. Why format-aware compression can outperform general-purpose compressors
  3. How to implement automated compression plans with OpenZL

Prerequisites & Requirements

  • Understanding of data compression concepts
  • Familiarity with GitHub for accessing the OpenZL repository (optional)

Key Questions Answered

How does OpenZL achieve better compression ratios compared to other compressors?
OpenZL achieves better compression ratios by applying a configurable sequence of transforms that reveal hidden order in the data. This lets it tailor the compression strategy to the structure of the data, whereas generic compressors treat their input as an undifferentiated byte stream.
What are the performance metrics of OpenZL compared to Zstandard and xz?
In a comparison on an M1 CPU using the 'sao' file from the Silesia Compression Corpus, OpenZL achieved a compressed size of 3,516,649 B, a compression ratio of x2.06, a compression speed of 340 MB/s, and a decompression speed of 1200 MB/s, outperforming both Zstandard and xz in compression ratio and speed.
When should OpenZL be used over traditional compression methods?
OpenZL should be used when dealing with structured data formats where specific compression strategies can exploit the inherent data patterns. If the data lacks structure, OpenZL defaults to Zstandard, which may not provide the same level of optimization.
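
The idea of transforms that "reveal hidden order" can be illustrated with a small sketch. It does not use the OpenZL API. The record layout, the helper names (`make_records`, `to_columns`, `from_columns`), and the use of Python's `zlib` as a stand-in generic compressor are all assumptions made for illustration. The sketch splits fixed-width records into one stream per field, delta-encodes the timestamps, and compares the result with compressing the raw bytes directly.

```python
import struct
import zlib

# Hypothetical record layout (not an OpenZL format):
# uint32 timestamp, uint16 sensor id, float32 reading, little-endian, no padding.
RECORD = struct.Struct("<IHf")


def make_records(n):
    """Build n synthetic fixed-width records (illustrative data only)."""
    return b"".join(
        RECORD.pack(1_700_000_000 + 10 * i, i % 4, 20.0 + (i % 50) * 0.25)
        for i in range(n)
    )


def to_columns(raw):
    """Split records into one stream per field and delta-encode the timestamps."""
    rows = list(RECORD.iter_unpack(raw))
    n = len(rows)
    ts = [r[0] for r in rows]
    # Delta against the previous timestamp (the first one is a delta against 0).
    deltas = [(cur - prev) & 0xFFFFFFFF for prev, cur in zip([0] + ts, ts)]
    return (
        struct.pack(f"<{n}I", *deltas)
        + struct.pack(f"<{n}H", *(r[1] for r in rows))
        + struct.pack(f"<{n}f", *(r[2] for r in rows))
    )


def from_columns(blob, n):
    """Invert to_columns so the whole pipeline stays lossless."""
    deltas = struct.unpack_from(f"<{n}I", blob, 0)
    ids = struct.unpack_from(f"<{n}H", blob, 4 * n)
    vals = struct.unpack_from(f"<{n}f", blob, 6 * n)
    out, acc = [], 0
    for d, sensor, val in zip(deltas, ids, vals):
        acc = (acc + d) & 0xFFFFFFFF  # running sum undoes the delta step
        out.append(RECORD.pack(acc, sensor, val))
    return b"".join(out)


raw = make_records(10_000)
columns = to_columns(raw)
assert from_columns(columns, 10_000) == raw  # the transform is reversible

generic_size = len(zlib.compress(raw))       # generic: records as a byte stream
columnar_size = len(zlib.compress(columns))  # format-aware: transform first
print(f"generic: {generic_size} B, columnar + delta: {columnar_size} B")
```

On this synthetic input the transformed stream compresses to fewer bytes, because each field becomes a highly regular sequence once it is separated from its neighbors. Real data will show smaller or larger gains depending on how much structure it has.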

Key Statistics & Figures

Compressed Size
3,516,649 B
Size achieved by OpenZL when compressing the 'sao' file from the Silesia Compression Corpus.
Compression Ratio
x2.06
Ratio of original size to compressed size achieved by OpenZL on the same file.
Compression Speed
340 MB/s
OpenZL compression throughput in the M1 CPU comparison.
Decompression Speed
1200 MB/s
OpenZL decompression throughput in the M1 CPU comparison.
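
As a worked example, the figures above can be related to each other with a little arithmetic. The sketch below assumes that the compression ratio is original size divided by compressed size, that 1 MB means 10^6 bytes, and that throughput is measured against uncompressed bytes. None of these conventions is stated in the summary, and the function names are illustrative.

```python
# Figures reported above for OpenZL on the 'sao' file.
COMPRESSED_BYTES = 3_516_649
RATIO = 2.06
COMPRESS_MB_PER_S = 340
DECOMPRESS_MB_PER_S = 1200


def implied_original_bytes(compressed_bytes, ratio):
    """Original size implied by ratio = original / compressed."""
    return compressed_bytes * ratio


def space_saved(ratio):
    """Fraction of the original size removed by compression."""
    return 1 - 1 / ratio


def seconds(n_bytes, mb_per_s):
    """Time to process n_bytes at the given throughput (1 MB = 10**6 bytes, assumed)."""
    return n_bytes / (mb_per_s * 1_000_000)


original = implied_original_bytes(COMPRESSED_BYTES, RATIO)
print(f"implied original size: {original / 1e6:.2f} MB")
print(f"space saved:           {space_saved(RATIO):.1%}")
print(f"compress time:         {seconds(original, COMPRESS_MB_PER_S) * 1e3:.1f} ms")
print(f"decompress time:       {seconds(original, DECOMPRESS_MB_PER_S) * 1e3:.1f} ms")
```

Under these assumptions the reported numbers imply an input of roughly 7.24 MB, a little over half of which is saved, with compression taking on the order of 21 ms and decompression about 6 ms.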

Technologies & Tools

Compression Framework
OpenZL
Used for lossless compression of structured data.
Compression Algorithm
Zstandard
Fallback option used by OpenZL when the input format is not understood.

Key Actionable Insights

  1. Use OpenZL to compress structured datasets for better storage efficiency and speed.
     OpenZL is particularly effective for time-series data, ML tensors, and database tables, where understanding the data structure can lead to significant compression gains.
  2. Use the offline trainer in OpenZL to generate optimized compression configurations automatically.
     This lets users adapt to evolving data structures without manually adjusting compression settings, keeping compression efficient as the data changes.
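
The general idea behind an offline trainer can be sketched as a search over candidate transform pipelines, scored on sample data. The code below is a toy illustration of that idea, not OpenZL's trainer or its API. The candidate names (`identity`, `transpose4`, `delta32`), the `train` function, and the use of `zlib` as the backend compressor are all hypothetical.

```python
import struct
import zlib


def identity(data):
    """Baseline: hand the bytes to the backend compressor unchanged."""
    return data


def transpose4(data):
    """Group byte 0 of every 4-byte word together, then byte 1, and so on."""
    body_len = len(data) - len(data) % 4
    body, tail = data[:body_len], data[body_len:]
    return b"".join(body[i::4] for i in range(4)) + tail


def delta32(data):
    """Treat the input as little-endian uint32 words and store differences."""
    n = len(data) // 4
    words = struct.unpack_from(f"<{n}I", data, 0)
    deltas = [(cur - prev) & 0xFFFFFFFF for prev, cur in zip((0,) + words, words)]
    return struct.pack(f"<{n}I", *deltas) + data[4 * n:]


# Hypothetical search space; a real trainer would explore far more options.
CANDIDATES = {"identity": identity, "transpose4": transpose4, "delta32": delta32}


def train(samples, candidates=CANDIDATES):
    """Return the candidate with the smallest total compressed size on the samples."""
    scores = {
        name: sum(len(zlib.compress(fn(s))) for s in samples)
        for name, fn in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Sample files: runs of consecutive 32-bit counters (synthetic, for illustration).
samples = [
    struct.pack("<2000I", *range(start, start + 2000))
    for start in (1_000, 50_000, 123_456)
]
best_plan, scores = train(samples)
print("chosen plan:", best_plan)
print("total compressed bytes per plan:", scores)
```

Only the forward transforms are shown here. A usable system would also need the matching inverse for each transform and a way to record which plan was chosen, so that data can be decompressed later. Rerunning `train` on fresh samples is how such a setup would adapt when the data changes.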

Common Pitfalls

  1. Assuming OpenZL will always outperform traditional compressors regardless of data structure.
     OpenZL's gains depend on structure in the data that its transforms can exploit. When the data lacks structure, it falls back to Zstandard, so the expected benefits may not materialize.
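
A fallback of this kind can be sketched as follows. This is not how OpenZL is implemented. The one-byte tag, the fixed 4-byte record width, the size comparison before choosing a path, and the use of `zlib` in place of Zstandard are assumptions chosen to keep the example self-contained.

```python
import struct
import zlib

GENERIC, COLUMNAR = 0, 1  # hypothetical one-byte tags, not an OpenZL frame format
RECORD_WIDTH = 4          # assumed fixed record width for the structured path


def compress(data):
    """Use the structured path only when the input fits the layout and it pays off."""
    generic = zlib.compress(data)
    if data and len(data) % RECORD_WIDTH == 0:
        # Byte-plane transpose: all first bytes, then all second bytes, and so on.
        planes = b"".join(data[i::RECORD_WIDTH] for i in range(RECORD_WIDTH))
        candidate = zlib.compress(planes)
        if len(candidate) < len(generic):
            return bytes([COLUMNAR]) + candidate
    # Layout not recognized, or no gain: fall back to the generic compressor.
    return bytes([GENERIC]) + generic


def decompress(frame):
    """Read the tag, then undo whichever path was taken."""
    tag, body = frame[0], zlib.decompress(frame[1:])
    if tag == GENERIC:
        return body
    n = len(body) // RECORD_WIDTH
    planes = [body[i * n:(i + 1) * n] for i in range(RECORD_WIDTH)]
    return bytes(b for record in zip(*planes) for b in record)


structured = struct.pack("<1000I", *range(1000))  # 32-bit counters
unstructured = b"abcde" * 101                     # length is not a multiple of 4

for name, payload in (("structured", structured), ("unstructured", unstructured)):
    frame = compress(payload)
    assert decompress(frame) == payload
    path = "columnar" if frame[0] == COLUMNAR else "generic fallback"
    print(f"{name}: {len(payload)} B -> {len(frame)} B via {path}")
```

The point of the sketch is the pitfall itself: on the fallback path the output is simply what the generic compressor produces plus a small tag, so no format-aware gain should be expected there.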

Related Concepts

Data Compression Techniques
Lossless Compression
Structured Data Handling
Compression Algorithms