Smaller and faster data compression with Zstandard

Yann Collet
21 min read · advanced

Overview

The article discusses Zstandard, a new data compression algorithm developed by Facebook that offers significant improvements in both compression and decompression speeds compared to existing algorithms like zlib. It highlights Zstandard's scalability, efficiency, and applicability across various data types, making it a versatile choice for modern data compression needs.

What You'll Learn

1
How to implement Zstandard for data compression in applications
2
Why Zstandard is more efficient than traditional algorithms like zlib
3
When to choose different compression levels in Zstandard based on use case

Prerequisites & Requirements

  • Basic understanding of data compression concepts
  • Familiarity with command line tools for compression (optional)

Key Questions Answered

How does Zstandard improve data compression speed and efficiency?
Zstandard combines recent compression breakthroughs with a performance-first design, allowing it to achieve faster compression and decompression speeds compared to traditional algorithms like zlib. It offers a higher compression ratio while maintaining speed, making it suitable for a wide range of applications.
What are the scalability features of Zstandard?
Zstandard offers 22 compression levels, allowing users to make granular trade-offs between compression speed and compression ratio. This scalability lets it adapt to requirements ranging from maximum speed to maximum compression.
What is the role of Finite State Entropy in Zstandard?
Finite State Entropy is the entropy coder used in Zstandard, based on Asymmetric Numeral Systems (ANS). Unlike Huffman coding, which spends a whole number of bits on every symbol, it can spend a fractional number of bits per symbol, so it approaches the compression of arithmetic coding. It does this using only table lookups, additions, and shifts, which keeps its CPU cost close to that of Huffman coding.
How does Zstandard handle small data compression?
Zstandard handles small data, such as JSON messages, through dictionary compression: a dictionary trained on sample data supplies the shared context that a short input lacks on its own. This improves compression ratios by roughly 2x to 5x compared with compressing the same data without a dictionary.
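
The Finite State Entropy answer above rests on the gap between a code that spends whole bits per symbol (Huffman) and the theoretical minimum (the Shannon entropy). The sketch below only measures that gap for a skewed distribution; it does not implement Finite State Entropy itself, and the probabilities are made up for illustration.

```python
import heapq
import math


def shannon_entropy(probs):
    """Theoretical lower bound, in bits per symbol, for this distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_average_length(probs):
    """Expected bits per symbol of a Huffman code built over `probs`."""
    if len(probs) == 1:
        return 1.0
    # Heap entries are (probability, tie_breaker). Every merge adds one bit
    # to each symbol beneath it, so the expected code length equals the sum
    # of the merged probabilities.
    heap = [(p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    total = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        merged = p1 + p2
        total += merged
        heapq.heappush(heap, (merged, next_id))
        next_id += 1
    return total


# Hypothetical, heavily skewed symbol distribution.
skewed = [0.90, 0.05, 0.03, 0.02]
entropy_bits = shannon_entropy(skewed)
huffman_bits = huffman_average_length(skewed)

print(f"entropy: {entropy_bits:.2f} bits/symbol")
print(f"huffman: {huffman_bits:.2f} bits/symbol")
```

For this distribution the entropy is about 0.62 bits per symbol, while Huffman needs 1.15 because even the most common symbol costs a full bit. Coders that can spend fractional bits, such as arithmetic coding and ANS-based coders, can close most of that gap.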

Key Statistics & Figures

Decompression speed of Zstandard
approximately 550 MB/s
This is roughly twice as fast as zlib, which decompresses at around 270 MB/s.
Compression speed improvement over zlib
~3-5x faster
At the same compression ratio, Zstandard compresses data much faster than zlib.
Compression ratio improvement over zlib
10-15 percent smaller
Zstandard achieves smaller file sizes compared to zlib at the same compression speed.
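
To make the decompression figures above concrete, here is a small back-of-the-envelope calculation. The 10,000 MB payload is a hypothetical size chosen for illustration, and the throughput numbers are the approximate ones quoted above, so the results are estimates rather than benchmarks.

```python
def seconds_to_process(size_mb, throughput_mb_per_s):
    """Time needed to process `size_mb` megabytes at a steady throughput."""
    return size_mb / throughput_mb_per_s


payload_mb = 10_000  # hypothetical payload size

zstd_seconds = seconds_to_process(payload_mb, 550)  # ~550 MB/s (Zstandard)
zlib_seconds = seconds_to_process(payload_mb, 270)  # ~270 MB/s (zlib)
speedup = zlib_seconds / zstd_seconds

print(f"Zstandard: {zstd_seconds:.1f} s")
print(f"zlib:      {zlib_seconds:.1f} s")
print(f"speedup:   {speedup:.2f}x")
```

At these rates the payload decompresses in about 18 seconds with Zstandard versus about 37 seconds with zlib.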

Technologies & Tools

Compression Algorithm
Zstandard
Used for efficient data compression and decompression in various applications.

Key Actionable Insights

1
Adopt Zstandard for applications requiring fast data compression and decompression.
Zstandard's ability to compress data significantly faster than zlib while maintaining a high compression ratio makes it ideal for applications where speed is critical, such as real-time data processing.
2
Utilize the various compression levels in Zstandard to optimize performance based on specific use cases.
By adjusting the compression level, developers can tailor the trade-off between speed and size according to the requirements of their application, ensuring efficient resource utilization.
3
Implement dictionary compression for small data sets to maximize compression efficiency.
Using pre-shared dictionaries can drastically improve compression ratios for small data types, making Zstandard particularly effective for web applications that frequently transmit small JSON messages.
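
The pre-shared dictionary idea in the last insight can be sketched with Python's standard library. Note the substitution: this example uses zlib's preset-dictionary support (`zdict`) as a stand-in, because it ships with Python; it is not Zstandard's API. The message shape and the hand-picked dictionary are hypothetical, whereas Zstandard builds its dictionaries by training on sample data.

```python
import json
import zlib


def make_message(user_id):
    """Hypothetical small JSON message; every message shares the same keys."""
    record = {"user_id": user_id, "status": "active", "country": "US", "plan": "free"}
    return json.dumps(record).encode("utf-8")


# Stand-in for a trained dictionary: one representative message that both
# the sender and the receiver hold in advance.
shared_dictionary = make_message(0)


def compress_with_dictionary(payload, dictionary):
    compressor = zlib.compressobj(level=6, zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dictionary(blob, dictionary):
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


message = make_message(12345)
plain = zlib.compress(message, 6)
with_dict = compress_with_dictionary(message, shared_dictionary)
restored = decompress_with_dictionary(with_dict, shared_dictionary)

print(len(message), len(plain), len(with_dict))
```

A single short message has almost no internal repetition, so compressing it alone saves little. With the dictionary, most of the message becomes a reference to content both sides already hold. The receiver must use exactly the same dictionary, which is why it has to be shared ahead of time.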

Common Pitfalls

1
Choosing a compression level without regard to the workload leads to suboptimal performance.
A level that is too high spends CPU time and memory for little additional size reduction, while a level that is too low leaves data larger than it needs to be. Either mismatch reduces overall efficiency.
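
One way to avoid this pitfall is to measure candidate levels on representative data before settling on one. The helper below is a minimal sketch: `sweep_levels` is a hypothetical name, and it is demonstrated with Python's built-in `zlib.compress` (levels 1 to 9) as a stand-in. Any function of the form `compress(data, level)`, for example one wrapping a Zstandard binding, could be passed in instead. The sample data is synthetic.

```python
import time
import zlib


def sweep_levels(compress, levels, data):
    """Compress `data` at each level and record ratio and elapsed time."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = compress(data, level)
        elapsed = time.perf_counter() - start
        results.append({
            "level": level,
            "ratio": len(data) / len(compressed),
            "seconds": elapsed,
        })
    return results


# Synthetic log-like sample; replace with data typical of the real workload.
sample = b"".join(
    f"ts={i} level=INFO msg=request handled in {i % 97} ms\n".encode("utf-8")
    for i in range(20000)
)

results = sweep_levels(zlib.compress, [1, 6, 9], sample)
for row in results:
    print(f"level {row['level']}: ratio {row['ratio']:.2f}, {row['seconds'] * 1000:.1f} ms")
```

Timings vary by machine and by data, so the useful output is the shape of the curve: the point past which higher levels cost noticeably more time for only a small gain in ratio.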

Related Concepts

Data Compression Techniques
zlib vs. Zstandard
Finite State Entropy
Dictionary Compression