Evolving MySQL Compression — Part 1

Pinterest Engineering
6 min readintermediate
--
View Original

Overview

The article discusses the evolution of MySQL compression techniques at Pinterest, particularly focusing on the challenges faced with InnoDB page compression and the development of a new column compression method. It highlights the importance of efficient data storage and the impact of compression on performance.

What You'll Learn

1

How to implement MySQL InnoDB page compression effectively

2

Why column-level compression can improve performance in MySQL

3

How to utilize predefined dictionaries for better compression ratios

Prerequisites & Requirements

  • Understanding of MySQL and its compression techniques
  • Familiarity with Percona MySQL Server(optional)

Key Questions Answered

What are the downsides of InnoDB page compression?
InnoDB page compression has several downsides, including inefficiencies in memory usage due to storing both compressed and uncompressed pages, limitations on compression savings when data exceeds 16KB pages, and increased latency under high concurrency workloads. These factors can hinder performance, especially in environments with many active connections.
How does column compression improve MySQL performance?
Column compression improves MySQL performance by allowing more efficient use of memory and achieving better compression ratios. It eliminates the need to store both compressed and uncompressed data in memory, which enhances RAM efficiency. Initial tests showed that this method doubled throughput at high concurrency compared to traditional methods.
What role does Zlib play in MySQL compression?
Zlib is the compression library used for both InnoDB page compression and the new column compression method developed by Pinterest. It employs the LZ77 algorithm to replace repeated strings with references, which is effective for page compression but requires adaptation for optimal column compression through the use of predefined dictionaries.

Key Statistics & Figures

Compression savings from column compression
~66%
Achieved through the use of a predefined dictionary in initial tests.
Throughput improvement at high concurrency
Doubled throughput
Observed during benchmarking of the new column compression method.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Database
Mysql
Used for storing and managing Pinterest's data, with a focus on compression techniques.
Library
Zlib
Utilized for implementing compression algorithms in MySQL.
Database
Percona Mysql Server
The platform on which the new compression features were developed and tested.

Key Actionable Insights

1
Implementing column-level compression can significantly enhance MySQL performance, especially under high concurrency. By reducing memory usage and improving throughput, this method allows for more efficient data handling.
This approach is particularly useful for applications that require high performance and scalability, as it addresses the limitations of traditional page compression methods.
2
Utilizing predefined dictionaries in compression algorithms can lead to better space savings. By warming up the lookback window with common strings, you can achieve higher compression ratios.
This technique is beneficial when dealing with structured data formats like JSON, where field names can contribute significantly to overall size.

Common Pitfalls

1
Relying solely on InnoDB page compression can lead to inefficiencies, especially in memory usage and performance under high load.
This occurs because InnoDB stores both compressed and uncompressed pages in memory, which can waste resources. It's important to evaluate alternative compression methods that better suit specific workloads.

Related Concepts

Mysql Compression Techniques
Data Storage Optimization
Performance Tuning In Databases