Reducing Logging Cost by Two Orders of Magnitude using CLP

Jack (Yu) Luo, Devesh Agrawal
22 min read, advanced

Overview

The article discusses how Uber reduced its logging costs significantly by integrating the Compressed Log Processor (CLP) into its logging architecture. It highlights the challenges faced due to the exponential growth of log data and how CLP achieves a 169x compression ratio, allowing for extensive log retention without incurring prohibitive costs.

What You'll Learn

1. How to integrate CLP into your logging architecture to achieve significant compression
2. Why a custom floating-point encoding can improve performance and compression ratio
3. How to implement Phase 1 of CLP for immediate logging cost reduction

Prerequisites & Requirements

  • Understanding of logging architectures and data compression techniques
  • Familiarity with Log4j and Spark

Key Questions Answered

How does CLP achieve a 169x compression ratio for logs?
CLP achieves a 169x compression ratio by parsing each log message into its static text and its variable values, deduplicating the repetitive components, and compressing the result with Zstandard in a column-oriented layout. This yields significant storage savings while still allowing efficient search on compressed logs without full decompression.
What are the benefits of using CLP over general-purpose compressors?
CLP is specifically designed for log data, allowing for higher compression ratios by exploiting the repetitive nature of logs. Unlike general-purpose compressors, CLP enables searching directly on compressed logs, which avoids the need for full decompression, thus saving time and resources.
What challenges did Uber face with log retention before implementing CLP?
Before implementing CLP, Uber faced challenges with high storage costs for log retention, as retaining logs for a month would cost millions annually. The rapid growth of log data also led to issues with SSD wear due to excessive write operations, prompting the need for a more efficient logging solution.
How does CLP's custom floating-point encoding improve performance?
CLP's custom floating-point encoding improves performance by speeding up the encoding process by 2-3x compared to IEEE-754, while also ensuring lossless representation of floating-point values. This optimization enhances overall compression efficiency and reduces the likelihood of data duplication in logs.
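The first answer above (parse, deduplicate, compress by column) can be illustrated with a minimal Python sketch. This is not CLP's actual implementation or API: the rule "any token containing a digit is a variable", the placeholder byte, and the function names are all hypothetical, and the standard-library zlib stands in for Zstandard so the example has no dependencies.

```python
import re
import zlib  # stand-in only; the article names Zstandard

# Hypothetical rule: any whitespace-delimited token containing a digit is a variable.
VAR_PATTERN = re.compile(r"\S*\d\S*")
PLACEHOLDER = "\x11"  # hypothetical marker meaning "a variable was here"


def encode(messages):
    """Split each message into a deduplicated log type plus its variables."""
    logtype_ids = {}      # log type text -> dictionary id
    logtype_column = []   # one dictionary id per message
    variable_column = []  # variable values in order of appearance
    for msg in messages:
        variable_column.extend(VAR_PATTERN.findall(msg))
        logtype = VAR_PATTERN.sub(PLACEHOLDER, msg)
        # Repeated log types are stored once and referenced by id.
        logtype_column.append(logtype_ids.setdefault(logtype, len(logtype_ids)))
    return list(logtype_ids), logtype_column, variable_column


def decode(logtypes, logtype_column, variable_column):
    """Rebuild the original messages by refilling placeholders in order."""
    values = iter(variable_column)
    messages = []
    for logtype_id in logtype_column:
        parts = logtypes[logtype_id].split(PLACEHOLDER)
        msg = parts[0]
        for part in parts[1:]:
            msg += next(values) + part
        messages.append(msg)
    return messages


def compress_columns(logtypes, logtype_column, variable_column):
    """Compress each column separately so similar values sit together."""
    columns = [
        "\n".join(logtypes),
        ",".join(map(str, logtype_column)),
        "\n".join(variable_column),
    ]
    return [zlib.compress(column.encode("utf-8")) for column in columns]


messages = [
    "Task 12 finished in 3.5 s",
    "Task 13 finished in 4.1 s",
    "Executor exec-7 lost",
]
logtypes, logtype_column, variable_column = encode(messages)
restored = decode(logtypes, logtype_column, variable_column)
blobs = compress_columns(logtypes, logtype_column, variable_column)
```

Here the three messages share only two log types, so the static text is stored twice rather than three times, and a search for a phrase such as "finished in" could first be checked against the small log type dictionary instead of every message. The sketch ignores messages that already contain the placeholder byte.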

Key Statistics & Figures

Compression ratio achieved by CLP
169x
This ratio allows Uber to retain logs for a month at a significantly reduced cost.
Uncompressed log data generated in a 30-day window
5.38 PB
This volume of logs was compressed to only 31.4 TB using CLP.
Increased retention period for logs
10x
The retention period was increased from 3 days to 1 month after implementing CLP.
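As a quick sanity check on the figures above, the arithmetic can be reproduced in a few lines of Python. Two assumptions are made here that the summary does not state: decimal units (1 PB = 1000 TB) and a 30-day month. With the rounded figures the ratio comes out at roughly 171x, close to the reported 169x; the small gap presumably reflects rounding or unit conventions in the published numbers.

```python
# Figures quoted in this summary.
UNCOMPRESSED_PB = 5.38
COMPRESSED_TB = 31.4
WINDOW_DAYS = 30          # assumption: "one month" treated as 30 days
OLD_RETENTION_DAYS = 3

# Assumption: decimal units, 1 PB = 1000 TB.
uncompressed_tb = UNCOMPRESSED_PB * 1000

ratio = uncompressed_tb / COMPRESSED_TB
daily_compressed_tb = COMPRESSED_TB / WINDOW_DAYS
retention_increase = WINDOW_DAYS / OLD_RETENTION_DAYS

print(f"compression ratio: about {ratio:.0f}x")
print(f"compressed volume per day: about {daily_compressed_tb:.2f} TB")
print(f"retention increase: {retention_increase:.0f}x")
```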

Technologies & Tools

Compression Tool
CLP
Used to compress log data efficiently while allowing for search capabilities.
Logging Library
Log4j
Utilized for logging in Spark applications, integrated with CLP for enhanced compression.
Compression Algorithm
Zstandard
Employed as part of CLP to compress log data in a column-oriented manner.
Data Processing Framework
Spark
Uber's primary platform for running analytics jobs, generating large volumes of log data.

Key Actionable Insights

1. Implementing CLP can drastically reduce logging costs and improve data retention capabilities.
By achieving a 169x compression ratio, Uber was able to retain logs for a month at a fraction of the previous cost, allowing engineers to access critical log data for troubleshooting and analysis.
2. Customizing floating-point encoding can enhance both performance and compression ratios.
This approach not only speeds up the encoding process but also minimizes data duplication, making it easier to manage large volumes of log data efficiently.
3. Integrating CLP into existing logging frameworks like Log4j can provide immediate benefits.
The integration allows for lightweight, streaming compression, which is crucial for high-throughput environments like Uber's Spark platform.

Common Pitfalls

1. Relying solely on general-purpose compressors for log data can lead to suboptimal performance.
General-purpose compressors do not exploit the unique characteristics of log data, resulting in lower compression ratios and inefficient search capabilities.

Related Concepts

Data Compression Techniques
Log Management Strategies
Big Data Analytics Frameworks