From OTel to Rotel: Petabyte-scale tracing with 4x greater throughput

Read how Streamfold’s Rotel pushes OpenTelemetry 4x faster into ClickHouse - benchmarking efficiency at scale and revealing the tools that make it happen

Overview

This article discusses the transition from OpenTelemetry (OTel) to Rotel, an open-source Rust project aimed at tracing at petabyte scale. It reports a throughput of 3.7 million trace spans per second, which the article puts at four times the throughput of the OTel Collector.

What You'll Learn

1. How to benchmark OpenTelemetry data planes for performance
2. Why optimizing resource usage is critical at petabyte scale
3. How to implement JSON binary serialization in Rust for performance gains

Prerequisites & Requirements

  • Understanding of OpenTelemetry and tracing concepts
  • Familiarity with ClickHouse and Kafka (optional)
  • Experience with Rust programming (optional)

Key Questions Answered

How does Rotel improve performance compared to the OpenTelemetry Collector?
Rotel achieves a throughput of 3.7 million trace spans per second, which is four times greater than the OpenTelemetry Collector's maximum of 1.1 million spans per second. This improvement is due to optimizations such as JSON binary serialization, enhanced task management, and better compression techniques.
What are the key optimizations implemented in Rotel?
Key optimizations in Rotel include JSON binary serialization, performance analysis of Tokio task management, and improved LZ4 compression. These enhancements allow Rotel to efficiently handle large volumes of trace data while minimizing resource usage.
What hardware setup was used for the benchmarks?
The benchmarks were conducted on AWS EC2 instances, specifically using m8i and i3 instance types. The load generator utilized an m8i.8xlarge instance with 32 vCPUs and 128 GiB of memory, while the ClickHouse instance was an i3.4xlarge with 16 vCPUs and 122 GiB of memory.
What metrics were used to evaluate the performance of Rotel?
The performance of Rotel was evaluated using metrics such as trace spans per second, memory usage, and CPU utilization. The tests recorded up to 3.7 million spans per second and monitored CPU usage to identify bottlenecks.
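
The gain attributed to binary serialization can be pictured with a small standard-library sketch. It contrasts a text path, which must scan and escape every string, with a length-prefixed binary path, which copies bytes through unchanged. The layout and the names `encode_json_text`, `encode_binary`, and `decode_binary` are assumptions made for illustration; this is neither Rotel's code nor ClickHouse's wire format.

```rust
/// A hypothetical span attribute: a key and a string value.
type Attr = (String, String);

/// Text path: every character is inspected so it can be escaped.
/// (Simplified: only quotes, backslashes, and newlines are handled.)
fn push_json_str(out: &mut String, s: &str) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out.push('"');
}

fn encode_json_text(attrs: &[Attr]) -> String {
    let mut out = String::from("{");
    for (i, (key, value)) in attrs.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        push_json_str(&mut out, key);
        out.push(':');
        push_json_str(&mut out, value);
    }
    out.push('}');
    out
}

/// Binary path: a little-endian u32 length, then the raw bytes. No escaping.
fn put_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

fn encode_binary(attrs: &[Attr], out: &mut Vec<u8>) {
    out.extend_from_slice(&(attrs.len() as u32).to_le_bytes());
    for (key, value) in attrs {
        put_str(out, key);
        put_str(out, value);
    }
}

/// Splits `n` bytes off the front of `buf`, or returns None if too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let data: &'a [u8] = *buf;
    if data.len() < n {
        return None;
    }
    let (head, rest) = data.split_at(n);
    *buf = rest;
    Some(head)
}

fn take_u32(buf: &mut &[u8]) -> Option<u32> {
    let b = take(buf, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn take_str(buf: &mut &[u8]) -> Option<String> {
    let len = take_u32(buf)? as usize;
    let bytes = take(buf, len)?;
    String::from_utf8(bytes.to_vec()).ok()
}

/// Reads back what `encode_binary` wrote; None on truncated or invalid input.
fn decode_binary(mut buf: &[u8]) -> Option<Vec<Attr>> {
    let count = take_u32(&mut buf)?;
    let mut attrs = Vec::new();
    for _ in 0..count {
        let key = take_str(&mut buf)?;
        let value = take_str(&mut buf)?;
        attrs.push((key, value));
    }
    Some(attrs)
}

fn main() {
    let attrs = vec![
        ("service.name".to_string(), "checkout".to_string()),
        ("http.route".to_string(), "/cart/\"items\"".to_string()),
    ];
    let text = encode_json_text(&attrs);
    let mut bin = Vec::new();
    encode_binary(&attrs, &mut bin);
    println!("text: {} bytes, binary: {} bytes", text.len(), bin.len());
    assert_eq!(decode_binary(&bin), Some(attrs));
}
```

The binary form is not necessarily smaller. What it removes in this sketch is the per-character escaping on write and the matching parse on read.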

Key Statistics & Figures

Maximum throughput achieved
3.7 million trace spans/sec
This throughput was achieved with a single instance of Rotel, showcasing its efficiency compared to the OTel Collector.
Performance improvement over OTel Collector
4x
The article reports Rotel's throughput as four times the OTel Collector's maximum of 1.1 million spans/sec; the two figures quoted here give a ratio of about 3.4.
CPU utilization during peak performance
93.7%
This was observed while processing 3.6 million trace spans/sec, indicating that nearly all available CPU was in use at that rate.

Key Actionable Insights

1. Implementing JSON binary serialization can significantly enhance data processing performance in Rust applications.
   This technique reduces the overhead associated with JSON stringification, allowing for faster data transmission and processing, especially in high-throughput scenarios.
2. Optimizing resource usage at scale can lead to substantial cost savings and improved efficiency.
   As demonstrated by Rotel's performance improvements, small changes in resource consumption can have a large impact on operational costs, making it crucial for organizations handling large data volumes.
3. Utilizing a multi-threaded approach for CPU-intensive tasks can improve throughput and reduce bottlenecks.
   By offloading heavy tasks to separate threads, Rotel was able to maintain high performance without blocking the main execution flow, which is essential in asynchronous environments.
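
The third insight, moving CPU-heavy work off the main flow, can be sketched with the standard library alone. The example hands batches to a dedicated worker thread over a channel and collects results in order. The function names and the stand-in workload (an FNV-1a hash in place of real compression or encoding) are assumptions for illustration, not Rotel's implementation. In a Tokio application the comparable primitive would be `tokio::task::spawn_blocking`.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for CPU-heavy work such as compressing a batch (hypothetical).
/// Computes a 64-bit FNV-1a hash over the bytes.
fn expensive_encode(batch: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in batch {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Sends each batch to one worker thread and returns results in input order.
fn process_off_thread(batches: Vec<Vec<u8>>) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<Vec<u8>>();
    let (res_tx, res_rx) = mpsc::channel::<u64>();

    // The worker owns the receiving end of the job queue and runs until
    // the queue is closed.
    let worker = thread::spawn(move || {
        for batch in job_rx {
            if res_tx.send(expensive_encode(&batch)).is_err() {
                break; // the caller stopped listening
            }
        }
    });

    // The calling thread only enqueues; it never runs the heavy function.
    let expected = batches.len();
    for batch in batches {
        job_tx.send(batch).expect("worker thread exited early");
    }
    drop(job_tx); // closing the queue lets the worker loop finish

    // A single worker processes jobs first-in, first-out, so order holds.
    let results: Vec<u64> = res_rx.iter().take(expected).collect();
    worker.join().expect("worker thread panicked");
    results
}

fn main() {
    let batches = vec![b"span-a".to_vec(), b"span-b".to_vec()];
    let results = process_off_thread(batches);
    println!("encoded {} batches off-thread", results.len());
}
```

A single worker keeps the sketch short and preserves ordering. With several workers, results would need an index attached to be put back in order.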

Common Pitfalls

1. Failing to optimize memory allocation can lead to performance bottlenecks.
   In the earlier version of Rotel, excessive memory allocation and deallocation caused significant CPU overhead, reducing overall throughput. It's crucial to manage memory efficiently, especially in high-performance applications.
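
One common way to avoid this pitfall is to reuse a buffer across batches instead of allocating a new one each time. The sketch below assumes a hypothetical `Encoder` type with a scratch `Vec<u8>`. It illustrates the general technique and is not a description of what Rotel changed.

```rust
/// Hypothetical encoder that keeps one scratch buffer alive across batches.
struct Encoder {
    scratch: Vec<u8>,
}

impl Encoder {
    fn new() -> Self {
        // Allocate once up front; later batches reuse this capacity.
        Encoder { scratch: Vec::with_capacity(4096) }
    }

    /// Encodes span names as length-prefixed bytes into the reused buffer,
    /// then lends the encoded bytes to `sink` (for example, a socket write).
    fn encode_batch<F: FnMut(&[u8])>(&mut self, names: &[&str], mut sink: F) {
        // `clear` drops the contents but keeps the allocation.
        self.scratch.clear();
        for name in names {
            self.scratch
                .extend_from_slice(&(name.len() as u32).to_le_bytes());
            self.scratch.extend_from_slice(name.as_bytes());
        }
        sink(&self.scratch);
    }
}

fn main() {
    let mut encoder = Encoder::new();
    let mut total_bytes = 0usize;
    for _ in 0..3 {
        encoder.encode_batch(&["GET /cart", "db.query"], |bytes| {
            total_bytes += bytes.len();
        });
    }
    println!("wrote {} bytes using one allocation", total_bytes);
}
```

As long as a batch fits in the existing capacity, no allocator call happens on the hot path. The buffer grows only when a batch is larger than any seen before.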

Related Concepts

OpenTelemetry
Tracing
Performance Optimization
Data Serialization Techniques