How to use Rust in ClickHouse: avoiding a full rewrite

Alexey Milovidov
14 min read · intermediate

Overview

The article discusses ClickHouse's journey of integrating Rust into its predominantly C++ codebase without undertaking a complete rewrite. It highlights the benefits and challenges of this iterative approach, showcasing specific examples of Rust's application within ClickHouse.

What You'll Learn

  1. How to integrate Rust libraries into a C++ codebase using CMake
  2. Why iterative development is preferred over a full rewrite when adopting new technologies
  3. When to use Rust for specific functionalities in a predominantly C++ application

Prerequisites & Requirements

  • Familiarity with C++ and Rust programming languages
  • Experience with the CMake build system (optional)

Key Questions Answered

What are the advantages of using Rust in ClickHouse?
Rust offers memory and thread safety, modern libraries, and the ability to attract developers who prefer Rust. These advantages can enhance the reliability and maintainability of ClickHouse without a complete rewrite.
How did ClickHouse integrate Rust without rewriting its codebase?
ClickHouse adopted an iterative development approach, first integrating Rust into its CMake build system and testing it with non-essential libraries like BLAKE3, rather than rewriting the entire application.
What challenges arise from combining Rust and C++ in ClickHouse?
Challenges include ensuring reproducible builds, writing error-prone wrappers, and managing dependencies that may conflict with existing C++ libraries. These issues require careful handling to maintain stability.
What is the impact of Rust's panic mechanism on ClickHouse?
Rust's panic mechanism can lead to program crashes, which are memory safe but can disrupt server operations. ClickHouse's CI system helps identify these issues early, ensuring stability.
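
To make the wrapper and panic concerns above concrete, here is a minimal sketch of a C-callable Rust function that validates its pointers and turns a panic into an error code instead of letting it take the server down. It is an illustration under stated assumptions, not ClickHouse's actual code: `checksum` is a hypothetical stand-in for a real library call such as a BLAKE3 hash, and the name `example_checksum` and its return codes are invented for this example.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::slice;

/// Hypothetical stand-in for a real Rust library call (not BLAKE3).
/// Computes a 64-bit FNV-1a hash and panics on empty input, to mimic a
/// library that can panic on some inputs.
fn checksum(data: &[u8]) -> u64 {
    assert!(!data.is_empty(), "empty input");
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// C-ABI wrapper (hypothetical name and error codes).
/// Returns 0 on success, -1 for a null pointer, -2 if the Rust code panicked.
/// A real build would also export the symbol unmangled so the C++ linker
/// can find it; that attribute is omitted to keep the sketch portable.
///
/// # Safety
/// `data` must point to `len` readable bytes and `out` to a writable u64.
pub unsafe extern "C" fn example_checksum(data: *const u8, len: usize, out: *mut u64) -> i32 {
    if data.is_null() || out.is_null() {
        return -1;
    }
    // Catch the panic at the language boundary so it never unwinds into C++.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let bytes = unsafe { slice::from_raw_parts(data, len) };
        checksum(bytes)
    }));
    match result {
        Ok(value) => {
            unsafe { *out = value };
            0
        }
        Err(_) => -2,
    }
}
```

On the C++ side, a caller would declare the function as `extern "C"` and check the return code before reading `out`. Catching the panic at the boundary only works when the Rust code is built with unwinding enabled; with `panic = "abort"` the process still terminates, which is the crash behavior the summary describes.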

Key Statistics & Figures

Lines of code in ClickHouse
1.5 million
This figure illustrates the scale of ClickHouse's C++ codebase, which is relatively small compared with other database management systems.
Number of tests run daily in ClickHouse's CI system
tens of millions
This extensive testing helps manage the complexities and potential bugs associated with C++ development.
Number of Rust dependencies added to ClickHouse
almost 700
This increase in dependencies reflects Rust's modularity and composability advantages, although it also adds complexity.

Key Actionable Insights

  1. Adopt an iterative approach when integrating new programming languages into existing codebases to minimize risk and disruption.
     This method allows teams to gradually test and implement new features without the overhead of a complete rewrite, making it easier to manage existing code.
  2. Utilize continuous integration systems to catch errors early when mixing languages like Rust and C++.
     CI systems can help identify segmentation faults and crashes quickly, allowing for faster debugging and more stable releases.
  3. Consider the community and ecosystem around a programming language when deciding to integrate it into your projects.
     Rust's growing popularity and the availability of modern libraries can provide significant advantages, but it's important to weigh these against the potential challenges of integration.

Common Pitfalls

  1. Mixing Rust and C++ can lead to complex dependency management issues, especially when libraries have conflicting requirements.
     This often occurs when Rust libraries depend on system libraries that may not align with the versions used in the existing C++ codebase, complicating builds and increasing maintenance overhead.

Related Concepts

Iterative Development
Continuous Integration
Memory Safety In Programming Languages
Dependency Management In Software Projects