Every request, every microsecond: scalable machine learning at Cloudflare

Overview

This article discusses Cloudflare's advancements in scalable machine learning, focusing on the technical strategies that have improved processing times for HTTP requests while enhancing security through machine learning detections. Key innovations include the use of CatBoost for model inference, the development of the Gagarin feature serving platform, and the introduction of memory-mapped files for efficient data access.

What You'll Learn

1. How to optimize machine learning feature extraction using memory-mapped files
2. Why wait-free synchronization improves concurrent data access
3. How to implement zero-copy deserialization for performance gains

Prerequisites & Requirements

  • Understanding of machine learning concepts and data processing
  • Familiarity with the Rust programming language and its ecosystem (optional)

Key Questions Answered

What are the main challenges faced in serving machine learning features?
The main challenges were high tail latency during peak times, suboptimal resource utilization, reduced availability of machine learning features due to memcached timeouts, and scalability constraints as more features were added. These issues necessitated a redesign of the system to improve performance and efficiency.
How does the mmap-sync crate enhance data access in Cloudflare's system?
The mmap-sync crate leverages memory-mapped files for efficient data access, enabling wait-free synchronization and zero-copy deserialization. This allows for high-performance concurrent data access between processes, significantly reducing latency and CPU contention compared to previous methods.
What performance improvements were achieved after the system redesign?
The redesign improved average processing latency for HTTP requests by 12.5%, with the Bot Management module seeing a 55.93% reduction in latency. This translates to a saving of 523 microseconds per request which, summed across all requests, adds up to over 24,000 days of processing time saved every day.
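
As a rough check on these figures, the arithmetic can be reproduced in a few lines of Rust. The function names below are illustrative only, and the request rate is derived here from the quoted numbers rather than stated in this summary, so treat it as an estimate.

```rust
/// Microseconds saved per request, given the median latency before and after.
fn saved_micros(before_us: u64, after_us: u64) -> u64 {
    before_us - after_us
}

/// Relative latency reduction, in percent.
fn reduction_percent(before_us: f64, after_us: f64) -> f64 {
    (before_us - after_us) / before_us * 100.0
}

/// Requests per second implied by a total daily saving (expressed in days of
/// processing time saved per day) and a per-request saving in microseconds.
/// This is a derived estimate, not a figure quoted in the summary.
fn implied_requests_per_second(days_saved_per_day: f64, saved_us_per_request: f64) -> f64 {
    const SECONDS_PER_DAY: f64 = 86_400.0;
    // Total seconds of processing time saved in one day.
    let saved_seconds_per_day = days_saved_per_day * SECONDS_PER_DAY;
    // Number of requests needed to accumulate that saving.
    let requests_per_day = saved_seconds_per_day / (saved_us_per_request * 1e-6);
    requests_per_day / SECONDS_PER_DAY
}
```

With the quoted values, `saved_micros(532, 9)` is 523, `reduction_percent(532.0, 9.0)` is about 98.3, and `implied_requests_per_second(24_000.0, 523.0)` comes to roughly 45.9 million requests per second.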

Key Statistics & Figures

Average processing latency improvement: 12.5% (compared to the previous system)
Bot Management module latency improvement: 55.93%
Median latency for serving machine learning features: 9 microseconds (after the redesign, down from 532 microseconds)

Technologies & Tools

Machine Learning: CatBoost, used for ultra-low-latency model inference
Backend: Gagarin, a feature serving platform developed in Go
Library: mmap-sync, a Rust crate for managing high-performance concurrent data access
Programming Language: Rust, used for developing the mmap-sync crate and the bliss library

Key Actionable Insights

1. Implementing memory-mapped files can drastically reduce latency in data-intensive applications.
   This approach is particularly beneficial where high throughput and low latency are critical, such as in machine learning inference systems.
2. Adopting wait-free synchronization techniques can enhance performance in highly concurrent environments.
   Wait-free algorithms guarantee that every thread completes its operation in a bounded number of steps, so no thread is ever blocked by another. This matters for applications that need high concurrency and low contention.
3. Utilizing zero-copy deserialization can significantly improve data access times.
   This technique is most useful in systems where data is read far more often than it is written, as it avoids the parsing and copying overhead of traditional deserialization.
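
To make the zero-copy idea concrete, here is a minimal sketch using only the Rust standard library. The byte layout, the `FeatureView` type, and the `encode` helper are all hypothetical and chosen for illustration; they are not the format or API used by mmap-sync. The point is that the reader validates the buffer once and then reads fields directly from the underlying bytes, without building an owned copy of the whole structure. The same view would work over a memory-mapped region, although the standard library itself does not provide memory mapping.

```rust
/// Read-only view over an encoded feature buffer (hypothetical layout):
/// bytes 0..4 hold the feature count as a little-endian u32,
/// followed by `count` little-endian f64 values of 8 bytes each.
struct FeatureView<'a> {
    bytes: &'a [u8],
}

impl<'a> FeatureView<'a> {
    /// Validates the buffer length once; no data is copied or allocated.
    fn new(bytes: &'a [u8]) -> Option<FeatureView<'a>> {
        if bytes.len() < 4 {
            return None;
        }
        let mut header = [0u8; 4];
        header.copy_from_slice(&bytes[..4]);
        let count = u32::from_le_bytes(header) as usize;
        let needed = count.checked_mul(8)?.checked_add(4)?;
        if bytes.len() < needed {
            return None;
        }
        Some(FeatureView { bytes: bytes })
    }

    /// Number of features, read straight from the header.
    fn len(&self) -> usize {
        let mut header = [0u8; 4];
        header.copy_from_slice(&self.bytes[..4]);
        u32::from_le_bytes(header) as usize
    }

    /// Reads one feature in place; only the 8 bytes of that field are touched.
    fn get(&self, index: usize) -> Option<f64> {
        if index >= self.len() {
            return None;
        }
        let start = 4 + index * 8;
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&self.bytes[start..start + 8]);
        Some(f64::from_le_bytes(raw))
    }
}

/// Writer side of the same hypothetical layout.
fn encode(features: &[f64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + features.len() * 8);
    out.extend_from_slice(&(features.len() as u32).to_le_bytes());
    for f in features {
        out.extend_from_slice(&f.to_le_bytes());
    }
    out
}
```

A reader would call `FeatureView::new(&buffer)` once and then `get(i)` for each feature it needs. Creating the view costs the same regardless of how many features the buffer holds, which is the property that makes this style attractive on a per-request hot path.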

Common Pitfalls

1. Over-reliance on traditional locking mechanisms can lead to performance bottlenecks.
   This often occurs in high-concurrency environments where threads compete for access to shared resources, leading to contention and increased latency.
2. Neglecting the importance of efficient data serialization can hinder application performance.
   Inefficient serialization methods can introduce significant overhead, particularly in systems that require rapid data access and processing.
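
As an alternative to the locking described in the first pitfall, the sketch below shows a simplified double-buffer scheme in which neither the single writer nor the readers ever take a lock or spin. It is an in-process illustration under stated assumptions, not the mmap-sync implementation: the `DoubleBuffer` type and its methods are hypothetical, exactly one writer is assumed, and reader tracking is omitted. Without reader tracking, a reader that stalls across two consecutive `publish` calls could observe values from two different snapshots, so a production design needs an additional mechanism to keep the writer off a buffer that is still being read.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

const FEATURES: usize = 4;

/// Two fixed-size buffers plus an index naming the one readers should use.
struct DoubleBuffer {
    buffers: [[AtomicU64; FEATURES]; 2],
    active: AtomicUsize,
}

impl DoubleBuffer {
    fn new() -> DoubleBuffer {
        DoubleBuffer {
            buffers: [
                [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)],
                [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)],
            ],
            active: AtomicUsize::new(0),
        }
    }

    /// Single writer only: fill the idle buffer, then flip the index.
    /// Completes in a fixed number of steps and never waits on readers.
    fn publish(&self, values: [u64; FEATURES]) {
        let idle = 1 - self.active.load(Ordering::Relaxed);
        for (slot, v) in self.buffers[idle].iter().zip(values.iter()) {
            slot.store(*v, Ordering::Relaxed);
        }
        // Release makes the stores above visible to any reader that
        // subsequently loads the new index with Acquire.
        self.active.store(idle, Ordering::Release);
    }

    /// Any number of readers: one index load plus FEATURES value loads,
    /// with no locks and no retry loop.
    fn read(&self) -> [u64; FEATURES] {
        let idx = self.active.load(Ordering::Acquire);
        let mut out = [0u64; FEATURES];
        for (o, slot) in out.iter_mut().zip(self.buffers[idx].iter()) {
            *o = slot.load(Ordering::Relaxed);
        }
        out
    }
}
```

Because every field is an atomic, a `DoubleBuffer` can be shared across threads (for example behind an `Arc`) without a mutex. The cost of `read` is bounded by the buffer size alone, which is the wait-free property the insight above refers to.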

Related Concepts

Memory-mapped Files
Wait-free Synchronization
Zero-copy Deserialization
Machine Learning Feature Extraction