Evolving our machine learning to stop mobile bots

Arushi Shah
9 min read, advanced

Overview

The article discusses the evolution of Cloudflare's machine learning models aimed at detecting mobile bots, highlighting the shift from desktop to mobile traffic and the challenges faced in accurately identifying legitimate mobile app traffic. It details the processes of data gathering, model training, evaluation, and deployment, showcasing improvements in performance and accuracy.

What You'll Learn

1. How to gather and prepare data for training machine learning models
2. Why understanding traffic patterns is crucial for bot detection
3. How to evaluate and deploy machine learning models effectively
4. When to use shadow mode for model validation

Prerequisites & Requirements

  • Basic understanding of machine learning concepts
  • Familiarity with the CatBoost library (optional)

Key Questions Answered

How did Cloudflare evolve its machine learning models for mobile bot detection?
Cloudflare evolved its machine learning models by incorporating feedback from early customers and adapting to changing bot behaviors. It launched five additional models trained on metadata from traffic patterns, with a specific focus on mobile traffic, which now constitutes over 54% of its network traffic.
What techniques did Cloudflare use to improve mobile app traffic detection?
Cloudflare identified legitimate mobile app traffic by analyzing open-source code and collaborating with customers to recognize specific traffic patterns. This led to the creation of new datasets that significantly improved the model's performance on mobile app traffic.
What were the results of the new machine learning model for mobile traffic?
The latest model achieved a false positive rate of 0.0% on Android traffic in one case, and for a Web3 platform it cut the false positive rate from between 28.7% and 40.7% to nearly 0.0%. These results show the value of training on validated data.
How does Cloudflare ensure the accuracy of its machine learning models?
Cloudflare uses offline monitoring to compare a new model's predictions with those of the production models on validation datasets. It also uses the SHAP Explainer to analyze model predictions and identify areas for improvement before deployment.
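
The offline comparison described in the last answer can be illustrated with a small sketch. It assumes a labelled validation set and hard bot/human predictions from both models. The function names, the gate rule, and the toy data are hypothetical and are not taken from Cloudflare's pipeline.

```python
def compare_on_validation(labels, production_preds, candidate_preds):
    """Summarise how a candidate model differs from production on labelled data."""
    if not (len(labels) == len(production_preds) == len(candidate_preds)):
        raise ValueError("all inputs must have the same length")
    fixed = regressed = 0
    for truth, prod, cand in zip(labels, production_preds, candidate_preds):
        if prod != truth and cand == truth:
            fixed += 1       # candidate corrects a production mistake
        elif prod == truth and cand != truth:
            regressed += 1   # candidate breaks something production got right
    return {"fixed": fixed, "regressed": regressed}


def ready_for_next_stage(summary, max_regressions=0):
    """Hypothetical gate: the candidate must fix at least one error
    and stay within the allowed number of regressions."""
    return summary["regressed"] <= max_regressions and summary["fixed"] > 0


# Toy validation data (assumed): 1 = bot, 0 = human
labels           = [0, 0, 0, 1, 1]
production_preds = [1, 0, 1, 1, 1]   # production flags two humans as bots
candidate_preds  = [0, 0, 0, 1, 1]
summary = compare_on_validation(labels, production_preds, candidate_preds)
promote = ready_for_next_stage(summary)
```

Counting fixes and regressions separately, rather than comparing a single accuracy number, makes it visible when a candidate improves one kind of traffic at the expense of another.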

Key Statistics & Figures

Percentage of mobile traffic on Cloudflare's network
54%
This statistic highlights the significant shift in traffic patterns that necessitated the evolution of Cloudflare's bot detection strategies.
False positive rate for Android traffic
0.0%
Achieved by the latest model, demonstrating the effectiveness of training on trusted data.
Previous false positive rates for a Web3 platform
28.7% to 40.7%
This range was significantly reduced to nearly 0.0% with the new model, showcasing improvements in mobile app traffic detection.
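
The false positive rates quoted above are the share of legitimate requests that a model flags as bots. A minimal sketch of that calculation, broken down by traffic segment, might look like the following. The record layout, the segment names, and the sample values are invented for illustration and are not Cloudflare data.

```python
from collections import defaultdict


def false_positive_rate_by_segment(records):
    """Return {segment: FP / (FP + TN)} computed over legitimate traffic only.

    Each record is (segment, is_bot, predicted_bot). Segments with no
    legitimate requests are omitted because the rate is undefined there.
    """
    flagged = defaultdict(int)   # legitimate requests wrongly flagged as bots
    legit = defaultdict(int)     # all legitimate requests
    for segment, is_bot, predicted_bot in records:
        if is_bot:
            continue  # true bots do not enter the false positive rate
        legit[segment] += 1
        if predicted_bot:
            flagged[segment] += 1
    return {seg: flagged[seg] / legit[seg] for seg in legit}


# Hypothetical labelled sample: (segment, is_bot, predicted_bot)
sample = [
    ("android_app", False, False),
    ("android_app", False, False),
    ("android_app", True, True),
    ("web", False, True),
    ("web", False, False),
    ("web", False, False),
    ("web", False, False),
]
rates = false_positive_rate_by_segment(sample)
```

Reporting the rate per segment matters here because an acceptable overall figure can hide a high false positive rate on a single kind of traffic, such as one mobile app.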

Technologies & Tools

Machine Learning Library
CatBoost
Used for training binary classification models in the bot detection process.
Workflow Management
Airflow
Supports the internal pipeline that runs model training.

Key Actionable Insights

1. Incorporate diverse datasets when training machine learning models to improve accuracy.
   Using a variety of data sources helps ensure that the model can generalize well across different traffic types, especially for mobile applications where traditional datasets may be lacking.
2. Utilize shadow mode for real-time model validation without impacting customer traffic.
   This approach allows for safe testing of new models, enabling developers to assess performance and make adjustments based on live data before full deployment.
3. Regularly update your machine learning models to adapt to evolving traffic patterns.
   As bot behavior changes, it's crucial to refine models continuously to maintain detection accuracy and minimize false positives, particularly in environments with significant mobile traffic.
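
Shadow mode, as mentioned in the second insight, means a candidate model scores the same live requests as the production model while only the production decision is acted on. The sketch below is one way to structure that under stated assumptions: the `ShadowRunner` class, the 0.5 threshold, the convention that a higher score means more bot-like, and the two toy scoring functions are all hypothetical and do not describe Cloudflare's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Request = Dict[str, float]
Model = Callable[[Request], float]  # assumed: returns a bot-likelihood in [0, 1]


@dataclass
class ShadowRunner:
    production: Model
    candidate: Model
    threshold: float = 0.5  # hypothetical decision threshold
    disagreements: List[Request] = field(default_factory=list)
    seen: int = 0

    def handle(self, request: Request) -> bool:
        """Return the production decision; the candidate is only recorded."""
        self.seen += 1
        prod_is_bot = self.production(request) >= self.threshold
        try:
            cand_is_bot = self.candidate(request) >= self.threshold
        except Exception:
            # A failing shadow model must never affect live traffic.
            return prod_is_bot
        if cand_is_bot != prod_is_bot:
            self.disagreements.append(request)
        return prod_is_bot

    def disagreement_rate(self) -> float:
        return len(self.disagreements) / self.seen if self.seen else 0.0


# Toy models (assumed): production scores on request rate alone,
# the candidate additionally trusts traffic from a known app.
def production(req: Request) -> float:
    return min(req["requests_per_min"] / 100.0, 1.0)


def candidate(req: Request) -> float:
    return 0.1 if req["known_app"] else production(req)


runner = ShadowRunner(production, candidate)
traffic = [
    {"requests_per_min": 80.0, "known_app": 1.0},  # busy but legitimate app
    {"requests_per_min": 90.0, "known_app": 0.0},
    {"requests_per_min": 10.0, "known_app": 0.0},
]
decisions = [runner.handle(r) for r in traffic]
```

The recorded disagreements are the requests worth reviewing before promotion, since they are exactly the cases where switching models would change what a customer sees.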

Common Pitfalls

1. Relying solely on traditional datasets for training machine learning models can lead to poor performance.
   Many traditional datasets do not adequately represent mobile app traffic, which can result in models that fail to accurately detect legitimate requests from mobile applications.

Related Concepts

Machine Learning Model Training
Bot Detection Techniques
Traffic Pattern Analysis
Mobile Application Security