Our container platform is in production. It has GPUs. Here’s an early look

Brendan Irvine-Broque

Cloudflare

Tags: Cloudflare Workers, Docker, Golang, JavaScript, NGINX, Node.js, Rust, SQLite, Terraform, WebRTC

Overview

Cloudflare has built a container platform, now running in production, that supports GPU workloads and runs a range of applications across its global network. This article covers the platform's architecture, the production workloads it already serves, and why global scheduling benefits developers.

What You'll Learn

1. How to leverage Cloudflare's container platform for GPU workloads
2. Why global scheduling improves application performance across Cloudflare's network
3. How to optimize Docker image pulls using zstd compression
4. When to use prewarmed images for faster job execution

Prerequisites & Requirements

Understanding of containerization concepts
Familiarity with Docker and Cloudflare services (optional)

Key Questions Answered

How does Cloudflare's container platform utilize GPUs for AI workloads?

Cloudflare's container platform enables efficient scheduling of GPU workloads by dynamically placing AI models based on GPU memory needs and hardware availability. This allows for better resource utilization and faster inference times for applications like Workers AI.
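Placement by GPU memory and hardware availability can be illustrated with a small best-fit sketch. Everything here is hypothetical: the `GpuServer` type, the `place_model` function, the GPU model labels, and the best-fit heuristic itself (choose the compatible server that leaves the least unused GPU memory) are assumptions for illustration, not Cloudflare's actual scheduler.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GpuServer:
    # Hypothetical view of one machine's GPU state.
    name: str
    gpu_model: str
    free_gpu_mem_gb: float


def place_model(servers: Iterable[GpuServer], required_mem_gb: float,
                compatible_gpus: set) -> Optional[GpuServer]:
    """Best-fit placement (an assumed heuristic): among servers with a
    compatible GPU and enough free memory, pick the one that would have the
    least memory left over, then reserve that memory on it."""
    candidates = [
        s for s in servers
        if s.gpu_model in compatible_gpus and s.free_gpu_mem_gb >= required_mem_gb
    ]
    if not candidates:
        return None  # no available hardware can host this model right now
    best = min(candidates, key=lambda s: s.free_gpu_mem_gb - required_mem_gb)
    best.free_gpu_mem_gb -= required_mem_gb  # reserve memory for the model
    return best


servers = [
    GpuServer("server-1", "gpu-large", 80.0),
    GpuServer("server-2", "gpu-large", 24.0),
    GpuServer("server-3", "gpu-small", 48.0),
]
print(place_model(servers, 20.0, {"gpu-large", "gpu-small"}).name)  # server-2
```

Best-fit keeps large blocks of free memory available on other GPUs for bigger models; a production scheduler would likely weigh further signals such as current load and locality.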

What improvements were made to Docker image pulls in Cloudflare's platform?

Cloudflare replaced gzip with Zstandard (zstd) for compressing Docker image layers, halving image pull times: a 30 GB GPU image now pulls in 4 minutes instead of 8, which speeds up deployments.
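To confirm which compression a layer blob actually uses, you can check its leading magic bytes: gzip streams begin with `1f 8b`, and zstd frames begin with `28 b5 2f fd` (the frame magic number `0xFD2FB528` stored little-endian). The helper below is a minimal standard-library sketch; the name `layer_compression` is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"          # gzip member header (RFC 1952)
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic number 0xFD2FB528, little-endian


def layer_compression(blob_prefix: bytes) -> str:
    """Classify a layer blob by its first few bytes."""
    if blob_prefix.startswith(ZSTD_MAGIC):
        return "zstd"
    if blob_prefix.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


print(layer_compression(gzip.compress(b"layer tarball bytes")))  # gzip
```

Registries also record the compression in the layer `mediaType` of an OCI image manifest (`application/vnd.oci.image.layer.v1.tar+gzip` versus `application/vnd.oci.image.layer.v1.tar+zstd`). Pulling zstd layers requires a container runtime that supports them.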

Why is global scheduling important for Cloudflare's container platform?

Global scheduling allows Cloudflare to manage workloads across its network without requiring developers to specify regions or data centers. Because the platform, rather than the developer, chooses where each container runs, it can place workloads near users and wherever capacity is available, which lowers latency and improves resource utilization.
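A toy version of region-free placement: the caller submits a container without naming a region, and the scheduler picks the closest location that still has capacity, falling back to farther ones as nearby locations fill. The `Location` type, the round-trip-time input, and the "nearest location with capacity" rule are assumptions for illustration; a real global scheduler would likely weigh more signals than this.

```python
from dataclasses import dataclass


@dataclass
class Location:
    # Hypothetical per-location view available to a global scheduler.
    code: str
    rtt_ms: float    # estimated round-trip time from the requester
    free_slots: int  # containers this location can still accept


def schedule_globally(locations: list) -> Location:
    """Place one container. The developer names no region; the scheduler
    picks the lowest-latency location that still has free capacity."""
    available = [loc for loc in locations if loc.free_slots > 0]
    if not available:
        raise RuntimeError("no capacity anywhere in the network")
    chosen = min(available, key=lambda loc: loc.rtt_ms)
    chosen.free_slots -= 1  # reserve a slot at the chosen location
    return chosen


locations = [
    Location("loc-a", rtt_ms=5.0, free_slots=0),   # closest, but full
    Location("loc-b", rtt_ms=12.0, free_slots=1),
    Location("loc-c", rtt_ms=40.0, free_slots=10),
]
print(schedule_globally(locations).code)  # loc-b (loc-a is full)
print(schedule_globally(locations).code)  # loc-c (loc-b is now full)
```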

How does Cloudflare ensure low latency for Remote Browser Isolation?

Cloudflare runs Remote Browser Isolation sessions in containers in data centers close to users. Because the remote browser is nearby, each interaction makes only a short network round trip, which keeps the experience responsive.

Key Statistics & Figures

Image pull time for 30 GB GPU images

4 minutes

Down from 8 minutes, after switching image layer compression from gzip to zstd.

Number of Cloudflare locations

330+

Cloudflare operates in over 330 cities across 120+ countries, allowing for low-latency application deployment.
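Expressed as throughput, the pull-time figure works out to an average of 62.5 MB/s before the change and 125 MB/s after. These are end-to-end averages (download plus decompression and unpacking) computed here from the two published times using decimal units (1 GB = 1000 MB); the source does not break the time down by stage.

```python
def effective_rate_mb_per_s(image_gb: float, minutes: float) -> float:
    """Average end-to-end pull rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return image_gb * 1000 / (minutes * 60)


gzip_rate = effective_rate_mb_per_s(30, 8)  # previous pull time with gzip
zstd_rate = effective_rate_mb_per_s(30, 4)  # pull time after switching to zstd
print(gzip_rate, zstd_rate, zstd_rate / gzip_rate)  # 62.5 125.0 2.0
```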

Technologies & Tools


Containerization

Docker

Used for running applications in isolated environments.

Compression

Zstandard

Replaces gzip for compressing Docker image layers; its faster decompression shortens image pulls.

Serverless Computing

Cloudflare Workers

Enables running serverless applications across Cloudflare's network.

Networking

Anycast

Facilitates routing requests to the nearest Cloudflare data center.

Key Actionable Insights

1. Utilize Cloudflare's container platform to run GPU-intensive applications efficiently.
By leveraging the platform's global scheduling capabilities, developers can enhance application performance and reduce latency without managing infrastructure.

2. Implement Zstandard compression for Docker images to improve deployment times.
Switching from gzip to zstd halved the pull time for a 30 GB GPU image on Cloudflare's platform (8 minutes to 4), which matters for applications that need to scale up quickly.

3. Prewarm images on servers to speed up off-peak job execution.
By caching the necessary images in advance, developers can ensure that new containers start quickly, reducing wait times for background tasks.
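A minimal sketch of the prewarming idea, assuming a scheduler that knows which images each server has cached: before an off-peak job runs, its image is placed on a few servers, and at start time the scheduler prefers one of those. `Server`, `prewarm`, and `pick_server` are hypothetical names, and adding an image name to a set stands in for an actual pull and unpack.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical server record tracking which images are already on local disk.
    name: str
    cached_images: set = field(default_factory=set)


def prewarm(servers: list, image: str, count: int) -> list:
    """Ahead of a scheduled job, make sure `image` is cached on at least
    `count` servers. Adding to the set stands in for a real pull + unpack."""
    already_warm = sum(1 for s in servers if image in s.cached_images)
    cold = [s for s in servers if image not in s.cached_images]
    for s in cold[:max(0, count - already_warm)]:
        s.cached_images.add(image)
    return [s for s in servers if image in s.cached_images]


def pick_server(servers: list, image: str):
    """Prefer a warm server (no pull needed); otherwise fall back to a cold one.
    Returns (server, needs_pull)."""
    for s in servers:
        if image in s.cached_images:
            return s, False
    return servers[0], True


fleet = [Server("a"), Server("b", {"batch-job:v1"}), Server("c")]
warm = prewarm(fleet, "batch-job:v1", count=2)
print([s.name for s in warm])  # ['a', 'b']
```

Choosing a warm server skips the pull-and-unpack step entirely, which is where most of a large image's startup delay goes.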

Common Pitfalls

1. Relying on regional scheduling can lead to inefficiencies and increased latency.

Managing capacity region by region complicates deployment and scaling. With global scheduling, the platform decides where each workload runs, so developers do not have to.

2. Neglecting to prewarm images can result in longer job execution times.

Without prewarming, the time to pull and unpack images can significantly delay the start of background tasks. Implementing prewarming strategies can mitigate this issue.

Related Concepts

Containerization

Global Scheduling

GPU Utilization

Serverless Architecture