JIT WireGuard

One of many odd decisions we’ve made at Fly.io is how we use WireGuard. It’s not just that we use it in many places where other shops would use HTTPS and REST APIs. We’ve gone a step beyond that: every time you run flyctl, our lovable, sprawling CLI,

Lillian Berry, star-ark.net
9 min read · advanced

Overview

The article describes how Fly.io made its WireGuard gateways faster and easier to scale with Just-In-Time (JIT) peer configuration: a gateway installs a peer in the kernel only when that peer actually tries to connect. It covers the problems caused by stale peers and the changes that simplified WireGuard peer management.

What You'll Learn

1. How to implement JIT peer configuration for WireGuard
2. Why managing stale WireGuard peers is critical for performance
3. When to use SQLite for lightweight peer management

Key Questions Answered

How does Fly.io improve WireGuard's performance?
Fly.io uses a JIT configuration system in which gateways pull peer configurations on demand instead of pre-loading every peer into the kernel. This cuts the number of stale peers held by each gateway and shortens connection setup.
What problems arise from stale WireGuard peers?
Stale peers accumulate over time and slow down kernel WireGuard operations. The most visible symptom is slow reloading of peers after a gateway reboot, and the accumulation may even cause kernel panics, so peer lifecycles need to be managed actively.
What are the benefits of using SQLite for WireGuard peer management?
SQLite lets Fly.io store and manage WireGuard peer configurations without the overhead of a more complex database system. It suits the gateway architecture because each gateway can read and modify its own peer data quickly and locally.
How does Fly.io handle incoming WireGuard connection requests?
Fly.io captures incoming WireGuard connection requests with a BPF filter and a packet socket, then creates peer configurations dynamically from those actual connection attempts. Only the peers that are really in use get added to the kernel, which keeps resource usage low.
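
As a rough illustration of what "capturing a connection request" involves, the sketch below classifies a UDP payload as a WireGuard handshake initiation. It relies only on the public WireGuard message layout (type 1, three reserved zero bytes, a 4-byte little-endian sender index, 148 bytes in total). The function name, the filter string, and port 51820 are assumptions made for this example, not details taken from Fly.io's implementation.

```python
import struct

# A WireGuard handshake initiation is message type 1 and is always 148 bytes.
HANDSHAKE_INITIATION_HEADER = b"\x01\x00\x00\x00"  # type byte + 3 reserved zero bytes
HANDSHAKE_INITIATION_LEN = 148

# Equivalent pcap-style filter expression. Port 51820 is an assumption;
# substitute the gateway's real listen port. udp[8] is the first payload
# byte because the UDP header is 8 bytes long.
PCAP_FILTER = "udp dst port 51820 and udp[8] = 1"


def parse_handshake_initiation(payload: bytes):
    """Return the initiator's sender index, or None if this is not an initiation."""
    if len(payload) != HANDSHAKE_INITIATION_LEN:
        return None
    if payload[:4] != HANDSHAKE_INITIATION_HEADER:
        return None
    # Bytes 4..7 hold the sender index as a little-endian unsigned 32-bit integer.
    (sender_index,) = struct.unpack_from("<I", payload, 4)
    return sender_index
```

Recognizing the packet is only the first step. The initiator's static public key travels encrypted inside the message, so identifying which peer to install also requires the gateway's private key and the Noise handshake computation, which this sketch leaves out.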

Key Statistics & Figures

Stale WireGuard peer count
Reduced from hundreds of thousands to nearly zero
This significant reduction in stale peers has led to faster gateway performance and quicker peer setup times.

Technologies & Tools


Networking
WireGuard
Used for secure peer-to-peer connections in Fly.io's infrastructure.
Database
SQLite
Used to store WireGuard peer configurations on the gateways.
Messaging
NATS
Previously used for messaging between services; now scaled back because of reliability issues.
Networking
BPF
Used to filter and capture WireGuard packets for dynamic peer management.

Key Actionable Insights

1. Implement JIT peer configuration to improve WireGuard performance.
   Letting gateways pull peer configurations on demand sharply reduces the number of stale peers, which shortens connection times and frees kernel resources.
2. Regularly clean up stale WireGuard peers to maintain system performance.
   Stale peers accumulate and slow down operations, especially after reboots. A cron job that removes unused peers keeps the peer count in check.
3. Use SQLite for lightweight peer management in resource-constrained environments.
   SQLite stores and manages peer configurations without the complexity of a full RDBMS, which makes it a good fit for small gateway servers.
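
The first and third insights can be sketched together: a small SQLite table holds every known peer, and a peer is looked up and turned into a `wg set` command only when it is needed. This is a minimal sketch under stated assumptions. The `peers` schema, the function names, and the interface name `wg0` are hypothetical and not taken from Fly.io's code; only the `wg set <interface> peer <key> allowed-ips <cidr>` command form comes from the standard `wg` tool.

```python
import sqlite3

# Hypothetical schema: one row per known peer.
SCHEMA = """
CREATE TABLE IF NOT EXISTS peers (
    public_key TEXT PRIMARY KEY,  -- base64 WireGuard public key
    allowed_ip TEXT NOT NULL,     -- address assigned to the peer, in CIDR form
    created_at INTEGER NOT NULL   -- Unix timestamp
)
"""


def open_store(path=":memory:"):
    """Open (or create) the peer database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_peer(conn, public_key, allowed_ip, created_at):
    """Insert or update a peer record inside a transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO peers VALUES (?, ?, ?)",
            (public_key, allowed_ip, created_at),
        )


def lookup_peer(conn, public_key):
    """Fetch a peer on demand; returns None when the key is unknown."""
    row = conn.execute(
        "SELECT allowed_ip, created_at FROM peers WHERE public_key = ?",
        (public_key,),
    ).fetchone()
    if row is None:
        return None
    return {"public_key": public_key, "allowed_ip": row[0], "created_at": row[1]}


def wg_set_command(interface, peer):
    """Build the argv that would install the peer in the kernel (not executed here)."""
    return ["wg", "set", interface, "peer", peer["public_key"],
            "allowed-ips", peer["allowed_ip"]]
```

Keeping the kernel install as a separate, explicit step is the point of the design: the database can hold any number of peers while the kernel only holds the ones that recently connected.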

Common Pitfalls

1. Failing to manage stale WireGuard peers can lead to performance issues.
   If stale peers are not cleaned up regularly, they slow down kernel operations, most noticeably when peers are reloaded after a gateway reboot, and can contribute to kernel panics.
2. Over-reliance on NATS for messaging can result in message loss.
   NATS does not guarantee message delivery, so API interactions built on it become unreliable unless the caller adds its own acknowledgement and retry handling.
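
For the first pitfall, a periodic cleanup job could look roughly like the sketch below. It parses the output of `wg show <interface> latest-handshakes`, which prints one public key and one Unix timestamp per line (0 when no handshake has happened yet), and builds `wg set ... remove` commands for idle peers. The function names and the age threshold are assumptions for illustration; the summary does not say how Fly.io's own cleanup job decides what to remove.

```python
def stale_peers(latest_handshakes: str, now: int, max_age_seconds: int):
    """Return public keys whose last handshake is older than max_age_seconds."""
    stale = []
    for line in latest_handshakes.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        public_key, timestamp = parts[0], int(parts[1])
        # A timestamp of 0 means no handshake has completed. Note that this
        # also matches peers installed moments ago that have not connected yet.
        if timestamp == 0 or now - timestamp > max_age_seconds:
            stale.append(public_key)
    return stale


def removal_commands(interface, public_keys):
    """Build one `wg set <interface> peer <key> remove` argv per stale peer."""
    return [["wg", "set", interface, "peer", key, "remove"] for key in public_keys]
```

With JIT configuration in place, removing an idle peer from the kernel is low-risk, because the gateway can reinstall it the next time that peer sends a handshake.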