Globally Distributed Postgres

This is a story about a cool hack we came up with at Fly. The hack lets you do something pretty ambitious with full-stack applications. What makes it cool is that it’s easy to get your head around, and involves just a couple of moving parts, assembled i…

Kurt Mackey
12 min read · Advanced

Overview

The article describes a novel way to run a globally distributed PostgreSQL setup on Fly.io's infrastructure: read replicas in many regions serve reads locally, and requests that need to write are retried in the primary region. It explains why distributed writes are hard and shows how a simple hack routes reads and writes correctly with very little code.

What You'll Learn

1. How to deploy a globally replicated PostgreSQL database for both reads and writes
2. Why distributed writes are challenging in multi-region applications
3. How to use the fly-replay feature to handle write requests efficiently

Prerequisites & Requirements

  • Understanding of CRUD applications and PostgreSQL
  • Familiarity with Docker and Fly.io (optional)

Key Questions Answered

How can I manage distributed writes in a PostgreSQL database?
Each request is served in the nearest region, and the app there talks to its local read replica. When a request tries to write, PostgreSQL on the replica rejects it with a read-only error. The app catches that exception and has the request retried in the primary region, where the write succeeds. This handles writes without a complex multi-primary setup.
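The detection step can be sketched in Python. This is a minimal sketch, not the article's implementation: it assumes PostgreSQL reports a write on a replica with SQLSTATE `25006` (`read_only_sql_transaction`) and that the driver exposes that code as `pgcode` (psycopg2) or `sqlstate` (psycopg 3). `FakePgError` and the region codes are hypothetical, so the example runs without a database.

```python
from typing import Optional

# SQLSTATE PostgreSQL uses for "cannot execute ... in a read-only transaction",
# which is what a hot-standby read replica raises on INSERT/UPDATE/DELETE.
READ_ONLY_SQLSTATE = "25006"


def sqlstate_of(exc: BaseException) -> Optional[str]:
    """Extract the SQLSTATE from a driver exception, if it carries one.

    psycopg2 exposes it as `pgcode`, psycopg 3 as `sqlstate`; duck typing
    keeps this sketch free of any driver import.
    """
    return getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)


def should_retry_in_primary(exc: BaseException, current_region: str, primary_region: str) -> bool:
    """True when a write was refused by a replica outside the primary region."""
    return current_region != primary_region and sqlstate_of(exc) == READ_ONLY_SQLSTATE


class FakePgError(Exception):
    """Hypothetical stand-in for a driver error, used only for illustration."""

    def __init__(self, pgcode: str) -> None:
        super().__init__(f"SQLSTATE {pgcode}")
        self.pgcode = pgcode


if __name__ == "__main__":
    err = FakePgError(READ_ONLY_SQLSTATE)
    print(should_retry_in_primary(err, current_region="syd", primary_region="ord"))  # True
    print(should_retry_in_primary(err, current_region="ord", primary_region="ord"))  # False
```

Checking the region as well as the error code matters: in the primary region a read-only error would be a genuine failure, so retrying it there would only loop.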
What is the fly-replay feature and how does it work?
fly-replay is a response header that Fly.io's proxy acts on. When a write to a read replica fails, the app catches the read-only exception and responds with a fly-replay header naming the primary region. Instead of passing that response to the user, the proxy replays the original request in the primary region, so write operations are handled transparently across regions.
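A framework-agnostic wrapper shows how a handler could turn the read-only exception into a replay response. This is a sketch under stated assumptions, not Fly's or the article's code: the `Response` class, the `PRIMARY_REGION` and `FLY_REGION` environment variable names, the header value format `region=<code>`, and the 409 status are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

READ_ONLY_SQLSTATE = "25006"  # PostgreSQL read_only_sql_transaction


@dataclass
class Response:
    """Minimal framework-agnostic response object (hypothetical)."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


Handler = Callable[[dict], Response]


def with_fly_replay(handler: Handler,
                    primary_region: Optional[str] = None,
                    current_region: Optional[str] = None) -> Handler:
    """Wrap a handler so replica write failures become fly-replay responses."""
    # Assumed environment variable names; pass the regions explicitly to override.
    primary = primary_region or os.environ.get("PRIMARY_REGION", "")
    current = current_region or os.environ.get("FLY_REGION", "")

    def wrapped(request: dict) -> Response:
        try:
            return handler(request)
        except Exception as exc:
            code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
            if code == READ_ONLY_SQLSTATE and primary and current != primary:
                # The proxy is assumed to act on this header and re-run the whole
                # request in the primary region; 409 is just this sketch's choice.
                return Response(409, {"fly-replay": f"region={primary}"})
            raise  # Anything else (or a failure in the primary itself) is a real error.

    return wrapped


# --- Usage with hypothetical handlers -------------------------------------
class DemoReadOnlyError(Exception):
    pgcode = READ_ONLY_SQLSTATE


def create_post(request: dict) -> Response:
    raise DemoReadOnlyError("cannot execute INSERT in a read-only transaction")


def list_posts(request: dict) -> Response:
    return Response(200, body="[]")


replayed = with_fly_replay(create_post, "ord", "syd")({"method": "POST", "path": "/posts"})
served = with_fly_replay(list_posts, "ord", "syd")({"method": "GET", "path": "/posts"})
```

Reads never touch the wrapper's error path, so they stay in the local region; only requests that actually attempt a write pay for the replay.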
What are the performance implications of using read replicas?
Read replicas make reads fast because each region queries a nearby copy of the data, but every write still has to reach the primary. Depending on how far a region is from the primary, a write takes roughly 20 to 400 ms, while reads from a local replica avoid that cross-region trip entirely. The approach therefore pays off most when reads far outnumber writes.
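A back-of-the-envelope model helps explain why replaying the whole request beats issuing each query across regions. This is a simplification with hypothetical inputs: it treats the quoted 20 to 400 ms as the round-trip cost to the primary region and assumes five sequential queries of 1 ms each once they run next to the database.

```python
def cross_region_queries_ms(rtt_ms: float, queries: int) -> float:
    """App stays in the far region: every sequential query pays a round trip."""
    return rtt_ms * queries


def replayed_request_ms(rtt_ms: float, queries: int, local_query_ms: float) -> float:
    """Request is replayed once to the primary region; queries then run locally."""
    return rtt_ms + local_query_ms * queries


# Hypothetical workload: 5 sequential queries, 1 ms each when local.
for rtt in (20, 400):
    print(rtt, cross_region_queries_ms(rtt, 5), replayed_request_ms(rtt, 5, 1))
```

Under these assumptions the replayed request pays the long-distance cost once rather than once per query, and the gap grows with both distance and query count.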
How does Fly.io handle load balancing for applications?
Fly.io's proxy estimates the load on instances in different regions and routes each request to one it expects can handle it. The fly-replay mechanism covers cases where that estimate is wrong: a request can be handed off, so the system sheds latency by quickly trying other instances until a suitable one accepts it, keeping load evenly distributed.
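The estimate-then-retry idea can be illustrated with a toy model. This is not Fly's proxy implementation: the instance names, load figures, the `accepts` callback, and the `max_attempts` limit are all hypothetical, and the sketch only shows the shape of "try the best guess, fall through quickly if it declines".

```python
from typing import Callable, Dict, Optional, Tuple


def route_with_retries(estimated_load: Dict[str, float],
                       accepts: Callable[[str], bool],
                       max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Try instances from least to most loaded (by estimate) until one accepts.

    Returns the chosen instance (or None) and how many attempts were made.
    """
    candidates = sorted(estimated_load, key=estimated_load.__getitem__)
    attempts = 0
    for instance in candidates[:max_attempts]:
        attempts += 1
        if accepts(instance):
            return instance, attempts
    return None, attempts


# Hypothetical instances: the estimate says "b" is least loaded, but it is
# actually busy, so the request falls through to the next candidate.
load = {"a": 0.9, "b": 0.2, "c": 0.5}
busy = {"b"}
print(route_with_retries(load, lambda inst: inst not in busy))
```

Capping the attempts keeps a bad estimate from turning into unbounded hops; each declined attempt costs only a quick hand-off rather than a failed request.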

Key Statistics & Figures

Write latency to primary database: 20-400 ms
This latency applies when a write request is made from a region other than the primary.


Key Actionable Insights

1. Use Fly.io's fly-replay header to handle write requests across regions.
   The application automatically retries failed writes in the primary region, so users don't see errors and the write's database queries run next to the primary instead of across regions.
2. Consider the read-heavy nature of your application when designing your database architecture.
   Most applications serve far more reads than writes, so optimizing for reads with local replicas improves performance for most requests and makes better use of resources.
3. Evaluate whether your application is write-heavy, and if so, explore databases designed for distributed writes, such as CockroachDB.
   If your application writes frequently from multiple regions, a database that supports geographic partitioning can improve performance and scalability.
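One way to weigh insights 2 and 3 against real traffic is to estimate how many requests would be replayed and how much latency that adds. This is a hypothetical sketch: the `RequestSample` shape, the sample data, and the per-region round-trip times are invented for illustration, and it assumes each replay adds one round trip to the primary region. The source gives no threshold for switching to a write-distributed database, so none is suggested here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class RequestSample:
    region: str   # region that served the request
    wrote: bool   # whether the request needed to write


def replay_profile(samples: Iterable[RequestSample],
                   primary_region: str,
                   rtt_to_primary_ms: Dict[str, float]) -> Tuple[float, float]:
    """Return (share of requests that would be replayed, mean added ms per request).

    Only writes served outside the primary region get replayed; each one is
    assumed to add one round trip to the primary region.
    """
    total = replayed = 0
    added_ms = 0.0
    for s in samples:
        total += 1
        if s.wrote and s.region != primary_region:
            replayed += 1
            added_ms += rtt_to_primary_ms[s.region]
    if total == 0:
        return 0.0, 0.0
    return replayed / total, added_ms / total


samples = [
    RequestSample("syd", wrote=False),
    RequestSample("syd", wrote=True),
    RequestSample("ord", wrote=True),
    RequestSample("ams", wrote=False),
]
print(replay_profile(samples, primary_region="ord", rtt_to_primary_ms={"syd": 200, "ams": 100}))
```

A small replay share points toward the replica-plus-replay design; a large one, concentrated in distant regions, is the situation where a database built for distributed writes becomes worth evaluating.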

Common Pitfalls

1. Assuming every GET request only reads can lead to unexpected write failures.
   Some GET handlers also write (for example, to update a view counter), so routing by HTTP method alone sends those writes to a read replica, where they fail. Route based on whether a request actually writes; catching the read-only error handles this regardless of the HTTP method.
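The pitfall can be reproduced with a toy replica. Everything here is hypothetical (`ToyReplica`, `show_article`, the region codes, and the replay response shape); it contrasts a method-based routing rule, which keeps a writing GET on the replica, with catching the read-only error, which replays it regardless of method.

```python
READ_ONLY_SQLSTATE = "25006"  # PostgreSQL read_only_sql_transaction


class ReplicaWriteError(Exception):
    """Hypothetical stand-in for the error a read replica raises on writes."""
    pgcode = READ_ONLY_SQLSTATE


class ToyReplica:
    """Toy read replica: reads work, writes are refused."""

    def __init__(self) -> None:
        self.rows = {"views": 0}

    def read(self, key: str) -> int:
        return self.rows[key]

    def write(self, key: str, value: int) -> None:
        raise ReplicaWriteError("cannot execute UPDATE in a read-only transaction")


def show_article(db: ToyReplica) -> str:
    """A GET handler that also writes: it bumps a view counter."""
    db.write("views", db.read("views") + 1)
    return "<article>"


def routes_to_primary_by_method(method: str) -> bool:
    """The fragile rule: only non-GET requests go to the primary."""
    return method != "GET"


def handle(handler, db, primary_region: str, current_region: str):
    """The robust rule: run anywhere, replay only if a write is actually refused."""
    try:
        return 200, {}, handler(db)
    except Exception as exc:
        if getattr(exc, "pgcode", None) == READ_ONLY_SQLSTATE and current_region != primary_region:
            return 409, {"fly-replay": f"region={primary_region}"}, ""
        raise


# The method rule keeps this GET on the replica, where the write fails;
# catching the error replays it to the primary region instead.
print(routes_to_primary_by_method("GET"))
print(handle(show_article, ToyReplica(), primary_region="ord", current_region="syd"))
```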

Related Concepts

Distributed Databases
Load Balancing Techniques
Eventual Consistency in Distributed Systems