Solving Latency Spikes & Locking in a Distributed PostgreSQL Query

Scott Brisbane
7 min readadvanced
--
View Original

Overview

The article discusses how SafetyCulture addressed latency spikes and locking issues in a distributed PostgreSQL database using the Citus extension. It details the investigation into advisory locks and the solutions implemented to optimize query performance.

What You'll Learn

1

How to identify and resolve latency issues in distributed PostgreSQL queries

2

Why specifying a distribution column in queries is crucial for performance

3

How to avoid advisory locks in multi-shard updates

Prerequisites & Requirements

  • Understanding of PostgreSQL and distributed databases
  • Familiarity with PostgreSQL observability tools like pganalyze(optional)

Key Questions Answered

What causes latency spikes in distributed PostgreSQL queries?
Latency spikes were caused by advisory locks within PostgreSQL, which were identified through log entries. The Citus extension's handling of multi-shard updates led to these locks, even when the data being updated resided on a single shard.
How can query performance be improved in a distributed PostgreSQL setup?
Query performance can be improved by ensuring that the distribution column is specified in the WHERE clause of queries. This allows the Citus query planner to route the query directly to the relevant shard, avoiding unnecessary locking and latency.
What are advisory locks in PostgreSQL and how do they affect performance?
Advisory locks in PostgreSQL are user-defined locks that can lead to performance issues if not managed correctly. They are used by Citus to prevent distributed deadlocks during multi-shard operations, which can inadvertently cause latency spikes.

Key Statistics & Figures

Average query time reduction
From 48ms to 12ms
This improvement was achieved after optimizing the query structure to avoid advisory locks.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Database
Postgresql
Used as the primary relational database for managing data.
Database Extension
Citus
Enables horizontal scalability and sharding for PostgreSQL databases.
Observability Tool
Pganalyze
Used for monitoring PostgreSQL performance and diagnosing query issues.

Key Actionable Insights

1
Always specify the distribution column in your queries when working with distributed tables in Citus.
This practice allows the query planner to optimize routing, reducing latency and avoiding unnecessary locking issues.
2
Utilize PostgreSQL observability tools to monitor and diagnose query performance issues.
Tools like pganalyze can provide insights into locking behavior and help identify performance bottlenecks in distributed environments.
3
Consider breaking complex queries into multiple simpler queries to avoid multi-shard updates.
While this may seem counterintuitive, it can significantly reduce locking and improve overall query performance in distributed databases.

Common Pitfalls

1
Failing to specify the distribution column in queries can lead to performance degradation.
This mistake results in the Citus query planner treating the operation as a multi-shard update, which incurs additional locking and latency.
2
Assuming that subqueries will optimize performance in a distributed setup.
In this case, the subquery was executed across all shards, still resulting in multi-shard updates and associated locking issues.

Related Concepts

Distributed Databases
Query Optimization Techniques
Postgresql Locking Mechanisms