Overview
The article discusses how SafetyCulture addressed latency spikes and locking issues in a distributed PostgreSQL database using the Citus extension. It details the investigation into advisory locks and the solutions implemented to optimize query performance.
What You'll Learn
1
How to identify and resolve latency issues in distributed PostgreSQL queries
2
Why specifying a distribution column in queries is crucial for performance
3
How to avoid advisory locks in multi-shard updates
Prerequisites & Requirements
- Understanding of PostgreSQL and distributed databases
- Familiarity with PostgreSQL observability tools like pganalyze(optional)
Key Questions Answered
What causes latency spikes in distributed PostgreSQL queries?
Latency spikes were caused by advisory locks within PostgreSQL, which were identified through log entries. The Citus extension's handling of multi-shard updates led to these locks, even when the data being updated resided on a single shard.
How can query performance be improved in a distributed PostgreSQL setup?
Query performance can be improved by ensuring that the distribution column is specified in the WHERE clause of queries. This allows the Citus query planner to route the query directly to the relevant shard, avoiding unnecessary locking and latency.
What are advisory locks in PostgreSQL and how do they affect performance?
Advisory locks in PostgreSQL are user-defined locks that can lead to performance issues if not managed correctly. They are used by Citus to prevent distributed deadlocks during multi-shard operations, which can inadvertently cause latency spikes.
Key Statistics & Figures
Average query time reduction
From 48ms to 12ms
This improvement was achieved after optimizing the query structure to avoid advisory locks.
Technologies & Tools
Some links below are affiliate links. We may earn a commission if you make a purchase.
Database
Postgresql
Used as the primary relational database for managing data.
Database Extension
Citus
Enables horizontal scalability and sharding for PostgreSQL databases.
Observability Tool
Pganalyze
Used for monitoring PostgreSQL performance and diagnosing query issues.
Key Actionable Insights
1Always specify the distribution column in your queries when working with distributed tables in Citus.This practice allows the query planner to optimize routing, reducing latency and avoiding unnecessary locking issues.
2Utilize PostgreSQL observability tools to monitor and diagnose query performance issues.Tools like pganalyze can provide insights into locking behavior and help identify performance bottlenecks in distributed environments.
3Consider breaking complex queries into multiple simpler queries to avoid multi-shard updates.While this may seem counterintuitive, it can significantly reduce locking and improve overall query performance in distributed databases.
Common Pitfalls
1
Failing to specify the distribution column in queries can lead to performance degradation.
This mistake results in the Citus query planner treating the operation as a multi-shard update, which incurs additional locking and latency.
2
Assuming that subqueries will optimize performance in a distributed setup.
In this case, the subquery was executed across all shards, still resulting in multi-shard updates and associated locking issues.
Related Concepts
Distributed Databases
Query Optimization Techniques
Postgresql Locking Mechanisms