Overview
The article describes how Notion sharded PostgreSQL, covering the challenges faced and the lessons learned while migrating from a monolithic database to a horizontally partitioned database fleet. It explains why the transition mattered for performance and reliability as user demand grew.
What You'll Learn
1
How to implement application-level sharding in a PostgreSQL database
2
Why sharding is critical for scaling databases in high-demand applications
3
When to recognize the need for sharding based on database performance metrics
Prerequisites & Requirements
- Understanding of database scaling concepts and PostgreSQL
- Experience with database migrations and performance tuning (optional)
Key Questions Answered
What were the main reasons for Notion to shard their PostgreSQL database?
Notion sharded its PostgreSQL database because load kept rising as user adoption grew, causing CPU spikes and stalling the VACUUM process. Sharding was necessary to prevent transaction ID wraparound and to improve overall application reliability.
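As a rough illustration of how the wraparound risk mentioned above can be watched, the sketch below classifies a database by its transaction ID age. PostgreSQL transaction IDs are 32-bit, so wraparound becomes a danger as the age approaches roughly 2^31. The query string uses the standard `age(datfrozenxid)` expression on `pg_database`; the function names and the 50% and 80% thresholds are hypothetical choices for this example, not values from the article.

```python
# Wraparound danger zone: about 2.1 billion transactions (2**31).
XID_WRAPAROUND_LIMIT = 2**31

# Standard PostgreSQL query for per-database transaction ID age.
# It is shown for reference only; this sketch does not connect to a database.
XID_AGE_QUERY = "SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database"


def wraparound_fraction(xid_age: int) -> float:
    """Return how far a database is toward the wraparound limit (0.0 to 1.0)."""
    return xid_age / XID_WRAPAROUND_LIMIT


def risk_level(xid_age: int, warn: float = 0.5, critical: float = 0.8) -> str:
    """Classify an XID age; the warn and critical thresholds are assumptions."""
    fraction = wraparound_fraction(xid_age)
    if fraction >= critical:
        return "critical"
    if fraction >= warn:
        return "warn"
    return "ok"
```

In practice the `xid_age` values would come from running the query on each database and feeding each row into `risk_level`.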
How did Notion implement their sharding scheme?
Notion implemented application-level sharding: it designed its own partitioning scheme and routed each query by workspace ID. This gave Notion control over how data was distributed and helped keep related data consistent across shards.
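A minimal sketch of routing by workspace ID follows. It assumes workspace IDs are UUID strings and that a shard is chosen by taking the UUID's integer value modulo the number of logical shards. Both the function name and the modulo scheme are assumptions made for illustration; the summary does not state the exact mapping Notion used.

```python
import uuid

# Logical shard count quoted in this summary.
LOGICAL_SHARDS = 480


def logical_shard(workspace_id: str) -> int:
    """Map a workspace UUID to a logical shard index in [0, LOGICAL_SHARDS).

    Hypothetical scheme: the UUID's 128-bit integer value modulo the shard count.
    """
    return uuid.UUID(workspace_id).int % LOGICAL_SHARDS


# Every row belonging to the same workspace resolves to the same shard,
# so queries scoped to one workspace never need to cross shards.
example_shard = logical_shard("00000000-0000-0000-0000-0000000001e1")
```

Because the mapping depends only on the workspace ID, the application can compute the target shard before opening a connection.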
What challenges did Notion face during the migration to sharding?
During the migration, Notion faced challenges such as ensuring data consistency, managing double-writes to both old and new databases, and the need for a robust verification process to maintain data integrity throughout the transition.
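The double-write and verification steps described above can be sketched with two in-memory dictionaries standing in for the old and new databases. The `DoubleWriter` class and `find_mismatches` function are hypothetical names for this example; a real migration would write to actual databases and handle partial failures.

```python
# Sentinel that distinguishes "key absent" from "key present with value None".
_MISSING = object()


class DoubleWriter:
    """Apply every write to the old store, then mirror it to the new store."""

    def __init__(self, old: dict, new: dict) -> None:
        self.old = old
        self.new = new

    def write(self, key: str, value) -> None:
        self.old[key] = value  # the old database stays the source of truth
        self.new[key] = value  # mirrored write to the new database


def find_mismatches(old: dict, new: dict) -> list:
    """Return the sorted keys whose values differ or exist on one side only."""
    keys = set(old) | set(new)
    return sorted(
        k for k in keys if old.get(k, _MISSING) != new.get(k, _MISSING)
    )
```

Rows written before double-writing began show up as mismatches until a backfill copies them, which is why a verification pass is run before cutting over.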
What lessons did Notion learn from their sharding experience?
Notion drew three lessons: shard earlier, before the existing database is under strain; aim for a zero-downtime migration; and consider a combined primary key for better performance. These insights will inform future architectural decisions.
Key Statistics & Figures
Number of logical shards
480
Notion distributed their data across 480 logical shards to handle scaling effectively.
Number of physical databases
32
The architecture consisted of 32 physical databases to support the sharded setup.
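The two figures above imply 480 / 32 = 15 logical shards per physical database. The sketch below works through that arithmetic and shows one way to map a logical shard to its host, assuming contiguous blocks of shards per database (an assumption for illustration; the summary does not describe the actual layout). It also lists the fleet sizes that divide 480 evenly.

```python
LOGICAL_SHARDS = 480
PHYSICAL_DATABASES = 32
SHARDS_PER_DATABASE = LOGICAL_SHARDS // PHYSICAL_DATABASES  # 480 / 32 = 15


def physical_database(logical_shard: int) -> int:
    """Return the index of the database hosting a logical shard.

    Assumes shards 0-14 live on database 0, shards 15-29 on database 1, etc.
    """
    if not 0 <= logical_shard < LOGICAL_SHARDS:
        raise ValueError(f"logical shard out of range: {logical_shard}")
    return logical_shard // SHARDS_PER_DATABASE


def even_fleet_sizes() -> list:
    """Database counts that split the logical shards with no remainder."""
    return [n for n in range(1, LOGICAL_SHARDS + 1) if LOGICAL_SHARDS % n == 0]
```

Keeping logical shards separate from physical hosts means the fleet can later be resized to another divisor of 480 by moving whole shards rather than re-partitioning rows.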
Time taken for backfilling data
around three days
The backfill process for migrating existing data to the new databases took approximately three days.
Key Actionable Insights
1
Implement application-level sharding to maintain control over data distribution.
This approach allows for tailored data management strategies that can optimize performance and consistency, especially in applications with complex data relationships.
2
Recognize the signs of database strain early to avoid rushed migrations.
Monitoring performance metrics such as CPU usage and transaction ID age can help identify when to initiate sharding, preventing potential data loss and downtime.
3
Use an audit log strategy during migrations to ensure data consistency.
This method allows for tracking changes and applying them to new databases, which is crucial for maintaining integrity during transitions.
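The audit log strategy can be sketched as a replay loop: each change is recorded with a sequence number, and a catch-up step applies entries newer than the last one already applied. The `AuditEntry` type, its fields, and the dictionary standing in for the new database are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AuditEntry:
    seq: int               # monotonically increasing position in the log
    key: str               # identifier of the changed row
    value: Optional[str]   # new value, or None to represent a delete


def apply_audit_log(
    target: dict, entries: Iterable[AuditEntry], last_applied: int = 0
) -> int:
    """Apply entries newer than last_applied in order; return the new position."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.seq <= last_applied:
            continue  # already applied, so a replay is harmless
        if entry.value is None:
            target.pop(entry.key, None)
        else:
            target[entry.key] = entry.value
        last_applied = entry.seq
    return last_applied
```

Tracking `last_applied` makes the replay idempotent, so the catch-up step can be rerun safely after an interruption.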
Common Pitfalls
1
Waiting too long to shard can lead to increased strain on the existing database.
Notion experienced significant performance issues because they delayed sharding until the database was heavily loaded, complicating the migration process.
2
Not optimizing for zero-downtime migrations can result in extended downtime.
Double-write throughput became a bottleneck during the migration, highlighting the need for more efficient migration strategies to minimize service interruptions.
Related Concepts
Database Sharding
PostgreSQL Performance Tuning
Data Consistency in Distributed Systems