Overview
The article discusses Notion's journey in scaling its vector search infrastructure, achieving a 10x increase in scale while reducing costs by 90% over two years. It details the implementation of a dual-path indexing system, migration to a serverless architecture, and the adoption of new technologies to optimize performance and cost.
What You'll Learn
How to implement a dual-path indexing system for real-time and batch processing
Why migrating to a serverless architecture can reduce operational costs
When to consider alternative search engines like turbopuffer for cost efficiency
How to optimize embedding pipelines using Ray for better performance
Prerequisites & Requirements
- Understanding of vector search and embeddings
- Familiarity with Apache Spark and Kafka(optional)
Key Questions Answered
How did Notion scale its vector search infrastructure?
What were the cost savings achieved by migrating to a serverless architecture?
What improvements were made during the turbopuffer migration?
How does Notion handle updates to long pages efficiently?
Key Statistics & Figures
Technologies & Tools
Some links below are affiliate links. We may earn a commission if you make a purchase.
Key Actionable Insights
1Implement a dual-path indexing system to enhance data processing efficiency.This approach allows for both real-time updates and batch processing, ensuring that large datasets can be handled without sacrificing performance.
2Consider migrating to a serverless architecture to reduce costs and operational complexity.Serverless architectures can decouple storage from compute, allowing for more flexible scaling and significant cost savings.
3Evaluate alternative search engines like turbopuffer for potential cost reductions.Newer technologies may offer better pricing models and performance optimizations that can lead to substantial savings.
4Optimize embedding pipelines by leveraging frameworks like Ray.Ray allows for efficient processing of embeddings and can significantly reduce infrastructure costs while improving performance.