Two years of vector search at Notion: 10x scale, 1/10th cost

Preeti Gondi, Mickey Liu, Nathan Louie, Calder Lund, Jacob Sager
10 min read · intermediate

Overview

The article discusses Notion's journey in scaling its vector search infrastructure, achieving a 10x increase in scale while reducing costs by 90% over two years. It details the implementation of a dual-path indexing system, migration to a serverless architecture, and the adoption of new technologies to optimize performance and cost.

What You'll Learn

1. How to implement a dual-path indexing system for real-time and batch processing
2. Why migrating to a serverless architecture can reduce operational costs
3. When to consider alternative search engines like turbopuffer for cost efficiency
4. How to optimize embedding pipelines using Ray for better performance

Prerequisites & Requirements

  • Understanding of vector search and embeddings
  • Familiarity with Apache Spark and Kafka (optional)

Key Questions Answered

How did Notion scale its vector search infrastructure?
Notion scaled its vector search infrastructure by implementing a dual-path indexing system that included both offline batch processing and online real-time updates. This allowed for efficient onboarding of millions of workspaces while maintaining low latency in updates.
What were the cost savings achieved by migrating to a serverless architecture?
By migrating to a serverless architecture, Notion achieved a 50% cost reduction from peak usage, saving several million dollars annually. The migration also removed the storage capacity constraints that had previously limited scaling.
What improvements were made during the turbopuffer migration?
During the turbopuffer migration, Notion achieved a 60% cost reduction on search engine spend and a 35% reduction in AWS EMR compute costs. Additionally, production query latency improved from 70-100ms to 50-70ms.
How does Notion handle updates to long pages efficiently?
Notion implemented a system that tracks changes in page text and metadata using hashes. This allows for selective re-embedding and updating of only the changed spans, resulting in a 70% reduction in data volume for embeddings.
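
The hash-based change tracking described in the last answer can be illustrated with a minimal sketch. It is not Notion's implementation: the names `SpanState`, `Plan`, and `plan_updates` are hypothetical, and the choice of SHA-256 and the separate metadata-only path are assumptions made for the example. Each span of a page keeps one hash for its text and one for its metadata, and only spans whose text hash changed are queued for re-embedding.

```python
import hashlib
from dataclasses import dataclass, field


def digest(value: str) -> str:
    """Stable content hash used to detect changes between versions."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass
class SpanState:
    text_hash: str
    metadata_hash: str


@dataclass
class Plan:
    re_embed: list = field(default_factory=list)       # text changed or span is new
    metadata_only: list = field(default_factory=list)  # only metadata changed
    unchanged: list = field(default_factory=list)      # nothing to do


def plan_updates(stored: dict[str, SpanState],
                 spans: dict[str, tuple[str, str]]) -> Plan:
    """Compare incoming (text, metadata) pairs against stored hashes.

    `stored` is updated in place so the next call sees the latest hashes.
    """
    plan = Plan()
    for span_id, (text, metadata) in spans.items():
        new = SpanState(digest(text), digest(metadata))
        old = stored.get(span_id)
        if old is None or old.text_hash != new.text_hash:
            plan.re_embed.append(span_id)
        elif old.metadata_hash != new.metadata_hash:
            plan.metadata_only.append(span_id)
        else:
            plan.unchanged.append(span_id)
        stored[span_id] = new
    return plan
```

In this sketch, editing one paragraph of a long page places only that span in `re_embed`, which is the kind of saving the answer above attributes to selective updates.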

Key Statistics & Figures

  • Scale increase: 10x. Achieved in vector search infrastructure over two years.
  • Cost reduction: 90%. Realized through various optimizations in the infrastructure.
  • Daily onboarding capacity increase: 600x. Achieved by optimizing the onboarding process.
  • Active workspaces growth: 15x. Growth in active workspaces due to improved onboarding.
  • Vector database capacity expansion: 8x. Expansion achieved to accommodate increased demand.
  • Cost reduction from turbopuffer migration: 60%. Reduction on search engine spend post-migration.
  • Reduction in AWS EMR compute costs: 35%. Achieved during the turbopuffer migration.
  • Improvement in production query latency: from 70-100ms to 50-70ms. Achieved after migrating to turbopuffer.
  • Data volume reduction: 70%. Achieved through efficient update handling.
  • Cost reduction in embeddings infrastructure: more than 90%. Expected reduction from migrating to Ray.

Key Actionable Insights

1. Implement a dual-path indexing system to enhance data processing efficiency.
   This approach allows for both real-time updates and batch processing, ensuring that large datasets can be handled without sacrificing performance.
2. Consider migrating to a serverless architecture to reduce costs and operational complexity.
   Serverless architectures can decouple storage from compute, allowing for more flexible scaling and significant cost savings.
3. Evaluate alternative search engines like turbopuffer for potential cost reductions.
   Newer technologies may offer better pricing models and performance optimizations that can lead to substantial savings.
4. Optimize embedding pipelines by leveraging frameworks like Ray.
   Ray allows for efficient processing of embeddings and can significantly reduce infrastructure costs while improving performance.
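
To make the first insight concrete, here is a minimal in-memory sketch of dual-path indexing, with an offline batch backfill and an online per-edit update writing to the same index. It does not reflect Notion's actual pipeline: `VectorIndex`, `batch_backfill`, `apply_online_event`, and the placeholder `embed` function are hypothetical, and the version check that stops a slow batch job from overwriting a newer real-time write is an assumption about one reasonable way to reconcile the two paths. In a real system the batch path might be a Spark job and the online path a Kafka consumer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    vector: list[float]
    version: int  # e.g. last-edited timestamp of the source page


class VectorIndex:
    """Toy stand-in for a vector store; both paths write through upsert()."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def upsert(self, doc_id: str, vector: list[float], version: int) -> bool:
        current = self._records.get(doc_id)
        # Assumed reconciliation rule: never overwrite a newer version.
        if current is not None and current.version >= version:
            return False
        self._records[doc_id] = Record(vector, version)
        return True

    def get(self, doc_id: str) -> Optional[Record]:
        return self._records.get(doc_id)


def embed(text: str) -> list[float]:
    # Placeholder embedding so the sketch runs without a model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def batch_backfill(index: VectorIndex,
                   snapshot: dict[str, tuple[str, int]]) -> int:
    """Offline path: embed a full snapshot, e.g. when onboarding a workspace."""
    written = 0
    for doc_id, (text, version) in snapshot.items():
        if index.upsert(doc_id, embed(text), version):
            written += 1
    return written


def apply_online_event(index: VectorIndex, doc_id: str,
                       text: str, version: int) -> bool:
    """Online path: embed a single edited page as soon as the edit arrives."""
    return index.upsert(doc_id, embed(text), version)
```

With this rule, an edit that arrives through the online path while a backfill is still running is preserved, because the backfill's older snapshot version is rejected for that page.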

Common Pitfalls

1. Failing to track changes effectively can lead to unnecessary reprocessing.
   Without a proper mechanism to detect changes, systems may waste resources by re-embedding entire documents instead of only the modified sections.

Related Concepts

Vector Search Optimization
Serverless Architecture Benefits
Embeddings and Their Applications
Real-time Data Processing Techniques