Two years of vector search at Notion: 10x scale, 1/10th cost

Preeti Gondi, Mickey Liu, Nathan Louie, Calder Lund, Jacob Sager

Notion

Apache, Apache Spark, AWS, DynamoDB, Hugging Face, Python

Overview

This article describes how Notion scaled its vector search infrastructure 10x while cutting costs by 90% over two years. It covers a dual-path indexing system, a migration to a serverless architecture, and the adoption of turbopuffer and Ray to improve performance and reduce cost.

What You'll Learn

1. How to implement a dual-path indexing system for real-time and batch processing
2. Why migrating to a serverless architecture can reduce operational costs
3. When to consider alternative search engines like turbopuffer for cost efficiency
4. How to optimize embedding pipelines using Ray for better performance

Prerequisites & Requirements

Understanding of vector search and embeddings
Familiarity with Apache Spark and Kafka (optional)

Key Questions Answered

How did Notion scale its vector search infrastructure?

Notion scaled its vector search infrastructure by implementing a dual-path indexing system that included both offline batch processing and online real-time updates. This allowed for efficient onboarding of millions of workspaces while maintaining low latency in updates.
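
A minimal sketch of the dual-path idea, under stated assumptions: both paths write through the same upsert, and a per-page version number (an assumption, not something the article specifies) keeps an older batch snapshot from overwriting a newer real-time edit. `VectorIndex`, `PageVersion`, and both function names are hypothetical, and the embedding step is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class PageVersion:
    page_id: str
    version: int  # assumed monotonically increasing edit counter
    text: str


@dataclass
class VectorIndex:
    """In-memory stand-in for a vector store; keeps the newest version per page."""
    rows: Dict[str, PageVersion] = field(default_factory=dict)

    def upsert(self, page: PageVersion) -> bool:
        current = self.rows.get(page.page_id)
        # Reject stale writes so the batch path cannot clobber a newer online update.
        if current is not None and current.version >= page.version:
            return False
        self.rows[page.page_id] = page
        return True


def run_batch_backfill(index: VectorIndex, snapshot: Iterable[PageVersion]) -> int:
    """Offline path: bulk-load a point-in-time snapshot, e.g. when onboarding a workspace."""
    return sum(index.upsert(page) for page in snapshot)


def apply_realtime_event(index: VectorIndex, event: PageVersion) -> bool:
    """Online path: apply a single edit event as it arrives from a stream."""
    return index.upsert(event)


# Usage: a real-time edit lands before the backfill reaches the same page.
index = VectorIndex()
apply_realtime_event(index, PageVersion("p1", 3, "fresh edit"))
written = run_batch_backfill(
    index,
    [PageVersion("p1", 2, "older snapshot"), PageVersion("p2", 1, "other page")],
)
print(written)                 # 1: only p2 is written, the stale p1 row is skipped
print(index.rows["p1"].text)   # fresh edit
```

Routing both paths through one idempotent write is one way to let a backfill and a live stream run concurrently without coordinating with each other.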

What were the cost savings achieved by migrating to a serverless architecture?

By migrating to a serverless architecture, Notion cut costs by 50% from their peak, saving several million dollars annually. The migration also removed the storage capacity constraints that had previously limited scaling.

What improvements were made during the turbopuffer migration?

During the turbopuffer migration, Notion achieved a 60% cost reduction on search engine spend and a 35% reduction in AWS EMR compute costs. Additionally, production query latency improved from 70-100ms to 50-70ms.

How does Notion handle updates to long pages efficiently?

Notion implemented a system that tracks changes in page text and metadata using hashes. This allows for selective re-embedding and updating of only the changed spans, resulting in a 70% reduction in data volume for embeddings.
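
The hash-based change tracking described above could look roughly like the sketch below. It is an illustration rather than Notion's implementation: the span tuple layout, the `plan_updates` name, and the plain dict standing in for the page-state cache are all assumptions, as is the choice to treat a metadata-only change as an update that needs no new embedding.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Span = Tuple[str, str, dict]  # (span_id, text, metadata); hypothetical layout


def _digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def plan_updates(
    spans: List[Span], cache: Dict[str, Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Compare each span's hashes with the cached ones and decide what work is needed."""
    plan: Dict[str, List[str]] = {"re_embed": [], "metadata_only": [], "unchanged": []}
    for span_id, text, metadata in spans:
        text_hash = _digest(text)
        # sort_keys makes the metadata hash independent of dict key order.
        meta_hash = _digest(json.dumps(metadata, sort_keys=True))
        previous = cache.get(span_id)
        if previous is None or previous[0] != text_hash:
            plan["re_embed"].append(span_id)       # text changed: new embedding needed
        elif previous[1] != meta_hash:
            plan["metadata_only"].append(span_id)  # same text: update metadata only
        else:
            plan["unchanged"].append(span_id)      # nothing to do
        cache[span_id] = (text_hash, meta_hash)
    return plan


# Usage: the second pass edits one of two spans, so only that span is re-embedded.
cache: Dict[str, Tuple[str, str]] = {}
first = plan_updates([("s1", "Intro", {"title": "Doc"}), ("s2", "Body", {"title": "Doc"})], cache)
second = plan_updates([("s1", "Intro", {"title": "Doc"}), ("s2", "Body v2", {"title": "Doc"})], cache)
third = plan_updates([("s1", "Intro", {"title": "Renamed"}), ("s2", "Body v2", {"title": "Renamed"})], cache)
print(second["re_embed"])      # ['s2']
print(third["metadata_only"])  # ['s1', 's2']
```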

Key Statistics & Figures

Scale increase: 10x. Achieved in vector search infrastructure over two years.
Cost reduction: 90%. Realized through various optimizations in the infrastructure.
Daily onboarding capacity increase: 600x. Achieved by optimizing the onboarding process.
Active workspaces growth: 15x. Growth in active workspaces due to improved onboarding.
Vector database capacity expansion: 8x. Expansion achieved to accommodate increased demand.
Cost reduction from turbopuffer migration: 60%. Reduction on search engine spend post-migration.
Reduction in AWS EMR compute costs: 35%. Achieved during the turbopuffer migration.
Improvement in production query latency: from 70-100ms to 50-70ms. Improvement achieved after migrating to turbopuffer.
Data volume reduction: 70%. Reduction in data volume achieved through efficient update handling.
Cost reduction in embeddings infrastructure: 90%+. Expected reduction from migrating to Ray.

Technologies & Tools

Apache Spark (Backend): Used for batch processing in the ingestion indexing pipeline.
Kafka (Backend): Used for real-time updates in the ingestion indexing pipeline.
turbopuffer (Search Engine): Evaluated and migrated to for cost-effective vector search.
Ray (Backend): Used for optimizing the embeddings pipeline and serving.
DynamoDB (Database): Used for caching page state and metadata.

Key Actionable Insights

1. Implement a dual-path indexing system to enhance data processing efficiency.
   This approach allows for both real-time updates and batch processing, ensuring that large datasets can be handled without sacrificing performance.

2. Consider migrating to a serverless architecture to reduce costs and operational complexity.
   Serverless architectures can decouple storage from compute, allowing for more flexible scaling and significant cost savings.

3. Evaluate alternative search engines like turbopuffer for potential cost reductions.
   Newer technologies may offer better pricing models and performance optimizations that can lead to substantial savings.

4. Optimize embedding pipelines by leveraging frameworks like Ray.
   Ray allows for efficient processing of embeddings and can significantly reduce infrastructure costs while improving performance.
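
To make the last insight concrete, here is a sketch of the batch-and-fan-out shape that an embedding pipeline typically takes. It deliberately swaps in Python's standard-library thread pool for Ray so that it runs without extra dependencies; with Ray, the batch function would instead be a task decorated with `@ray.remote` and the results gathered with `ray.get`. The `fake_embed_batch` model, the batch size, and the worker count are placeholders, not values from the article.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_embed_batch(texts: Sequence[str]) -> List[List[float]]:
    """Placeholder model: returns a deterministic 2-dimensional vector per text."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def embed_all(texts: Sequence[str], batch_size: int = 2, workers: int = 4) -> List[List[float]]:
    """Embed texts in batches across a worker pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so vectors line up with texts.
        batch_results = pool.map(fake_embed_batch, chunked(texts, batch_size))
        return [vector for batch in batch_results for vector in batch]


# Usage: three texts are split into batches of two and one.
texts = ["a", "bb", "ccc"]
vectors = embed_all(texts)
print(len(vectors))    # 3
print(vectors[1][0])   # 2.0
```

Batching amortizes per-call model overhead, and keeping the batch function free of shared state is what lets the same code move from a local pool to a distributed scheduler.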

Common Pitfalls

1. Failing to track changes effectively can lead to unnecessary reprocessing.
   Without a proper mechanism to detect changes, systems may waste resources by re-embedding entire documents instead of only the modified sections.

Related Concepts

Vector Search Optimization

Serverless Architecture Benefits

Embeddings And Their Applications

Real-time Data Processing Techniques