Rebuilding the user typeahead

Pinterest Engineering
9 min read, intermediate

Overview

This article describes how Pinterest rebuilt its user typeahead to make contacts easier to discover. It covers the architectural changes, including a new backend service and the integration of several data sources, made to improve speed, relevance, and maintainability.

What You'll Learn

1. How to build a separate backend service for user typeahead functionality
2. Why using HBase is beneficial for contact indexing and retrieval
3. How to implement real-time updates for user contact information
4. When to apply different schemas for storing user contacts based on source

Prerequisites & Requirements

  • Understanding of service-oriented architecture and backend development
  • Familiarity with HBase and Thrift (optional)

Key Questions Answered

How does the new user typeahead improve performance compared to the legacy system?
The new user typeahead implementation achieves a server-side p99 latency of 25ms, significantly faster than the legacy system. This improvement has led to increased message sends and Pinner interactions, demonstrating enhanced user engagement.
What strategies were used to rank and de-duplicate contacts in the typeahead?
The Contacts Service employs configurable ranking based on mutual followers and de-duplication logic that cross-references social network IDs to ensure unique contact listings. This approach enhances the relevance of displayed contacts in the typeahead results.
What are the two schemas used for storing user contacts?
Pinterest uses a wide schema for most sources, storing all of a user's contacts from that source in a single row, and a tall schema for sources that can have a very large number of contacts, where each contact is stored in its own row and related contacts sit in nearby rows. This design matches data retrieval to the source type.
How does Pinterest handle real-time updates for user contacts?
Real-time updates are triggered by changes in user contact information, such as a name change or a newly connected social account. The system processes these updates as PinLater tasks so that they do not overload the backend while changes are still reflected in the typeahead right away.
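
The ranking and de-duplication answer above can be illustrated with a minimal sketch. This is not the Contacts Service's code: the `Contact` fields, the `dedupe_and_rank` function, and the `"network:id"` string format are all hypothetical names chosen for illustration. The sketch assumes that contacts are ranked by mutual-follower count and that two entries are duplicates when they share any social network ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    # All field names are illustrative assumptions, not the real schema.
    name: str
    source: str                           # e.g. "pinterest" or "facebook"
    mutual_followers: int = 0
    social_ids: frozenset = frozenset()   # e.g. {"facebook:9"}


def dedupe_and_rank(contacts, prefix, limit=10):
    """Return up to `limit` contacts whose name has a word starting with `prefix`."""
    prefix = prefix.lower()
    matches = [
        c for c in contacts
        if any(word.startswith(prefix) for word in c.name.lower().split())
    ]
    # Rank first (most mutual followers wins, name breaks ties) so that the
    # best-ranked entry of each duplicate group is the one that survives.
    matches.sort(key=lambda c: (-c.mutual_followers, c.name.lower()))

    seen_ids = set()
    result = []
    for contact in matches:
        # Cross-reference social network IDs: skip anyone already listed.
        if contact.social_ids & seen_ids:
            continue
        seen_ids |= contact.social_ids
        result.append(contact)
        if len(result) == limit:
            break
    return result
```

Calling `dedupe_and_rank(contacts, "an")` on a list that holds the same person once as a Pinterest follower and once as a Facebook friend would return that person a single time, provided both entries carry the shared Facebook ID. Entries with no social IDs are never merged in this sketch.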

Key Statistics & Figures

Server-side p99 latency
25ms
This metric illustrates the performance improvement of the new typeahead system compared to the legacy version.
Time taken to upload initial user connections
three days
This was the duration required to upload all connections for each Pinterest user using a dedicated PinLater cluster.

Technologies & Tools

Database
HBase
Used for storing and indexing user contacts to enable fast lookups.
Backend
Thrift
Provides interfaces to manage the contacts index and allows clients to update and query contact information.
Backend
PinLater
Manages asynchronous tasks to keep the contacts index in sync with various sources.
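
The idea of keeping the index in sync through asynchronous tasks, without overloading the backend, can be sketched with a simple bounded queue. This does not use or imitate PinLater's actual API, which the summary does not show. `UpdateQueue`, `enqueue_name_change`, `run_tick`, and `max_per_tick` are hypothetical names, and a plain dict stands in for the contacts index.

```python
from collections import deque


class UpdateQueue:
    """Stand-in for an asynchronous task queue (not PinLater's real API)."""

    def __init__(self, index, max_per_tick=2):
        self.index = index              # assumed shape: user_id -> display name
        self.pending = deque()          # tasks waiting to be applied
        self.max_per_tick = max_per_tick

    def enqueue_name_change(self, user_id, new_name):
        # The caller returns immediately; the index is updated later.
        self.pending.append((user_id, new_name))

    def run_tick(self):
        """Apply at most `max_per_tick` pending updates and return the count."""
        applied = 0
        while self.pending and applied < self.max_per_tick:
            user_id, new_name = self.pending.popleft()
            self.index[user_id] = new_name
            applied += 1
        return applied
```

Capping the work done per tick is one simple way to bound the write load on the backing store. A production queue would also need retries and persistence, which this sketch leaves out.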

Key Actionable Insights

1. Implement a separate backend service for features that require high responsiveness and scalability.
By decoupling the typeahead functionality from the main API, Pinterest improved maintainability and deployment flexibility, allowing for independent updates and optimizations.
2. Utilize HBase for applications needing fast data retrieval and scalability.
HBase's ability to perform quick scans and its horizontal scalability make it an ideal choice for applications like user contact indexing, where performance and growth are critical.
3. Design your data storage schema based on the expected volume of data from different sources.
Using wide and tall schemas allows for efficient data management and retrieval, accommodating both small and large datasets effectively.
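
The difference between the wide and tall schemas can be made concrete with a small in-memory model. This is a sketch under stated assumptions, not Pinterest's schema: the `user_id:source:name` row key layout, the class names, and the method names are hypothetical, and a sorted Python list stands in for HBase's lexicographically ordered row keys. It also assumes that user IDs and source names never contain a colon.

```python
import bisect


class WideIndex:
    """One row per (user, source); every contact lives inside that row."""

    def __init__(self):
        self._rows = {}

    def put(self, user_id, source, contact_name):
        self._rows.setdefault((user_id, source), set()).add(contact_name.lower())

    def lookup(self, user_id, source, prefix, limit=10):
        # Fetch the single row, then filter its contents in memory.
        row = self._rows.get((user_id, source), set())
        return sorted(n for n in row if n.startswith(prefix.lower()))[:limit]


class TallIndex:
    """One row per contact; a user's contacts occupy neighbouring row keys."""

    def __init__(self):
        self._keys = []  # kept sorted, mimicking ordered row keys

    def put(self, user_id, source, contact_name):
        key = f"{user_id}:{source}:{contact_name.lower()}"
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_prefix(self, user_id, source, prefix, limit=10):
        # Seek to the first matching key and read forward while keys match,
        # so only matching rows are touched, however many contacts exist.
        start = f"{user_id}:{source}:{prefix.lower()}"
        i = bisect.bisect_left(self._keys, start)
        names = []
        while (i < len(self._keys) and self._keys[i].startswith(start)
               and len(names) < limit):
            names.append(self._keys[i].split(":", 2)[2])
            i += 1
        return names
```

The wide layout reads everything for a source in one fetch, which suits small contact lists. The tall layout trades that for a short range scan that stays cheap when a single source contributes a very large number of contacts.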

Common Pitfalls

1. Failing to clearly communicate the value of permissions when accessing user contacts can lead to low acceptance rates.
Users may be hesitant to grant permissions if they do not understand the benefits. Providing clear explanations upfront can improve user engagement and acceptance.

Related Concepts

Service-oriented Architecture
Real-time Data Processing
Data Indexing Strategies