Interest Taxonomy: A knowledge graph management system for content understanding at Pinterest

Pinterest Engineering
9 min read, intermediate

Overview

The article discusses Pinterest's Interest Taxonomy, a knowledge graph management system designed to enhance content understanding and trend analysis on the platform. It highlights the system's architecture, use cases, and the machine learning models that leverage this taxonomy for personalized recommendations and advertising.

What You'll Learn

1. How to implement a taxonomy-based knowledge management system for content classification
2. Why understanding user interests is crucial for targeted advertising
3. How to leverage machine learning for content mapping in a scalable way
4. How to utilize RDF for data modeling in knowledge graphs

Prerequisites & Requirements

  • Understanding of machine learning concepts and taxonomy systems
  • Familiarity with WebProtégé for visualization and curation (optional)

Key Questions Answered

What is the purpose of the Interest Taxonomy at Pinterest?
The Interest Taxonomy at Pinterest serves to classify and organize popular topics and entities, enabling better content understanding and targeted advertising. It helps in analyzing user behavior and emerging trends through a hierarchical structure of interests.
How does Pinterest map Pins to its Interest Taxonomy?
Pinterest uses a scalable machine learning system called Pin2Interest (P2I) to map over 200 billion Pins to its Interest Taxonomy. P2I utilizes text and visual inputs, applying Natural Language Processing techniques to predict and rank relevant taxonomy nodes for each Pin.
What role does user engagement play in the Interest Taxonomy?
User engagement is critical because it feeds the user2interest ML system, which infers users' interests from the Pins they interact with. These inferred interests are used to optimize ad targeting and generate organic recommendations.
How does Pinterest ensure the quality of its Interest Taxonomy?
Pinterest maintains the quality of its Interest Taxonomy through a collaborative curation process using RDF data modeling and the WebProtégé tool. This process includes manual reviews and incremental updates to ensure relevance and accuracy.
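The "predict and rank relevant taxonomy nodes" step described for Pin2Interest can be illustrated with a small Python sketch. This is not Pinterest's model: it swaps the text and visual ML inputs for plain word overlap, and the taxonomy nodes, `TAXONOMY_TERMS`, `tokenize`, and `rank_interests` are all hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical taxonomy nodes mapped to lowercase terms that describe them.
TAXONOMY_TERMS = {
    "food_and_drink/recipes/vegan_recipes": {"vegan", "recipe", "recipes"},
    "home_decor/living_room": {"living", "room", "sofa", "decor"},
    "travel/europe/italy": {"italy", "rome", "travel"},
}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_interests(pin_text, top_k=2):
    """Score each node by the share of Pin tokens it matches, best first."""
    tokens = Counter(tokenize(pin_text))
    total = sum(tokens.values())
    scored = []
    for node, terms in TAXONOMY_TERMS.items():
        overlap = sum(count for tok, count in tokens.items() if tok in terms)
        if overlap:
            scored.append((node, overlap / total))
    # Sort by descending score, then by node name for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    print(rank_interests("Easy vegan recipes for a trip to Rome"))
```

A production system would replace the overlap score with learned text and image signals, but the output shape (a ranked list of taxonomy nodes per Pin) is the part this sketch is meant to show.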

Key Statistics & Figures

Number of Pins saved on Pinterest
200 billion
This vast corpus of Pins is essential for understanding user behavior and emerging trends.
Hierarchy levels in Interest Taxonomy
up to 11 levels
This granularity allows for detailed categorization of interests, enhancing the accuracy of content mapping.
Percentage of Pins mapped to taxonomy nodes
99%
This high mapping rate indicates the effectiveness of the taxonomy in classifying diverse content.
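To make the hierarchy-level and mapping-rate figures concrete, here is a minimal Python sketch of how depth and coverage could be computed over a taxonomy stored as child-to-parent links. The node names, the `PARENT` table, and the helper functions are hypothetical and are not taken from Pinterest's data model.

```python
# Hypothetical child -> parent links; None marks a top-level interest.
PARENT = {
    "food_and_drink": None,
    "recipes": "food_and_drink",
    "vegan_recipes": "recipes",
    "travel": None,
}


def depth(node, parent=PARENT):
    """Return the level of a node, counting top-level interests as level 1."""
    level = 1
    while parent[node] is not None:
        node = parent[node]
        level += 1
    return level


def path_to_root(node, parent=PARENT):
    """Return the chain of interests from the top level down to the node."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def coverage(pin_to_nodes):
    """Fraction of Pins that have at least one taxonomy node assigned."""
    if not pin_to_nodes:
        return 0.0
    mapped = sum(1 for nodes in pin_to_nodes.values() if nodes)
    return mapped / len(pin_to_nodes)


if __name__ == "__main__":
    print(path_to_root("vegan_recipes"))
    print(max(depth(n) for n in PARENT))
    print(coverage({"p1": ["recipes"], "p2": [], "p3": ["travel"], "p4": ["travel"]}))
```

In this toy data the deepest node sits at level 3; the same `depth` function applied to a real taxonomy would report the "up to 11 levels" figure, and `coverage` corresponds to the mapping percentage.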

Technologies & Tools

Data Modeling
RDF
Used for modeling the taxonomy data into graphs for visualization and curation.
Tools
WebProtégé
An open-source tool utilized for the visualization and human curation of the Interest Taxonomy.
Machine Learning
Pin2Interest
A scalable system for mapping Pins to the Interest Taxonomy.
Machine Learning
Natural Language Processing (NLP)
Techniques used in the Pin2Interest system for content classification.
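As an illustration of what modeling taxonomy nodes in RDF might look like, the Python sketch below builds subject-predicate-object triples for one interest and prints them as N-Triples, a line-based RDF serialization. The source only states that RDF is used; the choice of the SKOS vocabulary (`skos:Concept`, `skos:prefLabel`, `skos:broader`), the `http://example.org/interest/` namespace, and the function names are assumptions made for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/interest/"  # hypothetical namespace for taxonomy nodes


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_triples(node_id, label, broader=None):
    """Describe one interest as a SKOS concept with an optional parent."""
    subject = f"<{EX}{node_id}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
        (subject, f"<{SKOS}prefLabel>", f'"{escape_literal(label)}"@en'),
    ]
    if broader is not None:
        # skos:broader points from the child interest to its parent.
        triples.append((subject, f"<{SKOS}broader>", f"<{EX}{broader}>"))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(concept_triples("vegan_recipes", "Vegan Recipes", broader="recipes")))
```

Representing each parent-child link as its own triple is what makes incremental edits easy to review: adding or moving a node changes a handful of lines rather than a whole nested structure.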

Key Actionable Insights

1. Implementing a taxonomy-based system can significantly enhance content understanding and user engagement. By categorizing content into a structured hierarchy, businesses can better analyze trends and user preferences.
This approach is particularly useful for platforms like Pinterest, where understanding user intent and interests is crucial for delivering relevant content and advertisements.
2. Leveraging machine learning models for content classification can streamline the process of mapping large datasets to taxonomy nodes. This not only improves accuracy but also enables real-time updates to content categorization.
As demonstrated by Pinterest's Pin2Interest system, integrating ML with taxonomy management can lead to more personalized user experiences and effective ad targeting.
3. Utilizing RDF for data modeling in knowledge graphs allows for flexible and scalable taxonomy management. It facilitates collaborative curation and visualization, ensuring that the taxonomy remains relevant and up-to-date.
This method is beneficial for organizations looking to maintain a high-quality taxonomy that adapts to changing user interests and market trends.
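The link between Pin-level interests and personalization can be sketched as a simple aggregation: once each Pin carries scored taxonomy nodes, a user's interests can be estimated by summing those scores over the Pins the user engaged with. The Python below is only a rough illustration of that idea; the action weights, `ACTION_WEIGHT`, and `infer_user_interests` are hypothetical and do not describe Pinterest's user2interest system.

```python
from collections import defaultdict

# Hypothetical weights: a save is treated as a stronger signal than a click.
ACTION_WEIGHT = {"save": 3.0, "click": 1.0}


def infer_user_interests(engagements, pin_interests, top_k=3):
    """Aggregate Pin-level interest scores into a ranked user profile.

    engagements: list of (pin_id, action) pairs for one user.
    pin_interests: dict of pin_id -> list of (taxonomy_node, score).
    """
    totals = defaultdict(float)
    for pin_id, action in engagements:
        weight = ACTION_WEIGHT.get(action, 0.0)
        if weight == 0.0:
            continue  # ignore actions that carry no assumed signal
        for node, score in pin_interests.get(pin_id, []):
            totals[node] += weight * score
    # Highest total first; ties broken by node name for a stable order.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    pins = {"p1": [("vegan_recipes", 0.5)], "p2": [("italy", 0.5)]}
    events = [("p1", "save"), ("p2", "click"), ("p1", "click")]
    print(infer_user_interests(events, pins))
```

The ranked output could then feed either recommendation or ad-targeting logic, which is why the quality of the Pin-to-taxonomy mapping matters for both.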

Common Pitfalls

1. Failing to maintain an updated taxonomy can lead to misclassification of content, which can negatively impact user experience and engagement.
This often occurs when organizations do not have a systematic approach to taxonomy updates, leading to outdated or irrelevant content categorizations.
2. Over-reliance on automated systems without human oversight can result in inaccuracies in the taxonomy.
While machine learning can enhance efficiency, it is crucial to have manual reviews to ensure the quality and relevance of the taxonomy.

Related Concepts

Knowledge Graphs
Machine Learning in Content Classification
User Behavior Analysis
Taxonomy Management