Building Pinalytics: Pinterest’s data analytics engine

Pinterest Engineering
7 min readintermediate
--
View Original

Overview

The article discusses the development of Pinalytics, Pinterest's customizable big data analytics platform designed to enhance data accessibility and analysis for employees. It outlines the architecture, components, and features of Pinalytics, emphasizing its scalability and user-friendly interface.

What You'll Learn

1

How to create custom dashboards for data analysis using Pinalytics

2

Why scalable backend architecture is crucial for big data analytics

3

How to implement custom metric computations with the metrics composer

Prerequisites & Requirements

  • Understanding of data analytics concepts
  • Familiarity with SQL and data visualization tools(optional)

Key Questions Answered

What are the main components of the Pinalytics architecture?
Pinalytics consists of three main components: a web app for user interaction, a reporter for generating reports, and a backend powered by a Thrift service and HBase databases. This architecture supports efficient data analysis and reporting.
How does Pinalytics support custom metric computation?
Pinalytics features a metrics composer that allows users to create customized time series data by combining metrics using composite functions. This enables teams to dynamically evaluate and display complex calculations.
What challenges did Pinterest face while building Pinalytics?
Pinterest faced challenges in efficiently aggregating multidimensional data at runtime without pre-processing, which required a scalable backend to handle large datasets and provide low latency for queries.
What is the significance of the metrics composer in Pinalytics?
The metrics composer enables users to create custom metrics by combining existing metrics through mathematical functions, facilitating advanced data analysis and allowing for tailored insights based on specific business needs.

Key Statistics & Figures

Number of custom dashboards available in Pinalytics
More than 100
These dashboards provide access to various charts and metrics for daily tracking.
Speedup achieved through data transformation and rolling-window algorithm
20x – 50x
This improvement in processing efficiency allows for faster computation of user activity metrics.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Database
Mysql
Used in the web application stack for data management.
Tools
Sqlalchemy
Used for database interaction within the web application.
Backend Framework
Flask
Framework for building the web application.
Frontend Framework
React
Used to build user interface components for interactive visualizations.
Database
Hbase
Provides the backend storage for Pinalytics, enabling scalable data processing.
Backend Service
Thrift
Facilitates communication between the web app and backend services.

Key Actionable Insights

1
Utilize the metrics composer to create tailored metrics that reflect your team's specific KPIs. This allows for more relevant data insights and can improve decision-making processes.
By leveraging the metrics composer, teams can dynamically adjust their metrics to align with changing business objectives, enhancing the overall effectiveness of data-driven strategies.
2
Implement custom dashboards to visualize key metrics and trends effectively. This can streamline data analysis and improve communication across teams.
Custom dashboards facilitate quick access to vital information, enabling teams to respond swiftly to changes in user behavior or market conditions.
3
Focus on optimizing the backend architecture for scalability to handle increasing data loads efficiently. This is crucial for maintaining performance as data volume grows.
A scalable backend ensures that analytics tools remain responsive and effective, even as the complexity and size of the data increase.

Common Pitfalls

1
Redundant processing of the same data can occur if custom jobs are not managed properly.
To avoid this, it's essential to consolidate core ETLs that compute common metrics to ensure efficiency and reduce unnecessary computations.
2
Failing to optimize the backend for scalability can lead to performance bottlenecks as data volume increases.
It's crucial to design a backend that can handle flexible and efficient aggregation over multidimensional data to maintain low latency.

Related Concepts

Big Data Analytics
Data Visualization Techniques
Scalable Architecture Design
Custom Reporting Solutions