Redesigning Pinterest’s Ad Serving Systems with Zero Downtime

Pinterest Engineering

Overview

This article discusses the complete redesign of Pinterest's ad serving system, known as Mohawk. The system was rewritten to eliminate technical debt and improve performance so that the company could meet ambitious business goals. The new system, built on a Java-based framework, has operated without significant outages since its launch.

What You'll Learn

1. How to effectively redesign a complex ad serving system for zero downtime
2. Why modularization and separation of concerns are critical in software architecture
3. How to implement a directed acyclic graph (DAG) structure for code organization

Prerequisites & Requirements

  • Understanding of software architecture principles, particularly modularization and separation of concerns
  • Familiarity with Java-based frameworks (optional)

Key Questions Answered

What were the main motivations for rewriting Pinterest's ad serving system?
The motivations included addressing significant tech debt accumulated over eight years, which made the system complex and brittle, leading to outages. The rewrite aimed to enhance performance, scalability, and maintainability, ensuring the system could support Pinterest's ambitious business goals.
How did the decision-making process for the rewrite unfold?
The decision-making process involved three months of research, prototyping, and scrutiny of options. Ultimately, it was decided that a complete rewrite would be more efficient than a major refactor, particularly because it would allow for higher throughput and better integration with existing Java-based frameworks.
What design principles guided the AdMixer Rewrite project?
The design principles included extensibility, separation of concerns, safe-by-design practices, and ensuring development velocity. These principles were essential for creating a robust and flexible ad serving platform that could support rapid product innovation.
What were the key design decisions made during the project?
Key design decisions included using an in-house graph execution framework called Apex to organize the code into a directed acyclic graph (DAG) and developing a new data model to ensure safe execution. These decisions aimed to improve modularity and data integrity.
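
The idea of organizing code as a DAG can be illustrated with a small sketch. The summary does not describe Apex's API, so everything below is hypothetical: the class `DagSketch`, its `add` and `run` methods, and the node names are invented for illustration only. The sketch declares each unit of work with its dependencies, orders the nodes with Kahn's algorithm, and rejects graphs that contain a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of DAG-ordered execution. This is NOT Apex's API.
final class DagSketch {
    // One unit of work: its name, the nodes it depends on, and its logic.
    static final class Node {
        final String name;
        final List<String> deps;
        final Function<Map<String, Object>, Object> body;

        Node(String name, List<String> deps, Function<Map<String, Object>, Object> body) {
            this.name = name;
            this.deps = List.copyOf(deps);
            this.body = body;
        }
    }

    private final Map<String, Node> nodes = new LinkedHashMap<>();

    DagSketch add(String name, List<String> deps, Function<Map<String, Object>, Object> body) {
        nodes.put(name, new Node(name, deps, body));
        return this;
    }

    // Kahn's algorithm: every node appears after all of its dependencies.
    List<String> topologicalOrder() {
        Map<String, Integer> unmet = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Node n : nodes.values()) {
            unmet.put(n.name, n.deps.size());
            for (String dep : n.deps) {
                if (!nodes.containsKey(dep)) {
                    throw new IllegalStateException("Unknown dependency: " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(n.name);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Node n : nodes.values()) {
            if (n.deps.isEmpty()) ready.add(n.name);
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.poll();
            order.add(done);
            for (String next : dependents.getOrDefault(done, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != nodes.size()) {
            throw new IllegalStateException("Cycle detected: the graph is not a DAG");
        }
        return order;
    }

    // Runs each node once, handing it a read-only view of its dependencies' outputs.
    Map<String, Object> run() {
        Map<String, Object> outputs = new LinkedHashMap<>();
        for (String name : topologicalOrder()) {
            Node n = nodes.get(name);
            Map<String, Object> inputs = new HashMap<>();
            for (String dep : n.deps) inputs.put(dep, outputs.get(dep));
            outputs.put(name, n.body.apply(Collections.unmodifiableMap(inputs)));
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Node names and values are made up for the example.
        DagSketch dag = new DagSketch()
            .add("request", List.of(), in -> 3)
            .add("candidates", List.of("request"), in -> (Integer) in.get("request") * 10)
            .add("budget", List.of("request"), in -> (Integer) in.get("request") + 1)
            .add("auction", List.of("candidates", "budget"),
                in -> Math.min((Integer) in.get("candidates"), (Integer) in.get("budget")));
        System.out.println(dag.topologicalOrder()); // [request, candidates, budget, auction]
        System.out.println(dag.run().get("auction")); // 4
    }
}
```

Because each node sees only the outputs of the dependencies it declared, the data flow is explicit in the graph itself. A production framework would likely run independent nodes in parallel, which this sequential sketch does not attempt.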

Key Statistics & Figures

  • Ad impressions served daily: more than 2 billion. This metric highlights the scale at which the ad serving system operates.
  • Ad revenue generated annually: $2.8 billion. This figure underscores the financial importance of the ad serving system to Pinterest's overall business.
  • Lines of code in the original system: more than 380K. This statistic illustrates the complexity and size of the original Mohawk system.
  • Number of engineers involved in the original system: more than 100. This indicates the collaborative effort required to maintain and develop the original ad serving system.
  • Developer satisfaction NPS score after the rewrite: 90. This significant increase from a previous score of 46 reflects improved developer experience and satisfaction with the new system.

Technologies & Tools

  • Java (backend): Used as the primary programming language for the new ad serving system.
  • Apex (backend): An in-house graph execution framework used to organize code into directed acyclic graphs (DAGs).
  • AWS Graviton (infrastructure): Utilized for running the new service, resulting in infrastructure cost reductions.

Key Actionable Insights

1. Prioritize modularization in software design to enhance maintainability and reduce complexity.
   By organizing code into distinct modules, teams can work independently without the risk of breaking each other's changes, which is crucial for large-scale systems like Pinterest's ad serving platform.
2. Implement a directed acyclic graph (DAG) structure for managing dependencies in complex systems.
   Using a DAG allows for clearer data flow and dependency management, which can significantly improve system performance and reliability.
3. Adopt a safe-by-design approach to concurrency in software development.
   Ensuring that concurrency is handled safely can prevent race conditions and data integrity issues, which are common pitfalls in complex systems.
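
One common way to make concurrency safe by design is to pass only immutable values between concurrently running stages, so that no stage can modify data another stage is reading. The summary does not say how Pinterest's data model achieves this, so the sketch below is an assumption about one possible approach. The class `SafeContextSketch`, the `Candidates` type, and the two filter stages are hypothetical names invented for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: immutable values shared between parallel stages.
final class SafeContextSketch {
    // Immutable snapshot of candidate ad IDs: final field, defensive copy, no setters.
    static final class Candidates {
        private final List<Integer> adIds;

        Candidates(List<Integer> adIds) {
            this.adIds = List.copyOf(adIds); // unmodifiable copy, detached from the caller's list
        }

        List<Integer> adIds() {
            return adIds;
        }

        // Returns a new value instead of mutating this one.
        Candidates keepOnly(Predicate<Integer> keep) {
            return new Candidates(adIds.stream().filter(keep).collect(Collectors.toList()));
        }

        Candidates intersect(Candidates other) {
            return keepOnly(other.adIds::contains);
        }
    }

    // Two stages read the same input in parallel; neither can corrupt it.
    static Candidates runStages(Candidates input) {
        CompletableFuture<Candidates> evens =
            CompletableFuture.supplyAsync(() -> input.keepOnly(id -> id % 2 == 0));
        CompletableFuture<Candidates> large =
            CompletableFuture.supplyAsync(() -> input.keepOnly(id -> id > 2));
        return evens.thenCombine(large, Candidates::intersect).join();
    }

    public static void main(String[] args) {
        Candidates input = new Candidates(List.of(1, 2, 3, 4, 5, 6));
        System.out.println(runStages(input).adIds()); // [4, 6]
    }
}
```

The design choice here is that safety comes from the types rather than from developer discipline: there is no lock to forget, because there is no shared mutable state to protect.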

Common Pitfalls

1. Failing to modularize code can lead to complex interdependencies that make maintenance difficult.
   When code is not modularized, changes in one area can inadvertently affect others, leading to bugs and increased time spent on debugging.
2. Neglecting concurrency safety can result in hard-to-detect bugs.
   Without proper frameworks for concurrency, developers may introduce race conditions that compromise data integrity, making issues difficult to diagnose.

Related Concepts

Software Architecture Principles
Modularization And Separation Of Concerns
Concurrency In Software Development