How Pinterest Uses Apache

62 engineering articles about Apache from Pinterest's engineering team

Articles

Pinterest
Intermediate
The article discusses Pinterest's transition to a next-generation database ingestion framework designed to address the limitations of legacy systems.
Pinterest Engineering
10 min read
Includes Code
Has Summary
--
Pinterest
Advanced
PinLanding is a multimodal AI pipeline developed by Pinterest to generate shopping collections from billions of products.
Pinterest Engineering
8 min read
Has Summary
--
Pinterest
Advanced
This article discusses Pinterest's transition from a Hadoop-based platform to a Kubernetes-based data processing solution named Moka.
Pinterest Engineering
19 min read
Includes Code
Has Summary
--
Pinterest
Advanced
The article discusses how Pinterest enhances its machine learning feature iterations through an effective backfill process.
Pinterest Engineering
14 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the implementation of the Unified Dynamic Framework (UDF) at Pinterest, which has significantly improved the scalability and efficiency of experiment metric computing.
Pinterest Engineering
7 min read
Has Summary
--
Pinterest
Intermediate
Pinterest successfully migrated 3.7 million lines of code from Flow to TypeScript over eight months, enhancing type safety and developer experience.
Pinterest Engineering
12 min read
Includes Code
Has Summary
--
Pinterest
Advanced
The article discusses Change Data Capture (CDC) at Pinterest, detailing its importance for real-time data processing and the implementation of a Generic CDC solution using Debezium.
Pinterest Engineering
8 min read
Has Summary
--
Pinterest
Advanced
The article discusses the transition from Apache Hadoop YARN to Apache YuniKorn for resource management in Pinterest's batch processing platform, Monarch, now rebranded as Moka.
Pinterest Engineering
10 min read
Has Summary
--
Pinterest
Advanced
This article discusses the implementation of Ray Batch Inference at Pinterest, highlighting its advantages over previous solutions like Apache Spark and Torch Dataloader.
Pinterest Engineering
11 min read
Includes Code
Has Summary
--
Pinterest
Advanced
The article discusses the Structured DataStore (SDS), a unified multi-model data management platform developed by Pinterest.
Pinterest Engineering
18 min read
Includes Code
Has Summary
--
Pinterest
Intermediate
The article discusses Pinterest's implementation of feature caching in their recommender systems using Cachelib, an in-process caching engine developed by Meta Open Source.
Pinterest Engineering
11 min read
Has Summary
--
Pinterest
Advanced
The article discusses Pinterest's implementation of Tiered Storage for Apache Kafka®, highlighting a broker-decoupled approach that offloads data to cheaper remote storage.
Pinterest Engineering
24 min read
Includes Code
Has Summary
--
Pinterest
Advanced
The article discusses Pinterest's adoption of TiDB as a replacement for HBase, detailing the motivations, selection methodology, and the journey of integrating TiDB into their infrastructure.
Pinterest Engineering
19 min read
Has Summary
--
Pinterest
Intermediate
The article discusses Pinterest's transition from HBase, its first NoSQL datastore, to a new serving architecture with a unified storage service.
Pinterest Engineering
7 min read
Has Summary
--
Pinterest
Intermediate
LinkSage is a Graph Neural Network-based model developed by Pinterest to enhance off-site content understanding, improving user engagement and monetization.
Pinterest Engineering
10 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the challenges of online-offline discrepancies in Pinterest's ads ranking system, emphasizing the importance of aligning offline model performance with online business metrics.
Pinterest Engineering
16 min read
Has Summary
--
Pinterest
Advanced
The article discusses the redesign of the Goku-Ingestor at Pinterest, focusing on enhancing performance, reducing costs, and minimizing code complexity.
Pinterest Engineering
10 min read
Includes Code
Has Summary
--
Pinterest
Intermediate
The article discusses the implementation and operational benefits of the Unified PubSub Client (PSC) at Pinterest, highlighting improvements in developer velocity, stability, and scalability.
Pinterest Engineering
11 min read
Includes Code
Has Summary
--
Pinterest
Intermediate
This article discusses the debugging process of a direct memory leak encountered in Apache Flink applications at Pinterest.
Pinterest Engineering
9 min read
Includes Code
Has Summary
--
Pinterest
Intermediate
The article discusses how Pinterest improved its machine learning (ML) dataset iteration speed using Ray, an open-source framework for scaling AI and ML workloads.
Pinterest Engineering
9 min read
Has Summary
--
Pinterest
Intermediate
This article discusses Pinterest's implementation of a finer-grained access control (FGAC) framework to manage data access securely and efficiently within their data engineering platform.
Pinterest Engineering
18 min read
Has Summary
--
Pinterest
Advanced
The article discusses the optimization of Flink clusters at Pinterest to enhance stability and efficiency, detailing the strategies implemented to reduce costs and improve performance.
Pinterest Engineering
14 min read
Has Summary
--
Pinterest
Advanced
The article discusses the Warden Anomaly Detection Platform developed at Pinterest, focusing on its architecture, use cases, and the importance of real-time anomaly detection in various application...
Pinterest Engineering
15 min read
Has Summary
--
Pinterest
Advanced
The article discusses the development of a large-scale user signal platform at Pinterest, which enables real-time indexing of user events and construction of user sequences for machine learning app...
Pinterest Engineering
14 min read
Has Summary
--
Pinterest
Advanced
This article details Pinterest's experience upgrading their Batch Processing Platform, Monarch, from Hadoop 2.7.1 to Hadoop 2.10.0.
Pinterest Engineering
14 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the development of a unified PubSub client library at Pinterest, aimed at improving the scalability, stability, and developer velocity of the Logging Platform.
Pinterest Engineering
12 min read
Includes Code
Has Summary
--
Pinterest
Beginner
The article discusses Pinterest's automated campaign budget optimization product, which helps advertisers efficiently allocate their advertising budgets across various ad groups.
Pinterest Engineering
10 min read
Has Summary
--
Pinterest
Intermediate
The article discusses Pinterest's migration from its legacy workflow system, Pinball, to a new platform called Spinner, built on Apache Airflow.
Pinterest Engineering
18 min read
Includes Code
Has Summary
--
Pinterest
Advanced
The article discusses Spinner, Pinterest's workflow platform, detailing its evolution from an in-house scheduler called Pinball to Apache Airflow.
Pinterest Engineering
23 min read
Includes Code
Has Summary
--
Pinterest
Intermediate
The article summarizes the most popular engineering blog posts from Pinterest in 2021, highlighting significant advancements and initiatives by Pinterest engineers.
Pinterest Engineering
2 min read
Has Summary
--
Pinterest
Advanced
The article discusses the Campaign Budget Optimization (CBO) product developed by Pinterest's Ads Intelligence team, which automates the distribution of advertising budgets across ad groups to maxi...
Pinterest Engineering
8 min read
Has Summary
--
Pinterest
Intermediate
MemQ is a new, efficient, and scalable cloud-native PubSub system developed by Pinterest, designed to handle Near Real-Time data transportation while being up to 90% more cost-effective than Apache...
Pinterest Engineering
12 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the development and implementation of DrSquirrel, a self-service diagnosis tool at Pinterest aimed at enhancing the troubleshooting process for Apache Flink jobs.
Pinterest Engineering
10 min read
Has Summary
--
Pinterest
Advanced
Pinterest utilizes Flink as its stream processing engine to build a reliable and scalable platform called Xenon.
Pinterest Engineering
7 min read
Has Summary
--
Pinterest
Intermediate
The article discusses how Pinterest utilizes Apache Spark SQL for interactive querying, detailing the architecture, challenges faced, and solutions implemented to enhance user experience.
--
Pinterest
Intermediate
The article discusses how Pinterest improved data processing efficiency by implementing partial deserialization of Thrift encoded data.
Pinterest Engineering
7 min read
Has Summary
--
Pinterest
Intermediate
This article discusses the development of a label-based enforcement pipeline at Pinterest aimed at enhancing Trust & Safety.
Pinterest Engineering
9 min read
Has Summary
--
Pinterest
Intermediate
The article discusses enhancements made to Kafka MirrorMaker, specifically the development of Shallow Mirror, which aims to reduce CPU and memory pressure during data replication across Kafka clust...
Pinterest Engineering
8 min read
Has Summary
--
Pinterest
Advanced
The article discusses the implementation of a near-real-time image similarity detection system at Pinterest using Apache Flink.
Pinterest Engineering
8 min read
Has Summary
--
Pinterest
Intermediate
The article discusses Pinterest's Flink Deployment Framework, which is built on Bazel and integrates with various internal services to streamline the deployment of Flink jobs.
Pinterest Engineering
6 min read
Has Summary
--
Pinterest
Advanced
The article discusses Pinterest's transition from Lambda architecture to Kappa architecture for visual signals infrastructure, focusing on the need for real-time processing of machine learning sign...
Pinterest Engineering
8 min read
Has Summary
--
Pinterest
Advanced
The article discusses how Pinterest empowered its data scientists and machine learning engineers by building a PySpark infrastructure that addresses challenges faced with existing tools like Hive a...
--
Pinterest
Intermediate
This article discusses how Pinterest's Logging Platform team utilizes graph algorithms to optimize Kafka operations, particularly focusing on addressing the imbalanced leader problem in Kafka clust...
Pinterest Engineering
6 min read
Has Summary
--
Pinterest
Advanced
The article discusses Pinterest's transition from using Apache HBase to Apache Druid for ads analytics, highlighting the challenges faced and the benefits of Druid's capabilities in handling comple...
Pinterest Engineering
10 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the ads indexing system at Pinterest, detailing its architecture, design, and implementation.
Pinterest Engineering
16 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the importance of upgrading outbound Pin links from HTTP to HTTPS to enhance user security on Pinterest.
Pinterest Engineering
5 min read
Has Summary
--
Pinterest
Intermediate
The article discusses Pinterest's implementation of a near real-time experimentation platform using Apache Flink to analyze thousands of experiments daily.
Pinterest Engineering
11 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the concept of 'Pinterest Paths', which describes the exploration behavior of users on Pinterest as they navigate through related ideas.
Pinterest Engineering
6 min read
Has Summary
--
Pinterest
Intermediate
Pinterest operates one of the largest Kafka deployments in the cloud, utilizing Apache Kafka as a message bus for data transport and real-time streaming services.
Pinterest Engineering
6 min read
Has Summary
--
Pinterest
Intermediate
The article discusses the challenges Pinterest faced with their Apache Thrift schemas, which had become tightly coupled and complex, leading to inefficiencies in development cycles.
Pinterest Engineering
12 min read
Has Summary
--