GPU Integration Propels Data Center Efficiency and Cost Savings for Taboola

When you see a context-relevant advertisement on a web page, it’s most likely content served by a Taboola data pipeline. As the leading content recommendation…

Eyal Hirsch
12 min readadvanced
--
View Original

Overview

The article discusses how Taboola integrated GPUs into their data processing pipeline to enhance efficiency and reduce costs. It details the challenges faced during the migration from CPU to GPU, the performance improvements achieved, and the lessons learned throughout the process.

What You'll Learn

1

How to migrate from CPU to GPU for data processing

2

Why tuning parameters is crucial in complex environments

3

How to identify and resolve bottlenecks in data processing pipelines

Prerequisites & Requirements

  • Understanding of Apache Spark and GPU acceleration concepts
  • Familiarity with RAPIDS Accelerator and Kubernetes(optional)

Key Questions Answered

What were the main challenges faced during Taboola's GPU migration?
Taboola faced several challenges during their GPU migration, including the need to optimize Parquet parsing, address network bottlenecks with inadequate NIC cards, and resolve disk I/O issues caused by RAID configurations. These bottlenecks significantly impacted query performance and required strategic hardware and software adjustments.
How did Taboola achieve performance improvements with GPUs?
Taboola achieved significant performance improvements by migrating thousands of CPU cores to GPUs, specifically using the NVIDIA A30 GPU. They recorded performance factors of 20x and higher for their heaviest queries compared to the CPU cluster, demonstrating the effectiveness of GPU acceleration in handling large data workloads.
What hardware specifications were used in the migration process?
The initial hardware setup included a 72 CPU core Intel server with three A30 GPUs, a 900-GB local SSD drive for Apache Spark, 380 GB of RAM, and a 10-Gb/s NIC card. This setup was critical for testing the migration to GPU acceleration.
What tuning parameters improved GPU query performance?
Key tuning parameters that improved GPU query performance included increasing the maximum partition bytes from the default 128 MB to 1-2 GB, adjusting the number of concurrent GPU tasks, and optimizing shuffle partitions. These adjustments helped enhance the efficiency of query execution on GPUs.

Key Statistics & Figures

Average CPU production time for AdvertiserDimensionByRequest
586.41 seconds
This metric illustrates the time taken by a specific query on the CPU before migrating to GPU.
Average GPU time for AdvertiserDimensionByRequest
31.91 seconds
This shows the reduced processing time for the same query after migrating to GPU, highlighting the performance improvement.
GPU factor for ExperimentAnalysisPage
29.36
This indicates the performance improvement factor achieved for this specific query after migration to GPU.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Backend
Apache Spark
Used for processing large datasets in Taboola's data pipeline.
Backend
Rapids Accelerator
Facilitates GPU acceleration for Apache Spark workloads.
Orchestration
Kubernetes
Used for managing the multi-GPU and multi-server cluster.
Hardware
Nvidia A30
The GPU model used for accelerating data processing tasks.

Key Actionable Insights

1
Optimize your data processing pipeline by migrating to GPUs to achieve significant performance gains.
Taboola's experience showed that migrating to GPUs allowed them to handle larger data loads more efficiently, achieving performance factors of 20x or higher. This can be particularly beneficial for organizations facing increasing data processing demands.
2
Regularly tune parameters in your data processing environment to ensure optimal performance.
The article emphasizes the importance of tuning parameters such as partition sizes and concurrent tasks to maximize GPU efficiency. This practice can help prevent bottlenecks and improve overall system performance.
3
Identify and address all potential bottlenecks in your data pipeline, including network and disk I/O.
Taboola's challenges with network and disk I/O bottlenecks highlight the need for a holistic approach to performance optimization. By addressing these issues, organizations can ensure that their GPUs are utilized effectively.

Common Pitfalls

1
Overlooking network and disk I/O bottlenecks can severely impact performance.
Many organizations focus solely on GPU performance without addressing underlying network and disk issues, leading to inefficient resource utilization and slower query execution.
2
Failing to tune parameters appropriately for GPU workloads.
Default settings may not be suitable for GPU processing, and not adjusting these can lead to suboptimal performance. Regular tuning is essential to maximize the benefits of GPU acceleration.

Related Concepts

GPU Acceleration Techniques
Apache Spark Performance Optimization
Data Pipeline Architecture