RAPIDS Accelerator for Apache Spark Release v21.08

The RAPIDS Accelerator for Apache Spark v21.08 release adds features that improve end-to-end speed, with all NDS queries now running on the GPU.

Eric Rife

Overview

The RAPIDS Accelerator for Apache Spark v21.08 release improves performance and functionality for Apache Spark applications, allowing all 105 SQL queries from the NVIDIA Decision Support (NDS) benchmark to run on GPUs without code changes. This release focuses on ease of use, improved speed, and reduced total cost of ownership for NVIDIA EGX servers.
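"Without code changes" means the accelerator is switched on through Spark configuration rather than by editing queries. The sketch below builds a spark-submit command that loads the plugin. The settings `spark.plugins=com.nvidia.spark.SQLPlugin` and `spark.rapids.sql.enabled` are the plugin's documented configuration keys; the jar paths, the application name, and the helper function itself are hypothetical placeholders, not taken from the release notes.

```python
import shlex


def build_spark_submit(app, plugin_jars, extra_conf=None):
    """Return a spark-submit argument list that enables the RAPIDS plugin.

    `app` and `plugin_jars` are placeholders supplied by the caller.
    """
    conf = {
        # Load the RAPIDS Accelerator as a Spark plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow SQL operations to be placed on the GPU.
        "spark.rapids.sql.enabled": "true",
    }
    conf.update(extra_conf or {})

    args = ["spark-submit", "--jars", ",".join(plugin_jars)]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Hypothetical jar locations and application script.
cmd = build_spark_submit(
    "nds_queries.py",
    ["/opt/jars/rapids-4-spark.jar", "/opt/jars/cudf.jar"],
)
print(shlex.join(cmd))
```

The application script is passed through untouched; only the launch command changes.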

What You'll Learn

1. How to run all 105 SQL queries from the NDS benchmark on GPUs without code changes
2. Why using the RAPIDS Accelerator can lower total cost of ownership for NVIDIA EGX servers
3. How to use the Profiling & Qualification tool for Apache Spark event logs
4. When to apply the new window functions such as rank and dense_rank in SQL

Key Questions Answered

What improvements does the RAPIDS Accelerator v21.08 bring to Apache Spark applications?
The RAPIDS Accelerator v21.08 enhances ease-of-use for Apache Spark applications by allowing all 105 SQL queries from the NDS benchmark to run on GPUs without any code changes. It also introduces new functionalities such as out-of-core group by and window functions, improving performance and reducing costs.
What does the benchmark setup for the RAPIDS Accelerator v21.08 look like?
The benchmark runs at a scale factor of 3K (a 3 TB dataset) on four NVIDIA-Certified EGX servers. The hardware includes Dell R740xd nodes with A30 GPUs, and the software includes RAPIDS Accelerator v21.08.0 and Apache Spark 3.1.1.
What is the cost comparison between GPU and CPU servers for the benchmark?
The benchmark GPU servers cost approximately 1.29 times as much as the CPU servers: about $220,000 for the GPU-equipped servers versus about $170,000 for the CPU-only servers. Despite the higher initial cost, GPU servers can run over 95 queries faster, making them cheaper to operate in the long run.
What new functionalities were added in the RAPIDS Accelerator v21.08?
New functionalities in the RAPIDS Accelerator v21.08 include support for multi-level struct data types, writing array data types in Parquet format, and the addition of rank and dense_rank window functions. These enhancements improve the overall capabilities of SQL operations within the accelerator.
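The two window functions differ only in how they number rows after a tie: rank leaves a gap, dense_rank does not. The pure-Python sketch below illustrates those semantics only; it does not use Spark or the accelerator, and the sample values are made up. In SQL the same result would come from something like `RANK() OVER (ORDER BY amount DESC)` and `DENSE_RANK() OVER (ORDER BY amount DESC)`, where `amount` is a hypothetical column.

```python
def rank_and_dense_rank(values):
    """Return (value, rank, dense_rank) tuples, ordered by value descending."""
    ordered = sorted(values, reverse=True)
    rows = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position  # rank jumps to the row position after ties
            dense += 1       # dense_rank always increases by exactly one
            previous = value
        rows.append((value, rank, dense))
    return rows


# Made-up sample data with one tie.
for row in rank_and_dense_rank([300, 500, 300, 100]):
    print(row)
# (500, 1, 1)
# (300, 2, 2)
# (300, 2, 2)
# (100, 4, 3)
```

Use rank when the gap after a tie matters (for example "fourth place"), and dense_rank when consecutive numbering is wanted.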

Key Statistics & Figures

Cost of GPU servers relative to CPU servers
1.29 times
GPU servers have a higher upfront cost (about $220,000 versus $170,000 for CPU-only servers) but can provide better performance.
Number of SQL queries passing in benchmark
105
All queries can now run on the GPU without code changes, showcasing the enhanced capabilities of the RAPIDS Accelerator.
Speed-up range for GPU queries
1x to 18x
Indicating the variability in performance improvements depending on the specific query being executed.

Technologies & Tools

Software
RAPIDS Accelerator
Used to accelerate Apache Spark applications on NVIDIA GPUs.
Software
Apache Spark
The primary framework for executing data processing tasks that the RAPIDS Accelerator enhances.
Hardware
NVIDIA A30 GPU
Used in the benchmark setup to provide GPU acceleration for data processing.

Key Actionable Insights

1. Leverage the new functionalities of the RAPIDS Accelerator to enhance your SQL operations on GPUs.
Using features like out-of-core group by and window functions can significantly improve performance for data-intensive applications, making it easier to handle larger datasets without code changes.
2. Consider using the Profiling & Qualification tool to analyze your existing Apache Spark workloads.
This tool can help identify which queries will benefit most from GPU acceleration, allowing you to optimize resource allocation and improve performance.
3. Evaluate the cost-effectiveness of transitioning from CPU to GPU servers for your data processing needs.
With GPU servers costing about 1.29 times as much as CPU servers but offering substantial speed improvements, it’s essential to assess the potential return on investment for your specific use cases.
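As a rough way to frame that assessment, the sketch below recomputes the cost ratio from the approximate server prices quoted above and compares hardware cost per query at different speed-ups. It assumes, as a simplification not made in the source, that hardware cost is amortized purely by query throughput and that power, licensing, and staffing are ignored.

```python
# Approximate benchmark server costs quoted above (US dollars).
CPU_CLUSTER_COST = 170_000
GPU_CLUSTER_COST = 220_000

cost_ratio = GPU_CLUSTER_COST / CPU_CLUSTER_COST
print(f"GPU/CPU cost ratio: {cost_ratio:.2f}")  # GPU/CPU cost ratio: 1.29


def relative_cost_per_query(speedup, ratio=cost_ratio):
    """GPU hardware cost per query relative to CPU; below 1.0 favors GPU.

    Simplifying assumption: cost scales inversely with query throughput.
    """
    return ratio / speedup


# The reported speed-ups range from 1x to 18x depending on the query.
for speedup in (1.0, 1.29, 18.0):
    print(f"{speedup:>5.2f}x speed-up -> {relative_cost_per_query(speedup):.2f}")
```

Under this simplification a query has to run more than about 1.29 times faster on the GPU servers before they are cheaper per query, which is why qualifying workloads first matters.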

Common Pitfalls

1. Failing to qualify the right use cases for GPU acceleration can lead to suboptimal performance.
Users may assume all workloads will benefit from GPU acceleration, but it's crucial to analyze specific queries and their performance characteristics to ensure effective use of resources.
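Qualifying a workload starts from its Spark event logs, which Spark writes as one JSON object per line, each with an "Event" field. The sketch below is not the Profiling & Qualification tool and does not reproduce its analysis; it is only a minimal illustration, with made-up sample lines, of the kind of input such a tool reads.

```python
import json
from collections import Counter


def count_events(lines):
    """Count Spark listener events by type in an event log (JSON lines)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)["Event"]] += 1
    return counts


# Made-up, heavily trimmed sample; real event logs carry many more fields.
sample_log = [
    '{"Event": "SparkListenerApplicationStart", "App Name": "demo"}',
    '{"Event": "SparkListenerJobStart", "Job ID": 0}',
    '{"Event": "SparkListenerJobEnd", "Job ID": 0}',
    '{"Event": "SparkListenerJobStart", "Job ID": 1}',
]

for event, count in sorted(count_events(sample_log).items()):
    print(event, count)
```

For a real log, pass an open file handle instead of the sample list.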

Related Concepts

Apache Spark Performance Optimization
GPU Acceleration In Data Processing
Benchmarking SQL Queries With NDS