
How NVIDIA Uses Apache Spark

77 engineering articles about Apache Spark from NVIDIA's engineering team

Articles

NVIDIA
Advanced
The article discusses Project Aether, a tool developed by NVIDIA to facilitate the migration of CPU-based Apache Spark workloads to GPU-accelerated environments on Amazon EMR.
Navin Kumar
6 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the collaboration between IBM and NVIDIA to enhance large-scale data analytics through GPU-native Velox and NVIDIA cuDF, highlighting significant performance improvements over...
Gregory Kimball
7 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the deployment of a serverless, distributed data processing architecture using Apache Spark and NVIDIA AI on Azure.
NVIDIA
Intermediate
The article discusses the application of Graph Neural Networks (GNNs) in enhancing fraud detection within financial services.
NVIDIA
Advanced
The article discusses how Atgenomix SeqsLab leverages NVIDIA technologies to enhance health omics analysis for precision medicine.
Yu-Ting Lin
9 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the use of GPU acceleration to enhance performance in Apache Spark applications, highlighting the challenges of migrating workloads from CPUs to GPUs.
Matt Ahrens
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how to accelerate Deep Learning (DL) and Large Language Model (LLM) inference using Apache Spark in cloud environments.
NVIDIA
Advanced
The article discusses how to accelerate Apache Parquet scans on Apache Spark using GPUs, specifically through the RAPIDS Accelerator for Apache Spark.
Matt Ahrens
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses strategies for preventing GPU fragmentation in the Volcano Scheduler, focusing on an enhanced scheduling approach that integrates bin-packing with gang scheduling.
Ameya Parab
6 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the performance and energy efficiency of the NVIDIA Grace CPU Superchip for ETL workloads, comparing it with AMD and Intel CPUs.
Gregory Kimball
6 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how the NVIDIA RAPIDS Accelerator for Apache Spark enables GPU-accelerated data processing with zero code changes, improving the performance of Apache Spark ML applications.
Erik Ordentlich
5 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how to read JSON Lines data using NVIDIA's cuDF library, achieving speedups of up to 100x over traditional pandas methods.
Karthikeyan Natarajan
10 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the optimization of JSON processing on Apache Spark using GPU acceleration, highlighting significant performance improvements achieved by a Fortune 100 retail company.
Matt Ahrens
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
NVIDIA has announced that its CUDA-X platform now accelerates the Polars Data Processing Library, enhancing its performance for data analytics.
Nick Becker
3 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the NVIDIA GH200 Grace Hopper Superchip, highlighting its significant advancements in energy efficiency and node consolidation for Apache Spark workloads.
NVIDIA
Intermediate
The article introduces five new technical courses offered by NVIDIA aimed at enhancing skills in AI and data science.
NVIDIA
Intermediate
This article provides a comprehensive guide to leveraging RAPIDS for GPU-accelerated data processing on Databricks.
Sheilah Kirui
10 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the integration of RAPIDS and Vadalog Parallel to enhance the performance of neurosymbolic AI systems, particularly in processing large knowledge graphs.
Bruno Trentini
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the Spark RAPIDS ML library, an open-source Python package that accelerates Apache Spark ML applications using NVIDIA GPU technology.
Erik Ordentlich
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the optimization of Extract-Transform-Load (ETL) operations using GPUs, specifically through the NVIDIA RAPIDS Accelerator for Apache Spark.
Joel Lashmore
8 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses how the NVIDIA RAPIDS Accelerator for Apache Spark can significantly enhance the performance and cost-effectiveness of extract-transform-load (ETL) processes, particularly for...
NVIDIA
Advanced
The article discusses the integration of distributed deep learning with Apache Spark 3.4, highlighting new built-in APIs for both distributed model training and inference.
NVIDIA
Advanced
The article discusses how Taboola integrated GPUs into their data processing pipeline to enhance efficiency and reduce costs.
Eyal Hirsch
12 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the introduction of Spark RAPIDS ML, a new GPU-accelerated library for Apache Spark ML that enhances the performance and cost-effectiveness of machine learning applications.
Erik Ordentlich
5 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the integration of Dataiku and NVIDIA technologies for deep learning applications, particularly in image classification and topic modeling.
Shashank Gaur
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses NVIDIA's next-generation computing platforms optimized for AI, video, and data analytics performance.
NVIDIA
Advanced
At NVIDIA GTC 2023, NVIDIA showcased significant updates to its AI software suite aimed at accelerating computing across various domains.
NVIDIA
Intermediate
The article discusses NVIDIA AI Enterprise 3.1, highlighting its role in accelerating enterprise adoption of AI through a comprehensive suite of tools and frameworks.
NVIDIA
Intermediate
The article discusses how retailers can enhance their data analytics capabilities using GPU-accelerated Apache Spark workloads on Google Cloud Dataproc.
Saurav Agarwal
12 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how NVIDIA's RAPIDS cuDF can significantly accelerate data analytics workflows, particularly in exploratory data analysis (EDA).
Prachi Goel
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how RAPIDS cuDF can significantly accelerate time series data analysis, providing speedups of up to 40x over traditional pandas workflows.
Prachi Goel
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses the architecture and implementation of recommendation systems using NVIDIA Merlin and Redis, focusing on offline and online systems.
NVIDIA
Intermediate
The article discusses how organizations can reduce costs and improve performance in big data processing using Apache Spark on Google Cloud Dataproc with the RAPIDS Accelerator.
Karthikeyan Rajendran
8 min read
Has Summary
--
NVIDIA
Advanced
NVIDIA has announced significant updates to its AI software suite, including JAX, NVIDIA CV-CUDA, and NVIDIA RAPIDS, aimed at accelerating AI research, computer vision, and data science.
NVIDIA
Intermediate
The article discusses how AT&T leveraged GPUs to optimize their data pipelines, focusing on improving speed, cost, and efficiency across various stages of the data-to-AI pipeline.
Mark Austin
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article evaluates the roles of data lakes and data warehouses as repositories for machine learning data, discussing their respective advantages and disadvantages.
Judy McConnell
10 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the critical need for effective fraud prevention strategies in enterprise IT, emphasizing the role of AI and big data analytics.
André Franklin
7 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the optimization of enterprise IT workloads using NVIDIA-Certified Systems, addressing the challenges of selecting appropriate hardware for GPU-accelerated applications.
NVIDIA
Intermediate
The article provides an overview of the upcoming GTC event, highlighting key sessions focused on Cybersecurity, Data Center, Data Science, and Networking.
Michelle Horton
5 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the implementation of high-precision decimal arithmetic using CUDA's int128 support, highlighting the limitations of floating-point arithmetic in applications requiring exact ...
Conor Hoekstra
19 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
RAPIDS Accelerator for Apache Spark v21.10 introduces significant performance improvements and new features for GPU acceleration, responding to community requests.
Karthikeyan Rajendran
4 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article highlights key data science sessions at the NVIDIA GTC conference, showcasing innovative approaches and technologies in the field.
Jacob Schmitt
4 min read
Has Summary
--
NVIDIA
Beginner
The RAPIDS Accelerator for Apache Spark v21.
Eric Rife
5 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses a novel approach to analyzing data stored in Apache Cassandra using GPU acceleration through the RAPIDS ecosystem.
NVIDIA
Intermediate
The article discusses the integration of NVIDIA T4 Tensor Core GPUs with Azure Synapse Analytics to enhance data processing and machine learning tasks using GPU acceleration.
Alexander Spiridonov
4 min read
Has Summary
--
NVIDIA
Intermediate
The RAPIDS Accelerator for Apache Spark v21.06 release introduces significant enhancements, including support for Apache Spark 3.1.
Saloni Jain
4 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses how to accelerate sequential Python User-Defined Functions (UDFs) using RAPIDS on GPUs, achieving speedups of up to 100x.
Vibhu Jawa
5 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article is the third part of a series focused on an end-to-end blueprint for predicting customer churn using machine learning.
William Benton
12 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article is the second part of a series detailing an end-to-end blueprint for customer churn modeling and prediction.
William Benton
9 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses how to leverage RAPIDS, HuggingFace, and Dask to run state-of-the-art NLP workloads at scale on GPUs.
Vibhu Jawa
7 min read
Includes Code
Has Summary
--