NVIDIA logo

How NVIDIA Uses Dask

76 engineering articles about Dask from NVIDIA's engineering team

Articles

NVIDIA
Advanced
The article discusses the importance of community detection algorithms, particularly the Leiden algorithm, in analyzing large-scale graph data using GPU acceleration via cuGraph.
Rick Ratzel
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the advancements in XGBoost 3.0, particularly its ability to train with terabyte-scale datasets on a single NVIDIA Grace Hopper Superchip.
Dante Gama Dessavre
7 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses strategies for processing large datasets that exceed GPU VRAM using the Polars GPU engine, specifically focusing on Unified Virtual Memory (UVM) and multi-GPU streaming execu...
Jamil Semaan
4 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses the advancements in single-cell analysis facilitated by RAPIDS-singlecell, an open-source tool that leverages GPU acceleration to handle large datasets efficiently.
TJ Chen
7 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the latest enhancements in RAPIDS, including zero-code-change acceleration for Python machine learning, significant IO performance improvements, and out-of-core XGBoost capabi...
--
NVIDIA
Advanced
The article discusses the development of the Nemotron-CC dataset, a high-quality trillion-token dataset for pretraining large language models (LLMs) using Common Crawl data.
Nirmal Kumar Juluru
7 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses optimizing high-performance remote I/O operations using NVIDIA KvikIO for data analysis workloads on cloud object storage services.
Tom Augspurger
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the significance of high-quality data in enhancing the accuracy of generative AI models, focusing on the capabilities of NVIDIA NeMo Curator for data curation and processing.
Nirmal Kumar Juluru
5 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses how to accelerate GPU analytics using RAPIDS and Ray, two powerful frameworks for distributed data science and AI applications.
Peter Entschev
4 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the introduction of new NVIDIA NeMo Curator classifier models that enhance training data quality for generative AI.
Tom Balough
10 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The NVIDIA Deep Learning Institute has launched the Accelerated Data Science Teaching Kit, aimed at educators to enhance data science education.
--
NVIDIA
Advanced
The article discusses best practices for multi-GPU data analysis using RAPIDS with Dask, emphasizing the need for efficient memory management and accelerated networking.
Ben Zaitlen
5 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses the use of NVIDIA NeMo Curator for processing high-quality Vietnamese language data, highlighting the challenges faced by large language models (LLMs) in non-English language...
Hoang Nguyen
16 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses techniques for processing text data to optimize the performance of Large Language Models (LLMs).
Amit Bleiweiss
13 min read
Has Summary
--
NVIDIA
Advanced
The article discusses the process of streamlining data processing for Domain Adaptive Pretraining (DAPT) of large language models (LLMs) using NVIDIA NeMo Curator.
Mehran Maghoumi
16 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how to curate custom datasets for parameter-efficient fine-tuning of large language models (LLMs) using NVIDIA NeMo Curator.
Mehran Maghoumi
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the importance of data curation in training large language models (LLMs), particularly for low-resourced languages.
Arham Mehta
12 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the importance of data curation in training large language models (LLMs) and introduces NVIDIA NeMo Curator, an open-source framework designed for creating high-quality datase...
Mehran Maghoumi
14 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article provides a comprehensive guide on leveraging RAPIDS for GPU-accelerated data processing on Databricks.
Sheilah Kirui
10 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses the NVIDIA NeMo Curator framework, an open-source tool designed to streamline the data curation process for training large language models (LLMs).
Mehran Maghoumi
6 min read
Has Summary
--
NVIDIA
Advanced
The article discusses how to optimize multi-GPU model training using Dask and XGBoost, addressing common challenges such as out-of-memory errors.
Jiwei Liu
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses how NVIDIA NeMo can streamline the development of generative AI applications on GPU-accelerated Google Cloud.
--
NVIDIA
Intermediate
This article provides insights into building multilingual recommender systems, focusing on a two-stage candidate reranker approach.
Chris Deotte
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses NVIDIA NeMo, an end-to-end platform designed to facilitate the development and deployment of enterprise-ready large language models (LLMs).
Amanda Saunders
9 min read
Has Summary
--
NVIDIA
Intermediate
The article introduces the NVIDIA NeMo Data Curator, a scalable tool designed for curating trillion-token multilingual datasets for training large language models (LLMs).
Joseph Jennings
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article discusses the challenges of debugging in a mixed Python and C language stack, particularly in the context of the RAPIDS project.
Peter Entschev
18 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how NVIDIA's RAPIDS cuDF can significantly accelerate data analytics workflows, particularly in exploratory data analysis (EDA).
Prachi Goel
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how RAPIDS cuDF can significantly accelerate time series data analysis, providing speed improvements of up to 40x compared to traditional pandas workflows.
Prachi Goel
9 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
This article discusses the optimization of hash maps for GPU acceleration, focusing on their memory access patterns and performance benefits.
Daniel Juenger
18 min read
Includes Code
Has Summary
--
NVIDIA
Beginner
This article focuses on the practical aspects of building and training a machine learning (ML) model using Python, specifically utilizing the Iris Dataset.
Kurtis Pykes
5 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how NVIDIA's cuCIM and GPUDirect Storage can significantly enhance digital pathology workflows by improving input/output performance and image processing tasks.
Gregory Lee
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how Graph Neural Networks (GNNs) and NVIDIA GPUs can optimize fraud detection in financial services.
--
NVIDIA
Advanced
NVIDIA has announced significant updates to its AI software suite, including JAX, NVIDIA CV-CUDA, and NVIDIA RAPIDS, aimed at accelerating AI research, computer vision, and data science.
--
NVIDIA
Advanced
The article discusses how to accelerate ETL processes on Kubeflow using RAPIDS, a data science framework that leverages GPUs for improved performance.
Jacob Tomlinson
12 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the advantages of using Naive Bayes (NB) classifiers for text classification tasks, particularly when leveraging GPU acceleration through RAPIDS cuML.
Mickael Ide
11 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the optimization of accessing Parquet data using the fsspec library, particularly through the new fsspec.parquet module.
Rick Zamora
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article provides an overview of the upcoming GTC event, highlighting key sessions focused on Cybersecurity, Data Center, Data Science, and Networking.
Michelle Horton
5 min read
Has Summary
--
NVIDIA
Intermediate
This article introduces the foundational techniques for preparing text data for Natural Language Processing (NLP) using vectorization, hashing, and tokenization.
Edward Krueger
10 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses how to accelerate portfolio construction algorithms using Numba and Dask in Python, achieving up to 800x speed improvements on GPUs.
Yi Dong
8 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how Munich Re Markets leverages interpretable machine learning to enhance portfolio construction strategies in the Life and Pension industry.
Jochen Papenbrock
10 min read
Has Summary
--
NVIDIA
Intermediate
This article discusses the analysis of RNA sequencing data from 1.3 million mouse brain cells using RAPIDS on NVIDIA GPUs.
Corey Nolet
7 min read
Has Summary
--
NVIDIA
Beginner
The article discusses how to leverage NVIDIA GPUs and the Saturn Cloud platform to accelerate data science workflows using RAPIDS.
Jacob Schmitt
8 min read
Includes Code
Has Summary
--
NVIDIA
Advanced
The article discusses the input and output configurability of the RAPIDS cuML machine learning library, highlighting its support for various data formats and the benefits of using GPU memory for pe...
Dante Gama Dessavre
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how to accelerate XGBoost on GPU clusters using Dask, highlighting the new Dask interface introduced in XGBoost 1.4.
Belen Tegegn
11 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
This article discusses how to accelerate sequential Python User-Defined Functions (UDFs) using RAPIDS on GPUs, achieving speedups of up to 100x.
Vibhu Jawa
5 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses how accelerated data science can enhance data analytics workflows by leveraging NVIDIA technologies, significantly improving performance and reducing costs.
Chase Hooley
3 min read
Has Summary
--
NVIDIA
Intermediate
The article discusses UCX-Py, an accelerated networking library that enhances communication performance for Python applications, particularly in the context of GPU and distributed computing.
Belen Tegegn
9 min read
Includes Code
Has Summary
--
NVIDIA
Intermediate
The article discusses on-demand technical sessions from GTC '21 that focus on developing and deploying AI solutions in the cloud using NVIDIA NGC.
Chintan Patel
2 min read
Has Summary
--