How NVIDIA Uses Apache

145 engineering articles about Apache from NVIDIA's engineering team

Other NVIDIA Technologies

Python (740), PyTorch (566), Deep Learning (505), TensorFlow (444), Docker (292), Kubernetes (251)

Articles

NVIDIA

Advanced

Migrate Apache Spark Workloads to GPUs at Scale on Amazon EMR with Project Aether

The article discusses Project Aether, a tool developed by NVIDIA to facilitate the migration of CPU-based Apache Spark workloads to GPU-accelerated environments on Amazon EMR.

Apache, Apache Spark, AWS, XGBoost

Navin Kumar

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

NVIDIA CUDA-X Powers the New Sirius GPU Engine for DuckDB, Setting ClickBench Records

NVIDIA's Sirius, an open-source GPU-native SQL engine, has set a new performance record on ClickBench, enhancing DuckDB with GPU-accelerated analytics.

Apache, Apache Arrow, AWS, SQL

Xiangyao Yu

6 min read

Has Summary

NVIDIA

Intermediate

How to Train Scientific Agents with Reinforcement Learning

The article discusses the development of scientific AI agents using reinforcement learning (RL) techniques, specifically through the NVIDIA NeMo framework.

Apache, Azure, Python, Reinforcement Learning, RLHF

Christian Munley

12 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Accelerating Large-Scale Data Analytics with GPU-Native Velox and NVIDIA cuDF

The article discusses the collaboration between IBM and NVIDIA to enhance large-scale data analytics through GPU-native Velox and NVIDIA cuDF, highlighting significant performance improvements over...

Apache, Apache Spark, SQL

Gregory Kimball

7 min read

Has Summary

NVIDIA

Intermediate

Train a Quadruped Locomotion Policy and Simulate Cloth Manipulation with NVIDIA Isaac Lab and Newton

This article discusses the integration of the Newton physics engine with NVIDIA Isaac Lab for training quadruped locomotion policies and simulating cloth manipulation.

Apache, NumPy, Python, PyTorch, Reinforcement Learning, Warp, YAML

Mohammad Mohajerani

13 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Optimizing Vector Search for Indexing and Real-Time Retrieval with NVIDIA cuVS

The article discusses the advancements in NVIDIA cuVS, a GPU-accelerated vector search library designed for high-performance indexing and low-latency retrieval.

Apache, Elasticsearch, Google Cloud, Java, Oracle, Python, Rust, scikit-learn, Vertex AI

Corey Nolet

7 min read

Has Summary

NVIDIA

Advanced

Serverless Distributed Data Processing with Apache Spark and NVIDIA AI on Azure

The article discusses the deployment of a serverless, distributed data processing architecture using Apache Spark and NVIDIA AI on Azure.

Apache, Apache Spark, Azure, Docker, Embedding, HTTPS, Hugging Face, Python, REST API, Serverless, SQL, SQL Server

Alexander Spiridonov

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerated Molecular Modeling with NVIDIA cuEquivariance and NVIDIA NIM microservices

The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...

Apache, PyTorch, Transformer, Transformers

Neha Tadimeti

8 min read

Has Summary

NVIDIA

Advanced

Accelerate Decision Optimization Using Open Source NVIDIA cuOpt

The article discusses how NVIDIA cuOpt, an open-source GPU-accelerated optimization tool, enhances decision-making processes in businesses by efficiently solving complex linear programming (LP), mi...

Apache, Docker, JSON, Python, REST API

Gordana Neskovic

5 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Supercharging Fraud Detection in Financial Services with Graph Neural Networks (Updated)

The article discusses the application of Graph Neural Networks (GNNs) in enhancing fraud detection within financial services.

Apache, Apache Spark, Docker, Graph Neural Networks, JSON, Kubernetes, Neural Networks, XGBoost

Naim

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

RAPIDS Brings Zero-Code-Change Acceleration, IO Performance Gains, and Out-of-Core XGBoost

The article discusses the latest enhancements in RAPIDS, including zero-code-change acceleration for Python machine learning, significant IO performance improvements, and out-of-core XGBoost capabi...

Apache, Azure, Azure Blob Storage, Dask, Gemini, Google Cloud, Google Cloud Storage, LightGBM, NetworkX, Polars, Python, scikit-learn, XGBoost

Nick Becker

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Spotlight: Atgenomix SeqsLab Scales Health Omics Analysis for Precision Medicine

The article discusses how Atgenomix SeqsLab leverages NVIDIA technologies to enhance health omics analysis for precision medicine.

Apache, Apache Spark, Azure, SQL, XGBoost

Yu-Ting Lin

9 min read

Has Summary

NVIDIA

Advanced

Predicting Performance on Apache Spark with GPUs

The article discusses the use of GPU acceleration to enhance performance in Apache Spark applications, highlighting the challenges of migrating workloads from CPUs to GPUs.

Apache, Apache Spark, AWS, Azure, JSON, Machine Learning, Optuna, SHAP, SQL, XGBoost

Matt Ahrens

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerate Deep Learning and LLM Inference with Apache Spark in the Cloud

The article discusses how to accelerate Deep Learning (DL) and Large Language Model (LLM) inference using Apache Spark in cloud environments.

Apache, Apache Spark, AWS, Azure, Deep Learning, Docker, JSON, NumPy, Python, PyTorch, Semantic Search, TensorFlow, Transformers

Rishi Chandra

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Revolutionizing Neural Reconstruction and Rendering in gsplat with 3DGUT

The article discusses the integration of the 3D Gaussian Unscented Transform (3DGUT) into the gsplat library, enhancing neural rendering and scene reconstruction for realistic 3D simulations.

Apache, Python, PyTorch

Ruilong Li

5 min read

Has Summary

NVIDIA

Intermediate

NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support

NVIDIA cuPyNumeric 25.03 is a fully open-source library designed as a drop-in replacement for NumPy, leveraging the Legate framework for accelerated computing.

Apache, NumPy, Python

Bo Dong

4 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerating Apache Parquet Scans on Apache Spark with GPUs

The article discusses how to accelerate Apache Parquet scans on Apache Spark using GPUs, specifically through the RAPIDS Accelerator for Apache Spark.

Apache, Apache Spark, SQL

Matt Ahrens

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

NVIDIA Open Sources Run:ai Scheduler to Foster Community Collaboration

NVIDIA has open-sourced the KAI Scheduler, a Kubernetes-native GPU scheduling solution under the Apache 2.0 license, originally developed for the Run:ai platform.

Apache, Kubernetes, PyTorch, TensorFlow

Ronen Dar

9 min read

Has Summary

NVIDIA

Advanced

Practical Tips for Preventing GPU Fragmentation for Volcano Scheduler

This article discusses strategies for preventing GPU fragmentation in the Volcano Scheduler, focusing on an enhanced scheduling approach that integrates bin-packing with gang scheduling.

Apache, Apache Spark, Kubernetes

Ameya Parab

6 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Efficient ETL with Polars and Apache Spark on NVIDIA Grace CPU

The article discusses the performance and energy efficiency of the NVIDIA Grace CPU Superchip for ETL workloads, comparing it with AMD and Intel CPUs.

Apache, Apache Spark, Polars, Python, Rapids

Gregory Kimball

6 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Accelerate Apache Spark ML on NVIDIA GPUs with Zero Code Change

The article discusses how the NVIDIA RAPIDS Accelerator for Apache Spark enables zero code change for GPU-accelerated data processing, enhancing the performance of Apache Spark ML applications.

Apache, Apache Spark, AWS, Pandas, PySpark, Python, SQL

Erik Ordentlich

5 min read

Includes Code

Has Summary

NVIDIA

Intermediate

High-Performance Remote IO With NVIDIA KvikIO

The article discusses optimizing high-performance remote I/O operations using NVIDIA KvikIO for data analysis workloads on cloud object storage services.

Apache, AWS, Azure, Azure Blob Storage, Dask, Google Cloud, Google Cloud Storage, Python

Tom Augspurger

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

JSON Lines Reading with pandas 100x Faster Using NVIDIA cuDF

The article discusses how to read JSON Lines data using NVIDIA's cuDF library, which can be up to 100 times faster than reading the same data with pandas.

Apache, Apache Arrow, Apache Spark, Docker, JSON, Python

Karthikeyan Natarajan

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Spotlight: BRLi and Toulouse INP Develop AI-Based Flood Models Using NVIDIA PhysicsNeMo

The article discusses the collaboration between BRLi and Toulouse INP to develop AI-based flood models using NVIDIA PhysicsNeMo, addressing the limitations of traditional physics-based numerical si...

Apache

Ram Cherukuri

6 min read

Has Summary

NVIDIA

Intermediate

Accelerating JSON Processing on Apache Spark with GPUs

The article discusses the optimization of JSON processing on Apache Spark using GPU acceleration, highlighting significant performance improvements achieved by a Fortune 100 retail company.

Apache, Apache Spark, JSON, SQL

Matt Ahrens

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

IBM’s New Granite 3.0 Generative AI Models Are Small, Yet Highly Accurate and Efficient

IBM has launched Granite 3.0, a new generation of generative AI models that are compact yet deliver high accuracy and efficiency.

Apache, Generative AI, Mistral

Maryam Ashoori

5 min read

Has Summary

NVIDIA

Intermediate

NVIDIA CUDA-X Now Accelerates the Polars Data Processing Library

NVIDIA has announced that its CUDA-X platform now accelerates the Polars Data Processing Library, enhancing its performance for data analytics.

Apache, Apache Spark, Polars

Nick Becker

3 min read

Has Summary

NVIDIA

Advanced

Accelerating Predictive Maintenance in Manufacturing with RAPIDS AI

The article discusses how RAPIDS AI can accelerate predictive maintenance in manufacturing by leveraging advanced data analytics to minimize downtime and optimize maintenance schedules.

Apache, Apache Arrow, Azure, Pandas, Python, scikit-learn

Amarnath Mohan

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

NVIDIA GH200 Superchip Delivers Breakthrough Energy Efficiency and Node Consolidation for Apache Spark

The article discusses the NVIDIA GH200 Grace Hopper Superchip, highlighting its significant advancements in energy efficiency and node consolidation for Apache Spark workloads.

Apache, Apache Spark, Deep Learning, Machine Learning, SQL, Vultr

Amr Elmeleegy

7 min read

Has Summary

NVIDIA

Advanced

Power Text-Generation Applications with Mistral NeMo 12B Running on a Single GPU

The article discusses the Mistral NeMo 12B model, a next-generation language model developed by NVIDIA and Mistral, designed for high performance on a single GPU.

Apache, Artificial Intelligence, Embedding, Mistral, PyTorch, RLHF, Transformer

Anjali Shah

6 min read

Includes Code

Has Summary

NVIDIA

Beginner

Encoding and Compression Guide for Parquet String Data Using RAPIDS

This article provides a comprehensive guide on encoding and compression techniques for string data in the Parquet format using RAPIDS.

Apache, Apache Arrow, Docker, JSON, Pandas, Python

Gregory Kimball

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Level Up Your Skills with Five New NVIDIA Technical Courses

The article introduces five new technical courses offered by NVIDIA aimed at enhancing skills in AI and data science.

Apache, Apache Arrow, Apache Spark, Computer Vision, Natural Language Processing, Prompt Engineering, PyTorch, Transformer, Transformers, XGBoost

Rachel Ho

4 min read

Has Summary

NVIDIA

Intermediate

RAPIDS on Databricks: A Guide to GPU-Accelerated Data Processing

This article provides a comprehensive guide on leveraging RAPIDS for GPU-accelerated data processing on Databricks.

Apache, Apache Spark, Dask, Python, Rapids, SQL, XGBoost

Sheilah Kirui

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Using Graph Neural Networks for Additive Manufacturing

The article discusses the application of Graph Neural Networks (GNNs) in optimizing the design and simulation of lattice structures in additive manufacturing.

Apache, Graph Neural Networks, Neural Networks

Ayush Jain

6 min read

Has Summary

NVIDIA

Intermediate

New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model

The article discusses the release of the NVIDIA NeMo Canary model, a state-of-the-art multilingual model for speech recognition and translation.

Apache, Cython, Gradio, PyTorch, Whisper

Elena Rastorgueva

4 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Scale and Curate High-Quality Datasets for LLM Training with NVIDIA NeMo Curator

The article discusses the NVIDIA NeMo Curator framework, an open-source tool designed to streamline the data curation process for training large language models (LLMs).

Apache, Dask, Hugging Face, JSON

Mehran Maghoumi

6 min read

Has Summary

NVIDIA

Intermediate

Evaluating Retriever for Enterprise-Grade RAG

The article discusses the evaluation of Retrieval-Augmented Generation (RAG) systems, emphasizing the importance of embedding models and systematic evaluation processes.

Apache, Embedding, Hugging Face, Large Language Models

Benedikt Schifferer

14 min read

Has Summary

NVIDIA

Intermediate

Streamline ETL Workflows with Nested Data Types in RAPIDS libcudf

The article discusses the use of nested data types in RAPIDS libcudf for optimizing ETL workflows.

Apache, Apache Arrow, Docker, JSON, Python

Gregory Kimball

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Deploy Large Language Models at the Edge with NVIDIA IGX Orin Developer Kit

The article discusses deploying large language models (LLMs) at the edge using the NVIDIA IGX Orin Developer Kit.

Apache, Deep Learning, Gradio, Haystack, Hugging Face, LangChain, Large Language Models, Oobabooga, Python

Nigel Nelson

9 min read

Has Summary

NVIDIA

Advanced

Enabling Greater Patient-Specific Cardiovascular Care with AI Surrogates

A Stanford University team is revolutionizing cardiovascular care through AI-driven simulations that provide patient-specific blood flow visualizations.

Apache, Graph Neural Networks, Neural Networks

Harpreet Sethi

8 min read

Has Summary

NVIDIA

Advanced

Accelerating Neurosymbolic AI with RAPIDS and Prometheux Vadalog Parallel

The article discusses the integration of RAPIDS and Vadalog Parallel to enhance the performance of neurosymbolic AI systems, particularly in processing large knowledge graphs.

Apache, Apache Spark, Java, JSON, Neo4j, SQL

Bruno Trentini

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Reduce Apache Spark ML Compute Costs with New Algorithms in Spark RAPIDS ML Library

The article discusses the Spark RAPIDS ML library, an open-source Python package that accelerates Apache Spark ML applications using NVIDIA GPU technology.

Apache, Apache Spark, AWS, PySpark, Python, scikit-learn

Erik Ordentlich

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

GPUs for ETL? Optimizing ETL Architecture for Apache Spark SQL Operations

The article discusses the optimization of Extract-Transform-Load (ETL) operations using GPUs, specifically through the NVIDIA RAPIDS Accelerator for Apache Spark.

Apache, Apache Spark, Azure, Java, Less, SQL

Joel Lashmore

8 min read

Has Summary

NVIDIA

Intermediate

How to Use 3D Geospatial Data for Immersive Environments with Cesium

The article discusses the use of 3D geospatial data in immersive environments, specifically through the Cesium platform.

Apache, Python

Paul Cutsinger

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks

The article discusses how the NVIDIA RAPIDS Accelerator for Apache Spark can significantly enhance the performance and cost-effectiveness of extract-transform-load (ETL) processes, particularly for...

Apache, Apache Spark, Artificial Intelligence, Azure, Java, Less, SQL

Joel Lashmore

7 min read

Has Summary

NVIDIA

Intermediate

Accelerated Data Analytics: Machine Learning with GPU-Accelerated Pandas and Scikit-learn

The article discusses how GPU-accelerated data analytics can enhance machine learning (ML) projects by improving speed and scalability.

Apache, Apache Arrow, LightGBM, Machine Learning, Pandas, Python, scikit-learn, XGBoost

Jay Rodge

14 min read

Includes Code

Has Summary

NVIDIA

Advanced

Distributed Deep Learning Made Easy with Spark 3.4

The article discusses the integration of distributed deep learning with Apache Spark 3.4, highlighting new built-in APIs for both distributed model training and inference.

Apache, Apache Arrow, Apache Spark, Deep Learning, Hugging Face, NumPy, Pandas, PySpark, Python, PyTorch, TensorFlow

Lee Yang

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Develop Physics-Informed Machine Learning Models with Graph Neural Networks

The article discusses NVIDIA PhysicsNeMo, a framework for developing physics-informed machine learning models, with a focus on the latest update that introduces support for Graph Neural Networks (G...

Apache, Deep Learning, Graph Neural Networks, Machine Learning, Neural Networks, PyTorch

Bhoomi Gadhia

5 min read

Has Summary

NVIDIA

Advanced

GPU Integration Propels Data Center Efficiency and Cost Savings for Taboola

The article discusses how Taboola integrated GPUs into their data processing pipeline to enhance efficiency and reduce costs.

Apache, Apache Kafka, Apache Spark, Java, Kubernetes, SQL

Eyal Hirsch

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Why Automatic Augmentation Matters

The article discusses the importance of automatic augmentation in deep learning, emphasizing its role in enhancing model accuracy by diversifying training datasets.

Apache, Deep Learning, Python, PyTorch, ResNet, TensorFlow

Kamil Tokarski

12 min read

Includes Code

Has Summary