XGBoost Programming Tutorials & Engineering Articles

127 XGBoost tutorials, guides, and engineering insights from NVIDIA, Uber, LinkedIn, and more

Companies Using This

NVIDIA (79)

XGBoost Articles & Tutorials

NVIDIA

Advanced

Migrate Apache Spark Workloads to GPUs at Scale on Amazon EMR with Project Aether

The article discusses Project Aether, a tool developed by NVIDIA to facilitate the migration of CPU-based Apache Spark workloads to GPU-accelerated environments on Amazon EMR.

Apache, Apache Spark, AWS, XGBoost

Navin Kumar

6 min read

Includes Code

Has Summary

Shopify

Intermediate

Tangle: An open-source ML experimentation platform built at Shopify scale

Shopify open-sources Tangle, an ML experimentation platform built to solve six common failure modes in machine learning development.

Docker, Java, JavaScript, Ruby, Rust, Shell, SQLite, TensorFlow, XGBoost, YAML

Shopify Engineering

12 min read

Has Summary

Uber

Advanced

Enhancing Uber’s Guidance Heatmap with Deep Probabilistic Models

The article discusses how Uber enhanced its Guidance Heatmap using deep probabilistic models to provide drivers with better insights into potential earnings.

XGBoost

Bob Zheng, Jane Hung, Arushi Singh, Dhruv Ghulati, Yifan Yu, Paul Frend, Elif Eser

9 min read

Has Summary

NVIDIA

Intermediate

Training XGBoost Models with GPU-Accelerated Polars DataFrames

The article discusses the integration of XGBoost with Polars DataFrames, emphasizing the benefits of GPU acceleration for machine learning workflows.

Polars, Rust, scikit-learn, XGBoost

Jiaming Yuan

7 min read

Includes Code

Has Summary

Uber

Advanced

Enabling Deep Model Explainability with Integrated Gradients at Uber

This article discusses how Uber has integrated explainability into its machine learning platform, Michelangelo, using Integrated Gradients (IG) to provide interpretable attributions for deep learni...

Embedding, Keras, LIME, Machine Learning, PyTorch, SHAP, TensorFlow, XGBoost, YAML

Hugh Chen, Eric Wang, Gaoyuan Huang, Howard Yu, Jia Li, Sally Lee

14 min read

Has Summary

NVIDIA

Intermediate

How to GPU-Accelerate Model Training with CUDA-X Data Science

This article provides insights into GPU-accelerating machine learning model training using CUDA-X Data Science, focusing on tree-based models like XGBoost, LightGBM, and CatBoost.

CatBoost, LightGBM, Python, scikit-learn, SHAP, XGBoost

Divyansh Jain

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

The Kaggle Grandmasters Playbook: 7 Battle-Tested Modeling Techniques for Tabular Data

The article presents a comprehensive playbook developed through extensive experience in Kaggle competitions, detailing seven effective modeling techniques for handling tabular data.

AutoML, CatBoost, LightGBM, Polars, scikit-learn, XGBoost

Kazuki Onodera

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Train with Terabyte-Scale Datasets on a Single NVIDIA Grace Hopper Superchip Using XGBoost 3.0

The article discusses the advancements in XGBoost 3.0, particularly its ability to train with terabyte-scale datasets on a single NVIDIA Grace Hopper Superchip.

Dask, SHAP, XGBoost

Dante Gama Dessavre

7 min read

Includes Code

Has Summary

NVIDIA

Intermediate

7 Drop-In Replacements to Instantly Speed Up Your Python Data Science Workflows

This article discusses seven drop-in replacements for popular Python libraries that can significantly speed up data science workflows by leveraging GPU acceleration.

NetworkX, Polars, Python, scikit-learn, XGBoost

Jamil Semaan

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Delivering the Missing Building Blocks for NVIDIA CUDA Kernel Fusion in Python

The article discusses the introduction of cuda-cccl, a Python library that provides high-level building blocks for NVIDIA CUDA kernel fusion, enabling developers to write efficient algorithms witho...

Less, Python, PyTorch, TensorFlow, XGBoost

Ashwin Srinath

5 min read

Includes Code

Has Summary

NVIDIA

Advanced

AI in Manufacturing and Operations at NVIDIA: Accelerating ML Models with NVIDIA CUDA-X Data Science

NVIDIA utilizes data science and machine learning to enhance chip manufacturing processes, focusing on optimizing workflows through the use of CUDA-X libraries like cuDF and cuML.

Polars, Python, scikit-learn, SHAP, XGBoost

Divyansh Jain

8 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Supercharge Tree-Based Model Inference with Forest Inference Library in NVIDIA cuML

The article discusses the enhancements in the Forest Inference Library (FIL) within NVIDIA cuML 25.04, focusing on its capabilities for fast inference of tree-based models.

LightGBM, NumPy, Python, scikit-learn, XGBoost

Dante Gama Dessavre

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Supercharging Fraud Detection in Financial Services with Graph Neural Networks (Updated)

The article discusses the application of Graph Neural Networks (GNNs) in enhancing fraud detection within financial services.

Apache, Apache Spark, Docker, Graph Neural Networks, JSON, Kubernetes, Neural Networks, XGBoost

Naim

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

RAPIDS Brings Zero-Code-Change Acceleration, IO Performance Gains, and Out-of-Core XGBoost

The article discusses the latest enhancements in RAPIDS, including zero-code-change acceleration for Python machine learning, significant IO performance improvements, and out-of-core XGBoost capabi...

Apache, Azure, Azure Blob Storage, Dask, Gemini, Google Cloud, Google Cloud Storage, LightGBM, NetworkX, Polars, Python, scikit-learn, XGBoost

Nick Becker

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Spotlight: Atgenomix SeqsLab Scales Health Omics Analysis for Precision Medicine

The article discusses how Atgenomix SeqsLab leverages NVIDIA technologies to enhance health omics analysis for precision medicine.

Apache, Apache Spark, Azure, SQL, XGBoost

Yu-Ting Lin

9 min read

Has Summary

NVIDIA

Advanced

Predicting Performance on Apache Spark with GPUs

The article discusses the use of GPU acceleration to enhance performance in Apache Spark applications, highlighting the challenges of migrating workloads from CPUs to GPUs.

Apache, Apache Spark, AWS, Azure, JSON, Machine Learning, Optuna, SHAP, SQL, XGBoost

Matt Ahrens

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Kaggle Grandmasters Unveil Winning Strategies for Data Science Superpowers

Kaggle Grandmasters David Austin, Chris Deotte, and Ruchi Bhatia shared insights on their winning strategies for data science competitions at the Google Cloud Next conference.

Google Cloud, LightGBM, MVP, scikit-learn, XGBoost

Jenn Yonemitsu

9 min read

Has Summary

NVIDIA

Beginner

Grandmaster Pro Tip: Winning First Place in Kaggle Competition with Feature Engineering Using cuDF pandas

The article discusses how feature engineering, particularly using NVIDIA cuDF-pandas for GPU acceleration, can significantly enhance model accuracy in Kaggle competitions involving tabular data.

Pandas, Python, XGBoost

Chris Deotte

5 min read

Includes Code

Has Summary

Uber

Intermediate

Enhancing Personalized CRM Communication with Contextual Bandit Strategies

This article discusses how Uber enhances personalized CRM communication using contextual bandit strategies, particularly focusing on the application of AI/ML techniques to optimize email content.

Embedding, Generative AI, GPT, Machine Learning, XGBoost

LJ (Lin) He, Yifeng Wu, Gaurav Jindal

13 min read

Has Summary

Uber

Advanced

How Uber Uses Ray® to Optimize the Rides Business

The article discusses how Uber utilizes Ray®, a general compute engine for Python®, to enhance the efficiency of its rides business through improved machine learning model performance and optimizat...

Apache, Apache Spark, AWS, Docker, Kubernetes, Pandas, PySpark, XGBoost

Kaichen Wei, Matt Walker, Peng Zhang

15 min read

Has Summary

NVIDIA

Intermediate

NVIDIA Hackathon Winners Share Strategies for RAPIDS-Accelerated ML Workflows

The article discusses the strategies employed by the winners of the NVIDIA hackathon at ODSC West, focusing on how they utilized RAPIDS Python APIs to enhance machine learning workflows.

LightGBM, Polars, Python, XGBoost

Jenn Yonemitsu

7 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Security for Data Privacy in Federated Learning with CUDA-Accelerated Homomorphic Encryption in XGBoost

The article discusses the integration of CUDA-accelerated Homomorphic Encryption into Federated XGBoost, enhancing data privacy and security in federated learning environments.

Federated Learning, XGBoost

Ziyue Xu

10 min read

Includes Code

Has Summary

NVIDIA

Advanced

Best Practices for Multi-GPU Data Analysis Using RAPIDS with Dask

The article discusses best practices for multi-GPU data analysis using RAPIDS with Dask, emphasizing the need for efficient memory management and accelerated networking.

Dask, Pandas, Python, PyTorch, XGBoost

Ben Zaitlen

5 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Faster Causal Inference on Large Datasets with NVIDIA RAPIDS

The article discusses how NVIDIA RAPIDS can enhance causal inference on large datasets by leveraging GPU acceleration, specifically through the integration of the cuML library with the DoubleML fra...

Python, scikit-learn, XGBoost

Nick Becker

4 min read

Includes Code

Has Summary

NVIDIA

Advanced

Federated XGBoost Made Practical and Productive with NVIDIA FLARE

The article discusses the practical implementation of Federated XGBoost using NVIDIA FLARE, highlighting its capabilities for concurrent training, fault tolerance, and experiment tracking.

Federated Learning, MLflow, Python, TensorBoard, XGBoost

Yuan-Ting Hsieh

5 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Level Up Your Skills with Five New NVIDIA Technical Courses

The article introduces five new technical courses offered by NVIDIA aimed at enhancing skills in AI and data science.

Apache, Apache Arrow, Apache Spark, Computer Vision, Natural Language Processing, Prompt Engineering, PyTorch, Transformer, Transformers, XGBoost

Rachel Ho

4 min read

Has Summary

NVIDIA

Intermediate

RAPIDS on Databricks: A Guide to GPU-Accelerated Data Processing

This article provides a comprehensive guide on leveraging RAPIDS for GPU-accelerated data processing on Databricks.

Apache, Apache Spark, Dask, Python, Rapids, SQL, XGBoost

Sheilah Kirui

10 min read

Includes Code

Has Summary

Uber

Advanced

From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey

The article discusses Uber's evolution in machine learning (ML) through its centralized platform, Michelangelo, highlighting its transition from predictive to generative AI.

Apache, Apache Spark, AutoML, Deep Learning, Docker, Generative AI, Hugging Face, Keras, Kubernetes, PaLM, Prompt Engineering, PyTorch, TensorFlow, XGBoost

Kai Wang, Min Cai, Joseph Wang, Eric Chen

28 min read

Has Summary

NVIDIA

Advanced

Turning Machine Learning to Federated Learning in Minutes with NVIDIA FLARE 2.4

The article discusses the rapid adoption of federated learning (FL) and the new features introduced in NVIDIA FLARE 2.4.

AWS, Azure, Federated Learning, GPT, Graph Neural Networks, gRPC, Hugging Face, Machine Learning, Neural Networks, PyTorch, XGBoost

Chester Chen

15 min read

Includes Code

Has Summary

LinkedIn

Advanced

Building a Large-Scale Recommendation System: People You May Know

The article discusses the development of LinkedIn's 'People You May Know' (PYMK) recommendation system, detailing its architecture and the challenges faced in scaling its scoring mechanism to handl...

XGBoost

Parag Agrawal

7 min read

Has Summary

NVIDIA

Advanced

Develop ML and AI with Metaflow and Deploy with NVIDIA Triton Inference Server

The article discusses the integration of Metaflow and NVIDIA Triton Inference Server for developing and deploying machine learning models.

AWS, FastAPI, Fine-tuning, gRPC, HTTPS, Kubernetes, LightGBM, LLaMA, Python, XGBoost

Eddie Mattia

12 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Accelerating Inference on End-to-End Workflows with H2O.ai and NVIDIA

The article discusses the collaboration between H2O.ai and NVIDIA to enhance AI applications in financial services through generative AI and predictive analytics.

AutoML, Generative AI, GPT, H2O.ai, RLHF, XGBoost

Prabhu Ramamoorthy

13 min read

Has Summary

Spotify

Advanced

Recursive Embedding and Clustering

The article discusses a novel approach to clustering large and diverse datasets by combining dimensionality reduction, recursion, and supervised machine learning.

Embedding, SHAP, XGBoost

Gustavo Pereira

10 min read

Has Summary

LinkedIn

Intermediate

Augmenting our content moderation efforts through machine learning and dynamic content prioritization

The article discusses how LinkedIn enhances its content moderation efforts through a new framework that utilizes machine learning for dynamic content prioritization.

Machine Learning, XGBoost

Abhishek Chandak

7 min read

Has Summary

Spotify

Intermediate

How We Automated Content Marketing to Acquire Users at Scale

The article discusses Spotify's innovative approach to automating content marketing to efficiently acquire users at scale.

Java, JSON, XGBoost

Bryan Maloney (Senior Engineering Manager)

16 min read

Includes Code

Has Summary

NVIDIA

Advanced

Unlocking Multi-GPU Model Training with Dask XGBoost

The article discusses how to optimize multi-GPU model training using Dask and XGBoost, addressing common challenges such as out-of-memory errors.

Dask, Python, XGBoost

Jiwei Liu

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Accelerated Data Analytics: Machine Learning with GPU-Accelerated Pandas and Scikit-learn

The article discusses how GPU-accelerated data analytics can enhance machine learning (ML) projects by improving speed and scalability.

Apache, Apache Arrow, LightGBM, Machine Learning, Pandas, Python, scikit-learn, XGBoost

Jay Rodge

14 min read

Includes Code

Has Summary

NVIDIA

Advanced

Applying Federated Learning to Traditional Machine Learning Methods

The article discusses the application of federated learning to traditional machine learning methods, highlighting its advantages in communication efficiency and the ability to train models collabor...

Federated Learning, Machine Learning, scikit-learn, XGBoost

Kris Kersten

3 min read

Has Summary

Cloudflare

Advanced

Globally distributed AI and a Constellation update

The article discusses Cloudflare's Constellation, a set of APIs for running low-latency AI inference tasks on their global network.

Cloudflare Workers, Machine Learning, XGBoost

Rita Kozlov

7 min read

Has Summary

NVIDIA

Beginner

Predicting Credit Defaults Using Time-Series Models with Recursive Neural Networks and XGBoost

This article discusses the use of time-series models, specifically autoregressive recursive neural networks and XGBoost, for predicting credit defaults.

LightGBM, Neural Networks, PyTorch, scikit-learn, TensorFlow, XGBoost

Jiwei Liu

11 min read

Includes Code

Has Summary

Stripe

Advanced

How we built it: Stripe Radar

The article discusses the development of Stripe Radar, a fraud prevention solution that evaluates transactions in real-time to prevent fraud.

ChatGPT, Elasticsearch, XGBoost

Ryan Drapeau

11 min read

Has Summary

Airbnb

Intermediate

Building Airbnb Categories with ML & Human in the Loop

This article discusses the process of building categories for Airbnb listings using a combination of machine learning (ML) and human review.

Embedding, Large Language Models, Transformer, XGBoost

Mihajlo Grbovic

13 min read

Has Summary

Shopify

Intermediate

Unlocking Real-time Predictions with Shopify's Machine Learning Platform

The article discusses Shopify's Merlin machine learning platform, focusing on its online inference capabilities for real-time predictions.

Apache, Comet, Docker, FastAPI, gRPC, Hugging Face, Kubernetes, LightGBM, Machine Learning, PyTorch, TensorFlow, XGBoost

Isaac Vidas

15 min read

Has Summary

NVIDIA

Intermediate

Categorical Features in XGBoost Without Manual Encoding

The article discusses the new capability of XGBoost 1.7 to handle categorical features without manual encoding, which simplifies the training and inference processes for machine learning models.

XGBoost

Chris Jarrett

5 min read

Includes Code

Has Summary

Spotify

Advanced

Unleashing ML Innovation at Spotify with Ray

The article discusses Spotify's evolution in machine learning (ML) infrastructure, emphasizing the integration of Ray to enhance flexibility and scalability for diverse ML practitioners.

Kubernetes, PyTorch, TensorFlow, XGBoost, YAML

Divita Vohra

13 min read

Includes Code

Has Summary

Uber

Intermediate

How Uber Optimizes the Timing of Push Notifications using ML and Linear Programming

The article discusses how Uber optimizes the timing of push notifications using machine learning and linear programming.

gRPC, MySQL, XGBoost

Vinay Sharma, Rémi Torracinta, Giacomo Lamberti, Britton Overall

9 min read

Has Summary

NVIDIA

Advanced

Federated Learning from Simulation to Production with NVIDIA FLARE

The article discusses NVIDIA FLARE 2.2, an open-source platform for federated learning that introduces new features aimed at reducing development time and enhancing deployment efficiency.

Docker, Federated Learning, Helm, NumPy, Python, PyTorch, XGBoost

Kris Kersten

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Explain Your Machine Learning Model Predictions with GPU-Accelerated SHAP

The article discusses the importance of explainability in machine learning models, particularly through the use of SHAP (SHapley Additive Explanations) and its GPU-accelerated variant, GPUTreeShap.

Artificial Intelligence, LightGBM, LIME, Machine Learning, Python, SHAP, XGBoost

Parul Pandey

14 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Optimizing Fraud Detection in Financial Services with Graph Neural Networks and NVIDIA GPUs

The article discusses how Graph Neural Networks (GNNs) and NVIDIA GPUs can optimize fraud detection in financial services.

AWS, Dask, DGL, Graph Neural Networks, Neural Networks, Python, PyTorch, PyTorch Geometric, XGBoost

Ashish Sardana

21 min read