How NVIDIA Uses Transformers

88 engineering articles about Transformers from NVIDIA's engineering team

Other NVIDIA Technologies

Python (740), PyTorch (566), Deep Learning (505), TensorFlow (444), Docker (292), Kubernetes (251)

Articles

NVIDIA

Advanced

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

The article discusses NVIDIA TensorRT LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).

Hugging Face, PyTorch, Transformers, V

Lucas Liebenwein

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

The article discusses the limitations of current large language models (LLMs) in handling long contexts and introduces Test-Time Training with an end-to-end formulation (TTT-E2E) as a solution.

Neural Networks, Recurrent Neural Networks, Transformer, Transformers

Yu Sun

6 min read

Has Summary

NVIDIA

Intermediate

Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM

The article discusses the introduction of NVIDIA TensorRT Edge-LLM, an open-source C++ framework designed for high-performance inference of Large Language Models (LLMs) and Vision Language Models (...

Chi, Hugging Face, Python, Transformers

Lin Chai

5 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Build a Voice Agent with RAG and Safety Guardrails

This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.

Embedding, Hugging Face, Python, Transformer, Transformers

Chris Alexiuk

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes

The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...

Hugging Face, PyTorch, Transformer, Transformers

Kyle Tretina

6 min read

Includes Code

Has Summary

NVIDIA

Advanced

Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer

The article discusses the challenges of cold start latency in deploying large language models (LLMs) and introduces the NVIDIA Run:ai Model Streamer, an open-source Python SDK designed to optimize ...

AWS, AWS S3, HTTPS, Hugging Face, Python, PyTorch, Transformers

Omer Dayan

12 min read

Has Summary

NVIDIA

Advanced

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

The article discusses fine-tuning the gpt-oss model for improved accuracy and performance through Quantization Aware Training (QAT) and Supervised Fine-Tuning (SFT).

GPT, Hugging Face, PyTorch, Transformer, Transformers

Eduardo Alvarez

7 min read

Includes Code

Has Summary

NVIDIA

Intermediate

NVIDIA Accelerates OpenAI gpt-oss Models Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72

NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).

Docker, Hugging Face, Ollama, Python, Transformer, Transformers

Anu Srivastava

6 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How Early Access to NVIDIA GB200 Systems Helped LMArena Build a Model to Evaluate LLMs

LMArena, in collaboration with NVIDIA and Nebius, has developed the Prompt-to-Leaderboard (P2L) model to evaluate the performance of large language models (LLMs) across various tasks.

Hugging Face, PyTorch, torchvision, Transformers

Jason Perlow

6 min read

Has Summary

NVIDIA

Advanced

Accelerated Sequence Alignment for Protein Science with MMseqs2-GPU and NVIDIA NIM

The article discusses the advancements in protein sequence alignment using MMseqs2-GPU and NVIDIA NIM, highlighting their significance in accelerating drug discovery and structural prediction in pr...

Transformers

Kyle Tretina

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerated Molecular Modeling with NVIDIA cuEquivariance and NVIDIA NIM microservices

The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...

Apache, PyTorch, Transformer, Transformers

Neha Tadimeti

8 min read

Has Summary

NVIDIA

Advanced

Scaling to Millions of Tokens with Efficient Long-Context LLM Training

The article discusses the advancements in large language models (LLMs) focusing on the importance of extended context lengths for processing and generating text.

Transformer, Transformers, V

Amit Bleiweiss

7 min read

Has Summary

NVIDIA

Advanced

Accelerate Deep Learning and LLM Inference with Apache Spark in the Cloud

The article discusses how to accelerate Deep Learning (DL) and Large Language Model (LLM) inference using Apache Spark in cloud environments.

Apache, Apache Spark, AWS, Azure, Deep Learning, Docker, JSON, NumPy, Python, PyTorch, Semantic Search, TensorFlow, Transformers

Rishi Chandra

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

LLM Inference Benchmarking: Fundamental Concepts

This article introduces the fundamental concepts of large language model (LLM) inference benchmarking, focusing on key metrics such as throughput and latency.

Generative AI, Transformers

Vinh Nguyen

14 min read

Has Summary

NVIDIA

Advanced

Understanding the Language of Life’s Biomolecules Across Evolution at a New Scale with Evo 2

The article discusses the advancements in AI-driven biological research with the introduction of Evo 2, a foundation model that integrates genomic, RNA, and protein data across multiple life domain...

AWS, Fine-tuning, JSON, Transformer, Transformers, YAML

Kyle Tretina

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Dynamic Memory Compression

The article discusses Dynamic Memory Compression (DMC), a technology developed by NVIDIA to enhance the efficiency of large language models (LLMs) by adaptively compressing the conversation state.

Natural Language Processing, Transformer, Transformers

Edoardo Maria Ponti

8 min read

Has Summary

NVIDIA

Beginner

NVIDIA JetPack 6.2 Brings Super Mode to NVIDIA Jetson Orin Nano and Jetson Orin NX Modules

NVIDIA JetPack 6.2 introduces Super Mode for the Jetson Orin Nano and Jetson Orin NX modules, significantly enhancing generative AI performance.

CLIP, Hugging Face, Ollama, Transformers

Shashank Maheshwari

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

NVIDIA Jetson Orin Nano Developer Kit Gets a “Super” Boost

The article discusses the enhancements made to the NVIDIA Jetson Orin Nano Developer Kit, now renamed the Jetson Orin Nano Super Developer Kit, which offers a performance boost of up to 1.

Generative AI, Hugging Face, Ollama, Transformer, Transformers

Suhas Hariharapura Sheshadri

10 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Hymba Hybrid-Head Architecture Boosts Small Language Model Performance

The article discusses NVIDIA's Hymba hybrid-head architecture, which combines transformer attention mechanisms with state space models to enhance the performance and efficiency of small language mo...

Embedding, Hugging Face, PyTorch, Transformer, Transformers

Xin Dong

11 min read

Has Summary

NVIDIA

Advanced

NVIDIA Partners Accelerate Quantum Breakthroughs with AI Supercomputing

NVIDIA is advancing quantum computing through partnerships that integrate AI supercomputing with quantum hardware, aiming to overcome current technological challenges.

Artificial Intelligence, Generative AI, GPT, Solid, Transformers

Marwa Farag

7 min read

Has Summary

NVIDIA

Advanced

Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo

The article discusses how NVIDIA NeMo has accelerated automatic speech recognition (ASR) models, achieving up to 10x speed improvements through various optimizations.

AWS, Hugging Face, Python, PyTorch, Transformers, Whisper

Daniel Galvez

12 min read

Includes Code

Has Summary

NVIDIA

Intermediate

New Foundational Models and Training Capabilities with NVIDIA TAO 5.5

The article discusses the release of NVIDIA TAO 5.5, a framework that simplifies AI model development and deployment.

AutoML, BERT, CLIP, Modal, PyTorch, ResNet, TensorFlow, Transformer, Transformers

Monika Jhuria

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Deploy Multilingual LLMs with NVIDIA NIM

The article discusses the deployment of multilingual large language models (LLMs) using NVIDIA NIM, highlighting the importance of effective communication across languages in a globalized business ...

Docker, Generative AI, Git, Hugging Face, LangChain, Transformers

Amit Bleiweiss

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Level Up Your Skills with Five New NVIDIA Technical Courses

The article introduces five new technical courses offered by NVIDIA aimed at enhancing skills in AI and data science.

Apache, Apache Arrow, Apache Spark, Computer Vision, Natural Language Processing, Prompt Engineering, PyTorch, Transformer, Transformers, XGBoost

Rachel Ho

4 min read

Has Summary

NVIDIA

Intermediate

NVIDIA Sets New Generative AI Performance and Scale Records in MLPerf Training v4.0

NVIDIA has achieved new generative AI performance records in MLPerf Training v4.0, showcasing significant advancements in training large language models (LLMs) and graph neural networks (GNNs).

BERT, Generative AI, GPT, ResNet, RLHF, Stable Diffusion, Transformer, Transformers, U-Net

Ashraf Eassa

10 min read

Has Summary

NVIDIA

Intermediate

Accelerating Transformers with NVIDIA cuDNN 9

The article discusses the enhancements made in NVIDIA's cuDNN 9 library, focusing on the acceleration of Transformers through the implementation of Scaled Dot Product Attention (SDPA).

JAX, Python, PyTorch, TensorFlow, Transformer, Transformers

Matthew Nicely

11 min read

Includes Code

Has Summary

NVIDIA

Advanced

Tune and Deploy LoRA LLMs with NVIDIA TensorRT-LLM

The article discusses the Low-Rank Adaptation (LoRA) method for fine-tuning large language models (LLMs) using NVIDIA TensorRT-LLM.

Hugging Face, JSON, Large Language Models, Transformers

Amit Bleiweiss

15 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Applying Mixture of Experts in LLM Architectures

The article discusses the application of Mixture of Experts (MoE) in large language model (LLM) architectures, highlighting its benefits in terms of model capacity, cost efficiency, and latency red...

GPT, GPT-4, Mistral, Transformer, Transformers, V

Kyle Kranen

11 min read

Has Summary

NVIDIA

Intermediate

Deploy an AI Coding Assistant with NVIDIA TensorRT-LLM and NVIDIA Triton

The article provides a comprehensive guide on deploying an AI coding assistant using NVIDIA TensorRT-LLM and NVIDIA Triton.

Docker, Git, GPT, Hugging Face, Python, Transformer, Transformers

Amit Bleiweiss

12 min read

Includes Code

Has Summary

NVIDIA

Advanced

Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network

This article discusses the emulation of the attention mechanism in transformer models using a fully convolutional network, specifically targeting improvements in computer vision tasks.

Attention Mechanism, ResNet, Self-Attention, Transformer, Transformers, V

John Yang

12 min read

Has Summary

NVIDIA

Intermediate

Mastering LLM Techniques: Training

The article discusses the intricacies of training Large Language Models (LLMs) using transformer networks, focusing on model architectures, attention mechanisms, and embedding techniques.

Attention Mechanism, BERT, Embedding, GPT, Large Language Models, Neural Networks, Recurrent Neural Networks, Self-Attention, Transformer, Transformers, V

Anjali Shah

14 min read

Has Summary

NVIDIA

Intermediate

Bringing Generative AI to Life with NVIDIA Jetson

NVIDIA has introduced the Jetson Generative AI Lab, enabling developers to leverage generative AI capabilities on Jetson edge devices.

CLIP, Generative AI, GitHub Actions, GPT, GPT-4, Gradio, Hugging Face, Modal, Oobabooga, RLHF, Segment Anything Model, Stable Diffusion, Transformers

Chitoku Yato

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Designing Deep Networks to Process Other Deep Networks

The article discusses the design of deep neural networks (DNNs) that can process the weights of other DNNs, focusing on architectures that leverage the symmetries of weight spaces.

Deep Learning, Graph Neural Networks, Neural Networks, Transformer, Transformers, V

Haggai Maron

14 min read

Has Summary

NVIDIA

Advanced

Access the Latest in Vision AI Model Development Workflows with NVIDIA TAO Toolkit 5.0

The article discusses the release of NVIDIA TAO Toolkit 5.0, which provides a low-code framework for accelerating vision AI model development.

AutoML, Azure, CDN, Google Cloud, Kubernetes, ResNet, Transformer, Transformers, Vertex AI

Chintan Shah

13 min read

Has Summary

NVIDIA

Intermediate

Improve Accuracy and Robustness of Vision AI Apps with Vision Transformers and NVIDIA TAO

The article discusses the transformative impact of Vision Transformers (ViTs) on computer vision applications, highlighting their accuracy, robustness, and adaptability in real-world scenarios.

Transformer, Transformers

Debraj Sinha

5 min read

Has Summary

NVIDIA

Intermediate

Research Unveils Breakthrough Deep Learning Tool for Understanding Neural Activity and Movement Control

Columbia University researchers have developed Lightning Pose, a groundbreaking deep learning tool designed to enhance the tracking of animal movement from video.

Deep Learning, PyTorch, Transformers

Janusz Lisiecki

7 min read

Has Summary

NVIDIA

Intermediate

Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines

The article discusses the structured sparsity feature in the NVIDIA Ampere architecture, particularly focusing on its implementation in deep learning and applications in search engines.

BERT, Machine Learning, Neural Networks, Python, Self-Attention, Transformers

Hongxiao Bai

12 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Boost Your AI Workflows with Federated Learning Enabled by NVIDIA FLARE

The article discusses how NVIDIA FLARE 2.3.0 enhances AI workflows through federated learning, offering features like multi-cloud support, NLP examples, and split learning.

AWS, Azure, BERT, Federated Learning, Generative AI, GPT, Machine Learning, Transformer, Transformers

Isaac Yang

7 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Increasing Inference Acceleration of KoGPT with NVIDIA FasterTransformer

The article discusses the optimization of Kakao Brain's KoGPT large language model using NVIDIA FasterTransformer, highlighting the significant improvements in inference speed and performance.

BERT, GPT, PyTorch, T5, TensorFlow, Transformer, Transformers, V

Daemyung Jang

5 min read

Has Summary

NVIDIA

Intermediate

AI Models Recap: Scalable Pretrained Models Across Industries

The article discusses the advancements in AI models, particularly NVIDIA's pretrained models, which have significantly impacted various industries in 2022.

Large Language Models, Transformer, Transformers

Pranjali Joshi

7 min read

Has Summary

NVIDIA

Intermediate

Faster HDBSCAN Soft Clustering with RAPIDS cuML

The article discusses the enhancements in the RAPIDS cuML library for performing HDBSCAN soft clustering, providing significant performance improvements over traditional CPU-based methods.

Docker, PyTorch, scikit-learn, Transformers

Nick Becker

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Making an NVIDIA Riva ASR Service for a New Language

The article discusses the process of creating an NVIDIA Riva Automatic Speech Recognition (ASR) service for a new language, highlighting the components of speech AI systems, the workflow for buildi...

BERT, Docker, Transformers

Vinh Nguyen

12 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Essential Guide to Automatic Speech Recognition Technology

The article provides a comprehensive overview of Automatic Speech Recognition (ASR) technology, detailing its functionality, algorithms, and applications across various industries.

AWS, BERT, Transformers

Sirisha Rella

10 min read

Has Summary

NVIDIA

Advanced

Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server

The article discusses the NVIDIA Triton Inference Server and its FasterTransformer library, which enables accelerated inference for large transformer models.

BERT, GPT, JSON, PyTorch, T5, TensorFlow, Transformer, Transformers

Denis Timonin

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How MONAI Fuels Open Research for Medical AI Workflows

The article discusses how MONAI, the Medical Open Network for AI, empowers medical researchers by providing an open-source framework for developing AI workflows in healthcare.

AutoML, AWS, Google Cloud, Modal, PyTorch, Transformers

Prerna Dogra

5 min read

Has Summary

NVIDIA

Advanced

The Full Stack Optimization Powering NVIDIA MLPerf Training v2.0 Performance

The article discusses NVIDIA's advancements in MLPerf Training v2.0, highlighting the full-stack optimizations that enhance performance across various AI workloads.

BERT, Natural Language Processing, Python, PyTorch, Reinforcement Learning, ResNet, Transformers, U-Net

Ashraf Eassa

14 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Transformers4Rec: Building Session-Based Recommendations with an NVIDIA Merlin Library

The article introduces Transformers4Rec, a library from NVIDIA Merlin designed for building session-based recommendation systems using state-of-the-art Transformer architectures.

Hugging Face, Keras, Neural Networks, PyTorch, Recurrent Neural Networks, TensorFlow, Transformer, Transformers

Ronay AK

7 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Novel Transformer Model Achieves State-of-the-Art Benchmarks in 3D Medical Image Analysis

The article discusses the Swin UNETR, a novel transformer model designed for 3D medical image analysis, which has achieved state-of-the-art benchmarks in various segmentation tasks.

AutoML, Computer Vision, PyTorch, Transformer, Transformers, Vault

Ali Hatamizadeh

5 min read

Has Summary

NVIDIA

Advanced

The Future of Computer Vision

The article discusses the rapid advancements in computer vision technology and its applications across various industries.