Transformers Programming Tutorials & Engineering Articles

162 Transformers tutorials, guides, and engineering insights from NVIDIA, OpenAI, Google, and more

Companies Using This

NVIDIA (88)

Transformers Articles & Tutorials

NVIDIA

Advanced

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy

The article discusses NVIDIA TensorRT LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).

Hugging Face, PyTorch, Transformers, V

Lucas Liebenwein

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

The article discusses the limitations of current large language models (LLMs) in handling long contexts and introduces Test-Time Training with an end-to-end formulation (TTT-E2E) as a solution.

Neural Networks, Recurrent Neural Networks, Transformer, Transformers

Yu Sun

6 min read

Has Summary

NVIDIA

Intermediate

Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM

The article discusses the introduction of NVIDIA TensorRT Edge-LLM, an open-source C++ framework designed for high-performance inference of Large Language Models (LLMs) and Vision Language Models (...

Chi, Hugging Face, Python, Transformers

Lin Chai

5 min read

Includes Code

Has Summary

NVIDIA

Advanced

How to Build a Voice Agent with RAG and Safety Guardrails

This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.

Embedding, Hugging Face, Python, Transformer, Transformers

Chris Alexiuk

8 min read

Includes Code

Has Summary

Uber

Advanced

Evolution and Scale of Uber’s Delivery Search Platform

The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...

Apache, Embedding, Hugging Face, PyTorch, Transformers

Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen

11 min read

Has Summary

OpenAI

Intermediate

Understanding neural networks through sparse circuits

The article discusses the challenges of understanding neural networks and presents a novel approach to improve interpretability through sparse circuits.

GPT, Transformers

OpenAI Team

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes

The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...

Hugging Face, PyTorch, Transformer, Transformers

Kyle Tretina

6 min read

Includes Code

Has Summary

Google

Intermediate

Own your AI: Learn how to fine-tune Gemma 3 270M and run it on-device

The article discusses how to fine-tune the Gemma 3 270M model for on-device applications, enabling developers to create custom AI models without the need for expensive hardware.

Fine-tuning, Gemini, Hugging Face, JavaScript, Transformers

Ian Ballantyne, Jason Mayes

5 min read

Includes Code

Has Summary

Google

Intermediate

Gemma explained: EmbeddingGemma Architecture and Recipe

The article provides an in-depth exploration of the EmbeddingGemma architecture, detailing its origins, embedding generation process, and the comprehensive training methodology.

Embedding, Fine-tuning, Gemini, Hugging Face, Transformer, Transformers, Vertex AI

Henrique Schechter Vera, Juyeong Ji, Sahil Dua

7 min read

Includes Code

Has Summary

NVIDIA

Advanced

Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer

The article discusses the challenges of cold start latency in deploying large language models (LLMs) and introduces the NVIDIA Run:ai Model Streamer, an open-source Python SDK designed to optimize ...

AWS, AWS S3, HTTPS, Hugging Face, Python, PyTorch, Transformers

Omer Dayan

12 min read

Has Summary

Google

Intermediate

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

EmbeddingGemma is an innovative open embedding model designed for on-device AI applications, featuring 308 million parameters for efficient performance.

Embedding, Gemini, Hugging Face, LangChain, Ollama, Retrieval Augmented Generation, Transformers, Vertex AI

Min Choi, Sahil Dua, Alice Lisak

5 min read

Has Summary

NVIDIA

Advanced

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

The article discusses fine-tuning the gpt-oss model for improved accuracy and performance through Quantization Aware Training (QAT) and Supervised Fine-Tuning (SFT).

GPT, Hugging Face, PyTorch, Transformer, Transformers

Eduardo Alvarez

7 min read

Includes Code

Has Summary

Uber

Advanced

Forecasting Models to Improve Driver Availability at Airports

This article discusses the development and implementation of forecasting models aimed at improving driver availability at airports, which are critical to Uber's ridesharing ecosystem.

Apache, Apache Spark, Cassandra, Kong, Transformer, Transformers

Bob Zheng, Dhruv Ghulati, Manoj Panikkar, Michael (Yichuan) Cai

15 min read

Has Summary

Google

Intermediate

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

The article introduces Gemma 3 270M, a compact AI model designed for hyper-efficient task-specific fine-tuning.

Docker, Google Cloud, Hugging Face, JAX, Keras, Ollama, Transformers, Vertex AI

Olivier Lacombe, Kathleen Kenealy, Kat Black, Ravin Kumar, Francesco Visin, Jiageng Zhang

5 min read

Has Summary

NVIDIA

Intermediate

NVIDIA Accelerates OpenAI gpt-oss Models Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72

NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).

Docker, Hugging Face, Ollama, Python, Transformer, Transformers

Anu Srivastava

6 min read

Includes Code

Has Summary

Cloudflare

Advanced

Building Jetflow: a framework for flexible, performant data pipelines at Cloudflare

The article discusses the development of Jetflow, a framework designed by Cloudflare's Business Intelligence team to manage complex data ingestion tasks efficiently.

Golang, Transformers, YAML

Harry Hough

11 min read

Has Summary

Google

Intermediate

Introducing Gemma 3n: The developer guide

The article introduces Gemma 3n, a mobile-first architecture designed for on-device AI, highlighting its multimodal capabilities and architectural innovations.

Docker, Gemini, GPT, Hugging Face, Ollama, Transformer, Transformers, Vertex AI

Omar Sanseviero, Ian Ballantyne

9 min read

Includes Code

Has Summary

NVIDIA

Intermediate

How Early Access to NVIDIA GB200 Systems Helped LMArena Build a Model to Evaluate LLMs

LMArena, in collaboration with NVIDIA and Nebius, has developed the Prompt-to-Leaderboard (P2L) model to evaluate the performance of large language models (LLMs) across various tasks.

Hugging Face, PyTorch, torchvision, Transformers

Jason Perlow

6 min read

Has Summary

NVIDIA

Advanced

Accelerated Sequence Alignment for Protein Science with MMseqs2-GPU and NVIDIA NIM

The article discusses the advancements in protein sequence alignment using MMseqs2-GPU and NVIDIA NIM, highlighting their significance in accelerating drug discovery and structural prediction in pr...

Transformers

Kyle Tretina

8 min read

Includes Code

Has Summary

NVIDIA

Advanced

Accelerated Molecular Modeling with NVIDIA cuEquivariance and NVIDIA NIM microservices

The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...

Apache, PyTorch, Transformer, Transformers

Neha Tadimeti

8 min read

Has Summary

NVIDIA

Advanced

Scaling to Millions of Tokens with Efficient Long-Context LLM Training

The article discusses the advancements in large language models (LLMs) focusing on the importance of extended context lengths for processing and generating text.

Transformer, Transformers, V

Amit Bleiweiss

7 min read

Has Summary

LinkedIn

Advanced

JUDE: LLM-based representation learning for LinkedIn job recommendations

The article discusses JUDE, LinkedIn's platform for generating high-quality embeddings for job recommendations using fine-tuned Large Language Models (LLMs).

BERT, Embedding, Hugging Face, Kubernetes, Large Language Models, Mistral, PyTorch, Transfer Learning, Transformer, Transformers

Nikita Zhiltsov

13 min read

Has Summary

NVIDIA

Advanced

Accelerate Deep Learning and LLM Inference with Apache Spark in the Cloud

The article discusses how to accelerate Deep Learning (DL) and Large Language Model (LLM) inference using Apache Spark in cloud environments.

Apache, Apache Spark, AWS, Azure, Deep Learning, Docker, JSON, NumPy, Python, PyTorch, Semantic Search, TensorFlow, Transformers

Rishi Chandra

9 min read

Includes Code

Has Summary

Google

Intermediate

Gemma explained: What’s new in Gemma 3

The article discusses the new features and improvements in Gemma 3, highlighting its vision-language capabilities, architectural changes for memory efficiency, and enhanced multilingual support.

BERT, Embedding, Gemini, Transformers

Ju-yeong Ji, Ravin Kumar

9 min read

Includes Code

Has Summary

OpenAI

Advanced

BrowseComp: a benchmark for browsing agents

BrowseComp is a newly introduced benchmark designed to evaluate the capabilities of AI agents in locating hard-to-find information on the internet.

Claude, Gemini, GPT, Transformers

Jason Wei

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

LLM Inference Benchmarking: Fundamental Concepts

This article introduces the fundamental concepts of large language model (LLM) inference benchmarking, focusing on key metrics such as throughput and latency.

Generative AI, Transformers

Vinh Nguyen

14 min read

Has Summary

Google

Intermediate

Introducing Gemma 3: The Developer Guide

Gemma 3 is the latest version of the Gemma open-model family, boasting enhanced capabilities such as multimodality, longer context windows, and improved reasoning.

Hugging Face, JAX, Ollama, Reinforcement Learning, RLHF, Transformers, Vertex AI

Omar Sanseviero, Philipp Schmid

5 min read

Includes Code

Has Summary

Google

Beginner

Safer and Multimodal: Responsible AI with Gemma

The article discusses the launch of ShieldGemma 2, a safety content classifier model built on Gemma 3, aimed at detecting harmful content in both synthetic and natural images.

Hugging Face, JAX, Keras, Ollama, Transformers

Dana Kurniawan, Wenjun Zeng, Ryan Mullins

3 min read

Has Summary

NVIDIA

Advanced

Understanding the Language of Life’s Biomolecules Across Evolution at a New Scale with Evo 2

The article discusses the advancements in AI-driven biological research with the introduction of Evo 2, a foundation model that integrates genomic, RNA, and protein data across multiple life domain...

AWS, Fine-tuning, JSON, Transformer, Transformers, YAML

Kyle Tretina

9 min read

Includes Code

Has Summary

Google

Beginner

Introducing PaliGemma 2 mix: A vision-language model for multiple tasks

PaliGemma 2 mix is an advanced vision-language model designed for multiple tasks, allowing developers to utilize a single model for various applications such as image captioning, object detection, ...

Hugging Face, JAX, Keras, PyTorch, Transformers

Omar Sanseviero, Andreas Steiner

3 min read

Includes Code

Has Summary

Pinterest

Intermediate

Advancements in Embedding-Based Retrieval at Pinterest Homefeed

The article discusses advancements in embedding-based retrieval at Pinterest's Homefeed, focusing on improvements such as feature crossing, ID embeddings, and serving corpus upgrades.

Capsule Networks, Embedding, Machine Learning, Transformers

Pinterest Engineering

8 min read

Has Summary

NVIDIA

Intermediate

Dynamic Memory Compression

The article discusses Dynamic Memory Compression (DMC), a technology developed by NVIDIA to enhance the efficiency of large language models (LLMs) by adaptively compressing the conversation state.

Natural Language Processing, Transformer, Transformers

Edoardo Maria Ponti

8 min read

Has Summary

NVIDIA

Beginner

NVIDIA JetPack 6.2 Brings Super Mode to NVIDIA Jetson Orin Nano and Jetson Orin NX Modules

NVIDIA JetPack 6.2 introduces Super Mode for the Jetson Orin Nano and Jetson Orin NX modules, significantly enhancing generative AI performance.

CLIP, Hugging Face, Ollama, Transformers

Shashank Maheshwari

11 min read

Includes Code

Has Summary

NVIDIA

Intermediate

NVIDIA Jetson Orin Nano Developer Kit Gets a “Super” Boost

The article discusses the enhancements made to the NVIDIA Jetson Orin Nano Developer Kit, now renamed the Jetson Orin Nano Super Developer Kit, which offers a performance boost of up to 1.

Generative AI, Hugging Face, Ollama, Transformer, Transformers

Suhas Hariharapura Sheshadri

10 min read

Includes Code

Has Summary

Google

Intermediate

Introducing PaliGemma 2: Powerful Vision-Language Models, Simple Fine-Tuning

PaliGemma 2 is the latest vision-language model from Google, designed to simplify the process of building advanced AI that can interpret visual inputs.

Hugging Face, JAX, Keras, PyTorch, Transformers

Daniel Keysers, Andreas Steiner

3 min read

Has Summary

NVIDIA

Intermediate

Hymba Hybrid-Head Architecture Boosts Small Language Model Performance

The article discusses NVIDIA's Hymba hybrid-head architecture, which combines transformer attention mechanisms with state space models to enhance the performance and efficiency of small language mo...

Embedding, Hugging Face, PyTorch, Transformer, Transformers

Xin Dong

11 min read

Has Summary

NVIDIA

Advanced

NVIDIA Partners Accelerate Quantum Breakthroughs with AI Supercomputing

NVIDIA is advancing quantum computing through partnerships that integrate AI supercomputing with quantum hardware, aiming to overcome current technological challenges.

Artificial Intelligence, Generative AI, GPT, Solid, Transformers

Marwa Farag

7 min read

Has Summary

Airbnb

Advanced

Airbnb’s AI-powered photo tour using Vision Transformer

The article discusses Airbnb's implementation of an AI-powered photo tour feature using Vision Transformers to enhance the guest experience by accurately classifying and organizing listing images.

Fine-tuning, Machine Learning, Transformer, Transformers

Pei Xiong

9 min read

Has Summary

Google

Advanced

Web AI Summit 2024 Recap: Client-Side AI for Developers

The Web AI Summit 2024, hosted by Google on October 18, 2024, focused on client-side AI for developers, showcasing how machine learning models can operate offline in web browsers.

Hugging Face, JavaScript, JSON, LangChain, Machine Learning, TensorFlow, Transformers, WebAssembly

Jason Mayes

10 min read

Has Summary

Google

Intermediate

Evolving the Responsible Generative AI Toolkit with new tools for every LLM

The article discusses the expansion of the Responsible Generative AI Toolkit, introducing new tools designed for various large language models (LLMs) like Gemma and Gemini.

Gemini, Generative AI, Google Cloud, Hugging Face, Keras, Transformers, Vertex AI

Ryan Mullins

3 min read

Has Summary

Uber

Advanced

Open Source and In-House: How Uber Optimizes LLM Training

The article discusses how Uber optimizes the training of Large Language Models (LLMs) using both open-source and in-house models.

Apache, Apache Kafka, Apache Spark, Comet, Docker, Google Cloud, GPT, GPT-4, Hugging Face, Kubernetes, Mistral, PyTorch, SQL, Transformers

Bo Ling, Jiapei Huang, Baojun Liu, Chongxiao Cao, Anant Vyas, Peng Zhang

11 min read

Has Summary

NVIDIA

Advanced

Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo

The article discusses how NVIDIA NeMo has accelerated automatic speech recognition (ASR) models, achieving up to 10x speed improvements through various optimizations.

AWS, Hugging Face, Python, PyTorch, Transformers, Whisper

Daniel Galvez

12 min read

Includes Code

Has Summary

NVIDIA

Intermediate

New Foundational Models and Training Capabilities with NVIDIA TAO 5.5

The article discusses the release of NVIDIA TAO 5.5, a framework that simplifies AI model development and deployment.

AutoML, BERT, CLIP, Modal, PyTorch, ResNet, TensorFlow, Transformer, Transformers

Monika Jhuria

12 min read

Includes Code

Has Summary

Google

Advanced

Gemma explained: RecurrentGemma architecture

The article explores the RecurrentGemma architecture, a hybrid model that combines gated linear recurrences with local sliding window attention, enhancing performance for long context prompts.

Embedding, Neural Networks, Transformer, Transformers

Ju-yeong Ji, Ravin Kumar

6 min read

Includes Code

Has Summary

Google

Intermediate

Gemma explained: An overview of Gemma model family architectures

The article provides an overview of the Gemma model family architectures, detailing its lightweight, state-of-the-art open models derived from Gemini research.

BERT, Embedding, Gemini, GPT, Hugging Face, Keras, T5, Transformer, Transformers

Ju-yeong Ji, Ravin Kumar

9 min read

Includes Code

Has Summary

NVIDIA

Advanced

Deploy Multilingual LLMs with NVIDIA NIM

The article discusses the deployment of multilingual large language models (LLMs) using NVIDIA NIM, highlighting the importance of effective communication across languages in a globalized business ...

Docker, Generative AI, Git, Hugging Face, LangChain, Transformers

Amit Bleiweiss

9 min read

Includes Code

Has Summary

Google

Advanced

Get started with Gemma on Ray on Vertex AI

This article provides a comprehensive guide on using Gemma with Ray on Vertex AI, detailing the steps to set up, fine-tune, and deploy machine learning models.

Docker, Gemini, Google Cloud, Hugging Face, JSON, Pandas, PyTorch, TensorBoard, Transformer, Transformers, Vertex AI

Ju-yeong Ji, Ivan Nardini

12 min read

Includes Code

Has Summary

Google

Intermediate

Fine-tuning Gemma 2 with Keras - and an update from Hugging Face

The article discusses the release of the Gemma 2 model with 27 billion parameters, highlighting its capabilities in Keras and integration with JAX for efficient model training.

Fine-tuning, Hugging Face, JAX, Keras, PyTorch, TensorFlow, Transformer, Transformers

Martin Görner

5 min read

Includes Code

Has Summary

NVIDIA

Intermediate

Level Up Your Skills with Five New NVIDIA Technical Courses

The article introduces five new technical courses offered by NVIDIA aimed at enhancing skills in AI and data science.

Apache, Apache Arrow, Apache Spark, Computer Vision, Natural Language Processing, Prompt Engineering, PyTorch, Transformer, Transformers, XGBoost

Rachel Ho

4 min read

Has Summary

NVIDIA

Intermediate

NVIDIA Sets New Generative AI Performance and Scale Records in MLPerf Training v4.0

NVIDIA has achieved new generative AI performance records in MLPerf Training v4.0, showcasing significant advancements in training large language models (LLMs) and graph neural networks (GNNs).

BERT, Generative AI, GPT, ResNet, RLHF, Stable Diffusion, Transformer, Transformers, U-Net

Ashraf Eassa

10 min read

Has Summary