# Transformers Programming Tutorials & Engineering Articles
162 Transformers tutorials, guides, and engineering insights from NVIDIA, OpenAI, Google, and more
Transformers Articles & Tutorials
The article discusses NVIDIA TensorRT-LLM AutoDeploy, a beta feature that automates the inference optimization process for large language models (LLMs).
Lucas Liebenwein
8 min read
Includes Code
Has Summary
--
The article discusses the limitations of current large language models (LLMs) in handling long contexts and introduces Test-Time Training with an end-to-end formulation (TTT-E2E) as a solution.
Yu Sun
6 min read
Has Summary
--
The article discusses the introduction of NVIDIA TensorRT Edge-LLM, an open-source C++ framework designed for high-performance inference of Large Language Models (LLMs) and Vision Language Models (...
Lin Chai
5 min read
Includes Code
Has Summary
--
This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.
Chris Alexiuk
8 min read
Includes Code
Has Summary
--
The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...
Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen
11 min read
Has Summary
--
The article discusses the challenges of understanding neural networks and presents a novel approach to improve interpretability through sparse circuits.
OpenAI Team
7 min read
Includes Code
Has Summary
--
The article discusses how to scale biology transformer models using PyTorch and NVIDIA BioNeMo Recipes, focusing on advanced parallel computing techniques and the integration of the NVIDIA Transfor...
Kyle Tretina
6 min read
Includes Code
Has Summary
--
The article discusses how to fine-tune the Gemma 3 270M model for on-device applications, enabling developers to create custom AI models without the need for expensive hardware.
Ian Ballantyne, Jason Mayes
5 min read
Includes Code
Has Summary
--
The article provides an in-depth exploration of the EmbeddingGemma architecture, detailing its origins, its embedding generation process, and its training methodology.
Henrique Schechter Vera, Juyeong Ji, Sahil Dua
7 min read
Includes Code
Has Summary
--
The article discusses the challenges of cold start latency in deploying large language models (LLMs) and introduces the NVIDIA Run:ai Model Streamer, an open-source Python SDK designed to optimize ...
Omer Dayan
12 min read
Has Summary
--
EmbeddingGemma is an open embedding model with 308 million parameters, designed for efficient on-device AI applications.
Min Choi, Sahil Dua, Alice Lisak
5 min read
Has Summary
--
The article discusses fine-tuning the gpt-oss model for improved accuracy and performance through Quantization Aware Training (QAT) and Supervised Fine-Tuning (SFT).
Eduardo Alvarez
7 min read
Includes Code
Has Summary
--
This article discusses the development and implementation of forecasting models aimed at improving driver availability at airports, which are critical to Uber's ridesharing ecosystem.
Bob Zheng, Dhruv Ghulati, Manoj Panikkar, Michael (Yichuan) Cai
15 min read
Has Summary
--
The article introduces Gemma 3 270M, a compact AI model designed for hyper-efficient task-specific fine-tuning.
Olivier Lacombe, Kathleen Kenealy, Kat Black, Ravin Kumar, Francesco Visin, Jiageng Zhang
5 min read
Has Summary
--
NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).
Anu Srivastava
6 min read
Includes Code
Has Summary
--
The article discusses the development of Jetflow, a framework designed by Cloudflare's Business Intelligence team to manage complex data ingestion tasks efficiently.
Harry Hough
11 min read
Has Summary
--
The article introduces Gemma 3n, a mobile-first architecture designed for on-device AI, highlighting its multimodal capabilities and architectural innovations.
Omar Sanseviero, Ian Ballantyne
9 min read
Includes Code
Has Summary
--
LMArena, in collaboration with NVIDIA and Nebius, has developed the Prompt-to-Leaderboard (P2L) model to evaluate the performance of large language models (LLMs) across various tasks.
Jason Perlow
6 min read
Has Summary
--
The article discusses the advancements in protein sequence alignment using MMseqs2-GPU and NVIDIA NIM, highlighting their significance in accelerating drug discovery and structural prediction in pr...
Kyle Tretina
8 min read
Includes Code
Has Summary
--
The article discusses NVIDIA's advancements in molecular AI modeling through the introduction of cuEquivariance and NIM microservices, which enhance the speed and efficiency of training and inferen...
Neha Tadimeti
8 min read
Has Summary
--
The article discusses advancements in large language models (LLMs), focusing on the importance of extended context lengths for processing and generating text.
Amit Bleiweiss
7 min read
Has Summary
--
The article discusses JUDE, LinkedIn's platform for generating high-quality embeddings for job recommendations using fine-tuned Large Language Models (LLMs).
BERT, Embedding, Hugging Face, Kubernetes, Large Language Models, Mistral, PyTorch, Transfer Learning, Transformer, Transformers
Nikita Zhiltsov
13 min read
Has Summary
--
The article discusses how to accelerate Deep Learning (DL) and Large Language Model (LLM) inference using Apache Spark in cloud environments.
Apache, Apache Spark, AWS, Azure, Deep Learning, Docker, JSON, NumPy, Python, PyTorch, Semantic Search, TensorFlow, Transformers
Rishi Chandra
9 min read
Includes Code
Has Summary
--
The article discusses the new features and improvements in Gemma 3, highlighting its vision-language capabilities, architectural changes for memory efficiency, and enhanced multilingual support.
Ju-yeong Ji, Ravin Kumar
9 min read
Includes Code
Has Summary
--
BrowseComp is a newly introduced benchmark designed to evaluate the capabilities of AI agents in locating hard-to-find information on the internet.
Jason Wei
11 min read
Includes Code
Has Summary
--
This article introduces the fundamental concepts of large language model (LLM) inference benchmarking, focusing on key metrics such as throughput and latency.
Vinh Nguyen
14 min read
Has Summary
--
Gemma 3 is the latest version of the Gemma open-model family, boasting enhanced capabilities such as multimodality, longer context windows, and improved reasoning.
Omar Sanseviero, Philipp Schmid
5 min read
Includes Code
Has Summary
--
The article discusses the launch of ShieldGemma 2, a safety content classifier model built on Gemma 3, aimed at detecting harmful content in both synthetic and natural images.
Dana Kurniawan, Wenjun Zeng, Ryan Mullins
3 min read
Has Summary
--
The article discusses the advancements in AI-driven biological research with the introduction of Evo 2, a foundation model that integrates genomic, RNA, and protein data across multiple life domain...
Kyle Tretina
9 min read
Includes Code
Has Summary
--
PaliGemma 2 mix is an advanced vision-language model designed for multiple tasks, allowing developers to utilize a single model for various applications such as image captioning, object detection, ...
Omar Sanseviero, Andreas Steiner
3 min read
Includes Code
Has Summary
--
The article discusses advancements in embedding-based retrieval at Pinterest's Homefeed, focusing on improvements such as feature crossing, ID embeddings, and serving corpus upgrades.
Pinterest Engineering
8 min read
Has Summary
--
The article discusses Dynamic Memory Compression (DMC), a technology developed by NVIDIA to enhance the efficiency of large language models (LLMs) by adaptively compressing the conversation state.
Edoardo Maria Ponti
8 min read
Has Summary
--
NVIDIA JetPack 6.2 introduces Super Mode for the Jetson Orin Nano and Jetson Orin NX modules, significantly enhancing generative AI performance.
Shashank Maheshwari
11 min read
Includes Code
Has Summary
--
The article discusses the enhancements made to the NVIDIA Jetson Orin Nano Developer Kit, now renamed the Jetson Orin Nano Super Developer Kit, which offers a performance boost of up to 1.
Suhas Hariharapura Sheshadri
10 min read
Includes Code
Has Summary
--
PaliGemma 2 is the latest vision-language model from Google, designed to simplify the process of building advanced AI that can interpret visual inputs.
Daniel Keysers, Andreas Steiner
3 min read
Has Summary
--
The article discusses NVIDIA's Hymba hybrid-head architecture, which combines transformer attention mechanisms with state space models to enhance the performance and efficiency of small language mo...
Xin Dong
11 min read
Has Summary
--
NVIDIA is advancing quantum computing through partnerships that integrate AI supercomputing with quantum hardware, aiming to overcome current technological challenges.
Marwa Farag
7 min read
Has Summary
--
The article discusses Airbnb's implementation of an AI-powered photo tour feature using Vision Transformers to enhance the guest experience by accurately classifying and organizing listing images.
Pei Xiong
9 min read
Has Summary
--
The Web AI Summit 2024, hosted by Google on October 18, 2024, focused on client-side AI for developers, showcasing how machine learning models can operate offline in web browsers.
Jason Mayes
10 min read
Has Summary
--
The article discusses the expansion of the Responsible Generative AI Toolkit, introducing new tools designed for various large language models (LLMs) like Gemma and Gemini.
Ryan Mullins
3 min read
Has Summary
--
The article discusses how Uber optimizes the training of Large Language Models (LLMs) using both open-source and in-house models.
Apache, Apache Kafka, Apache Spark, Comet, Docker, Google Cloud, GPT, GPT-4, Hugging Face, Kubernetes, Mistral, PyTorch, SQL, Transformers
Bo Ling, Jiapei Huang, Baojun Liu, Chongxiao Cao, Anant Vyas, Peng Zhang
11 min read
Has Summary
--
The article discusses how NVIDIA NeMo has accelerated automatic speech recognition (ASR) models, achieving up to 10x speed improvements through various optimizations.
Daniel Galvez
12 min read
Includes Code
Has Summary
--
The article discusses the release of NVIDIA TAO 5.5, a framework that simplifies AI model development and deployment.
Monika Jhuria
12 min read
Includes Code
Has Summary
--
The article explores the RecurrentGemma architecture, a hybrid model that combines gated linear recurrences with local sliding-window attention, improving performance on long-context prompts.
Ju-yeong Ji, Ravin Kumar
6 min read
Includes Code
Has Summary
--
The article provides an overview of the architectures in the Gemma model family, a set of lightweight, state-of-the-art open models derived from Gemini research.
Ju-yeong Ji, Ravin Kumar
9 min read
Includes Code
Has Summary
--
The article discusses the deployment of multilingual large language models (LLMs) using NVIDIA NIM, highlighting the importance of effective communication across languages in a globalized business ...
Amit Bleiweiss
9 min read
Includes Code
Has Summary
--
This article provides a comprehensive guide on using Gemma with Ray on Vertex AI, detailing the steps to set up, fine-tune, and deploy machine learning models.
Ju-yeong Ji, Ivan Nardini
12 min read
Includes Code
Has Summary
--
The article discusses the release of the Gemma 2 model with 27 billion parameters, highlighting its capabilities in Keras and integration with JAX for efficient model training.
Martin Görner
5 min read
Includes Code
Has Summary
--
The article introduces five new technical courses offered by NVIDIA aimed at enhancing skills in AI and data science.
Apache, Apache Arrow, Apache Spark, Computer Vision, Natural Language Processing, Prompt Engineering, PyTorch, Transformer, Transformers, XGBoost
Rachel Ho
4 min read
Has Summary
--
NVIDIA has achieved new generative AI performance records in MLPerf Training v4.0, showcasing significant advancements in training large language models (LLMs) and graph neural networks (GNNs).
Ashraf Eassa
10 min read
Has Summary
--