Embedding Programming Tutorials & Engineering Articles

Uber’s Rate Limiting System

Advanced

Uber’s Rate Limiting System details the evolution of Uber's approach to managing service overload through a unified rate-limiting architecture.

Embedding, Rate Limiting, Redis, YAML

Chien-Chih Liao, Rahul Gutal, Smit Sheth, Ying Jiang

14 min read

Includes Code

Has Summary

Build with Kimi K2.5 Multimodal VLM Using NVIDIA GPU-Accelerated Endpoints

Advanced

Kimi K2.5 is an advanced multimodal vision language model (VLM) developed by Kimi, optimized for various AI tasks.

Embedding, Fine-tuning, Hugging Face, PyTorch

Anu Srivastava

4 min read

Includes Code

Has Summary

How to Build a Document Processing Pipeline for RAG with Nemotron

Advanced

The article provides a comprehensive guide on building a document processing pipeline using NVIDIA Nemotron RAG, focusing on the extraction of structured data from complex documents like PDFs.

Docker, Embedding, Hugging Face, JSON, Python, Redis, torchvision

Chia-Chih Chen

9 min read

Includes Code

Has Summary

OpenAI

Intermediate

PVH reimagines the future of fashion with OpenAI

PVH Corp., the parent company of Calvin Klein and Tommy Hilfiger, announced its adoption of ChatGPT Enterprise to transform its global fashion operations.

OpenAI

3 min read

Has Summary

Scaling NVFP4 Inference for FLUX.2 on NVIDIA Blackwell Data Center GPUs

Advanced

The article discusses the collaboration between NVIDIA and Black Forest Labs to optimize the FLUX.2 text-to-image model for NVIDIA Blackwell Data Center GPUs.

Caching, Embedding, Mistral

Sandro Cavallari

8 min read

Includes Code

Has Summary

Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

Intermediate

The article discusses the NVIDIA Multi-Agent Intelligent Warehouse (MAIW), an AI command layer designed to enhance operational efficiency and supply chain intelligence in automated warehouses.

Docker, Embedding, FastAPI, Grafana, Helm, JSON, JWT, Optuna, PostgreSQL, Prometheus, React, Redis, SQL, TimescaleDB

Tarik Hammadou

10 min read

Includes Code

Has Summary

How to Build a Voice Agent with RAG and Safety Guardrails

Advanced

This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.

Embedding, Hugging Face, Python, Transformer, Transformers

Chris Alexiuk

8 min read

Includes Code

Has Summary

Powering Billion-Scale Vector Search with OpenSearch

Advanced

The article discusses Uber's transition from traditional keyword-based search using Apache Lucene to implementing semantic vector search with Amazon OpenSearch.

Apache, Apache Spark, CSS, Embedding

Hao Sun, Jiasen Xu, Smit Patel, Anand Kotriwal, Xu Zhang

11 min read

Has Summary

Evolution and Scale of Uber’s Delivery Search Platform

Advanced

The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...

Apache, Embedding, Hugging Face, PyTorch, Transformers

Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen

11 min read

Has Summary

Building Scalable AI on Enterprise Data with NVIDIA Nemotron RAG and Microsoft SQL Server 2025

Intermediate

The article discusses the integration of NVIDIA Nemotron RAG with Microsoft SQL Server 2025, showcasing how this collaboration enables the development of scalable AI applications on enterprise data.

Azure, Docker, Embedding, HTTPS, SQL, SQL Server

Uttara Kumar

10 min read

Includes Code

Has Summary

Advanced

A Decade of AI Platform at Pinterest

The article reflects on a decade of AI platform development at Pinterest, detailing the evolution from fragmented machine learning stacks to a unified AI platform that supports various models.

AutoML, Docker, Embedding, Generative AI, Java, Kubernetes, LightGBM, PySpark, Python, PyTorch, Seed, SQL, TensorFlow, Thrift, Transformer

Pinterest Engineering

22 min read

Has Summary

Enabling Deep Model Explainability with Integrated Gradients at Uber

Advanced

This article discusses how Uber has integrated explainability into its machine learning platform, Michelangelo, using Integrated Gradients (IG) to provide interpretable attributions for deep learni...

Embedding, Keras, LIME, Machine Learning, PyTorch, SHAP, TensorFlow, XGBoost, YAML

Hugh Chen, Eric Wang, Gaoyuan Huang, Howard Yu, Jia Li, Sally Lee

14 min read

Has Summary

Build a Log Analysis Multi-Agent Self-Corrective RAG System with NVIDIA Nemotron

Advanced

The article discusses the development of an AI-powered log analysis solution using NVIDIA's Generative AI reference workflows.

Embedding, Fine-tuning, Generative AI, Hugging Face

Prashant Bhende

5 min read

Includes Code

Has Summary

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer

Advanced

The article discusses the optimization of large language models (LLMs) through pruning and knowledge distillation using NVIDIA TensorRT Model Optimizer.

Embedding, Hugging Face, Transformer

Max Xu

10 min read

Includes Code

Has Summary

Gemma explained: EmbeddingGemma Architecture and Recipe

Intermediate

The article provides an in-depth exploration of the EmbeddingGemma architecture, detailing its origins, embedding generation process, and the comprehensive training methodology.

Embedding, Fine-tuning, Gemini, Hugging Face, Transformer, Transformers, Vertex AI

Henrique Schechter Vera, Juyeong Ji, Sahil Dua

7 min read

Includes Code

Has Summary

Cloudflare

Advanced

Choice: the path to AI sovereignty

The article discusses the concept of AI sovereignty, emphasizing the importance of choice for nations in controlling AI technologies and data.

Embedding, Generative AI, Serverless

Carly Ramsey

9 min read

Includes Code

Has Summary

Build a Retrieval-Augmented Generation (RAG) Agent with NVIDIA Nemotron

Advanced

The article provides a comprehensive guide on building a Retrieval-Augmented Generation (RAG) agent using NVIDIA Nemotron, emphasizing the integration of external information to enhance text genera...

Docker, Embedding, Hugging Face, LangChain, Python, Streamlit, Vector Database

Edward Li

16 min read

Includes Code

Has Summary

NVIDIA RAPIDS 25.08 Adds New Profiler for cuML, Updates to the Polars GPU Engine, Additional Algorithm Support,

Advanced

The NVIDIA RAPIDS 25.

Embedding, Polars, Python, scikit-learn

Brian Tepera

8 min read

Includes Code

Has Summary

Gemini Batch API now supports Embeddings and OpenAI Compatibility

Beginner

The article discusses the recent enhancements to the Gemini Batch API, which now includes support for the Gemini Embedding model and compatibility with the OpenAI SDK.

Embedding, Gemini

Lucia Loher, Patrick Löber

2 min read

Includes Code

Has Summary

From Fine-Tuning to Production: A Scalable Embedding Pipeline with Dataflow

Intermediate

This article discusses the integration of Google's EmbeddingGemma model with Google Cloud's Dataflow to create a scalable embedding pipeline for AI applications.

Apache, Embedding, Gemini, Google Cloud, Hugging Face, Large Language Models, Retrieval Augmented Generation

Danny McCormick, Ian Ballantyne, Olivier Lacombe

5 min read

Includes Code

Has Summary

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

Intermediate

EmbeddingGemma is an innovative open embedding model designed for on-device AI applications, featuring 308 million parameters for efficient performance.

Embedding, Gemini, Hugging Face, LangChain, Ollama, Retrieval Augmented Generation, Transformers, Vertex AI

Min Choi, Sahil Dua, Alice Lisak

5 min read

Has Summary

Netflix

Advanced

From Facts & Metrics to Media Machine Learning: Evolving the Data Engineering Function at Netflix

The article discusses the evolution of data engineering at Netflix, focusing on the introduction of Media ML Data Engineering, which aims to enhance the handling of complex media data for machine l...

Embedding, Machine Learning

Netflix Technology Blog

7 min read

Has Summary

Advanced

The Edge of Innovation: Engineering Insights from an Evolving Edge-Building System at LinkedIn

The article discusses the evolution of LinkedIn's edge-building system, focusing on how it leverages AI-powered recommendations to enhance user interactions.

Embedding, Kong

Yi-Wen Liu

13 min read

Has Summary

Ramp

Intermediate

Forward Deployed Engineering

This article explores the rise of Forward Deployed Engineering (FDE) as a strategic role in B2B tech companies, tracing its origins from Palantir to its current adoption across companies like OpenA...

Claude, ElevenLabs, Embedding, LangChain, MVP

Leo Mehr

13 min read

Has Summary

OpenAI

Advanced

Introducing gpt-oss

The article introduces gpt-oss, two state-of-the-art open-weight language models, gpt-oss-120b and gpt-oss-20b, which excel in reasoning tasks and are optimized for deployment on consumer hardware.

Apache, AWS, Azure, Embedding, GPT, Hugging Face, Ollama, PyTorch, Rust, Transformer, Vercel, Whisper

OpenAI

15 min read

Has Summary

Securing Agentic AI: How Semantic Prompt Injections Bypass AI Guardrails

Advanced

The article discusses the emerging threat of semantic prompt injections in multimodal AI systems, highlighting how adversaries can exploit visual inputs to bypass traditional security measures.

Deep Learning, Embedding, Gemini, Machine Learning

Daniel Teixeira

7 min read

Has Summary

Gemini Embedding: Powering RAG and context engineering

Intermediate

The article discusses the Gemini Embedding text model and its applications in various industries, highlighting its effectiveness in enhancing AI applications through context engineering and retriev...

Embedding, Gemini

Vishal Dharmadhikari, Janie Zhang

4 min read

Includes Code

Has Summary

Serverless Distributed Data Processing with Apache Spark and NVIDIA AI on Azure

Advanced

The article discusses the deployment of a serverless, distributed data processing architecture using Apache Spark and NVIDIA AI on Azure.

Apache, Apache Spark, Azure, Docker, Embedding, HTTPS, Hugging Face, Python, REST API, Serverless, SQL, SQL Server

Alexander Spiridonov

9 min read

Includes Code

Has Summary

Gemini Embedding now generally available in the Gemini API

Intermediate

The article announces the general availability of the Gemini Embedding text model, gemini-embedding-001, in the Gemini API and Vertex AI.

Embedding, Gemini, Vertex AI

Min Choi, Janie Zhang

3 min read

Includes Code

Has Summary

Best-in-Class Multimodal RAG: How the Llama 3.2 NeMo Retriever Embedding Model Boosts Pipeline

Advanced

The article discusses the advancements in multimodal retrieval-augmented generation (RAG) systems, particularly focusing on the Llama 3.2 NeMo Retriever Multimodal Embedding model.

Embedding, OpenAI API

Benedikt Schifferer

7 min read

Includes Code

Has Summary

Boost Embedding Model Accuracy for Custom Information Retrieval

Advanced

The article discusses the importance of customizing embedding models for effective information retrieval, particularly in domain-specific contexts.

Nirmal Kumar Juluru

7 min read

Has Summary

Finding the Best Chunking Strategy for Accurate AI Responses

Advanced

This article discusses the importance of chunking strategies in AI retrieval systems, particularly in retrieval-augmented generation (RAG) systems.

Steve Han

13 min read

Has Summary

Advanced

Unlocking Efficient Ad Retrieval: Offline Approximate Nearest Neighbors in Pinterest Ads

The article discusses the implementation of Offline Approximate Nearest Neighbors (ANN) at Pinterest to improve ad retrieval efficiency.

Pinterest Engineering

7 min read

Has Summary

Gemini API I/O updates

Intermediate

The article discusses the latest updates to the Gemini API, highlighting new models and functionalities that enhance developers' ability to create applications using generative AI.

Embedding, Gemini, JSON

Shrestha Basu Mallick, Logan Kilpatrick, Alisa Fortin, Ivan Solovyev

7 min read

Includes Code

Has Summary

Advanced

JUDE: LLM-based representation learning for LinkedIn job recommendations

The article discusses JUDE, LinkedIn's platform for generating high-quality embeddings for job recommendations using fine-tuned Large Language Models (LLMs).

BERT, Embedding, Hugging Face, Kubernetes, Large Language Models, Mistral, PyTorch, Transfer Learning, Transformer, Transformers

Nikita Zhiltsov

13 min read

Has Summary

Accelerating Embedding Lookups with cuEmbed

Advanced

NVIDIA's cuEmbed is a high-performance, header-only CUDA library designed to accelerate embedding lookups on NVIDIA GPUs, particularly beneficial for recommendation systems.

Embedding, Python, PyTorch

Michael Anderson

7 min read

Includes Code

Has Summary

Build and train a recommender system in 10 minutes using Keras and JAX

Advanced

The article introduces Keras Recommenders, a new library designed to simplify the creation of state-of-the-art recommendation systems using Keras with JAX, TensorFlow, or PyTorch.

Embedding, GRU, JAX, Keras, PyTorch, TensorFlow

Yufeng Guo, Monica Song

3 min read

Includes Code

Has Summary

Gemma explained: What’s new in Gemma 3

Intermediate

The article discusses the new features and improvements in Gemma 3, highlighting its vision-language capabilities, architectural changes for memory efficiency, and enhanced multilingual support.

BERT, Embedding, Gemini, Transformers

Ju-yeong Ji, Ravin Kumar

9 min read

Includes Code

Has Summary

Spotlight: Qodo Innovates Efficient Code Search with NVIDIA DGX

Advanced

The article discusses how Qodo leverages NVIDIA DGX to innovate efficient code search through AI-powered agents.

Embedding, GitLab, Hugging Face

Amit Bleiweiss

7 min read

Has Summary

Developing an AI-Powered Tool for Automatic Citation Validation Using NVIDIA NIM

Intermediate

The article discusses the development of an AI-powered tool for automatic citation validation using NVIDIA NIM, aimed at improving the accuracy of citations in academic and AI-generated content.

Embedding, Generative AI, GPT, LangChain, Streamlit

Sebastian Haan

8 min read

Has Summary

Cloudflare

Intermediate

Introducing AutoRAG: fully managed Retrieval-Augmented Generation on Cloudflare

The article introduces AutoRAG, a fully managed Retrieval-Augmented Generation (RAG) pipeline available in open beta on Cloudflare.

Embedding, Fine-tuning, HTML, JSON, REST API, TypeScript

Anni Wang

11 min read

Includes Code

Has Summary

Evaluating and Enhancing RAG Pipeline Performance Using Synthetic Data

Advanced

The article discusses the evaluation and enhancement of Retrieval-Augmented Generation (RAG) pipeline performance using synthetic data.

Embedding, Generative AI, Seed

Vinay Raman

11 min read

Includes Code

Has Summary

Enhancing Personalized CRM Communication with Contextual Bandit Strategies

Intermediate

This article discusses how Uber enhances personalized CRM communication using contextual bandit strategies, particularly focusing on the application of AI/ML techniques to optimize email content.

Embedding, Generative AI, GPT, Machine Learning, XGBoost

LJ (Lin) He, Yifeng Wu, Gaurav Jindal

13 min read

Has Summary

Cloudflare

Advanced

An early look at cryptographic watermarks for AI-generated content

The article explores the emerging field of cryptographic watermarking for AI-generated content, discussing its importance in identifying the origins of digital artifacts.

Embedding, Generative AI, PIL, Stable Diffusion

Teresa Brooks-Mejia

24 min read

Includes Code

Has Summary

NVIDIA NeMo Retriever Delivers Accurate Multimodal PDF Data Extraction 15x Faster

Advanced

The article discusses the advancements in NVIDIA's NeMo Retriever, which enables accurate multimodal PDF data extraction at a speed 15 times faster than traditional methods.

AWS, AWS SageMaker, Azure, Embedding, Google Cloud

Ruchika Kharwar

10 min read

Has Summary

Airbnb

Advanced

Embedding-Based Retrieval for Airbnb Search

The article discusses the development of Airbnb's first Embedding-Based Retrieval (EBR) search system, which aims to improve the relevance of search results for users by narrowing down the pool of ...

Huiji Gao

7 min read

Has Summary

Understanding PTX, the Assembly Language of CUDA GPU Computing

Intermediate

This article provides an in-depth understanding of Parallel Thread Execution (PTX), the assembly language for NVIDIA's CUDA GPU computing platform.

Assembly, Embedding

Tony Scudiero

13 min read

Includes Code

Has Summary

State-of-the-art text embedding via the Gemini API

Beginner

The article discusses the introduction of the Gemini Embedding text model (gemini-embedding-exp-03-07) available through the Gemini API.

Embedding, Gemini, Vertex AI

Logan Kilpatrick, Zach Gleicher, Parashar Shah

3 min read

Includes Code

Has Summary