# Embedding Programming Tutorials & Engineering Articles
173 Embedding tutorials, guides, and engineering insights from NVIDIA, Pinterest, Google, and more
Embedding Articles & Tutorials
Uber’s Rate Limiting System details the evolution of Uber's approach to managing service overload through a unified rate-limiting architecture.
Chien-Chih Liao, Rahul Gutal, Smit Sheth, Ying Jiang
14 min read
Includes Code
Has Summary
--
Kimi K2.5 is an advanced multimodal vision language model (VLM) developed by Kimi, optimized for various AI tasks.
Anu Srivastava
4 min read
Includes Code
Has Summary
--
The article provides a comprehensive guide on building a document processing pipeline using NVIDIA Nemotron RAG, focusing on the extraction of structured data from complex documents like PDFs.
Chia-Chih Chen
9 min read
Includes Code
Has Summary
--
PVH Corp., the parent company of Calvin Klein and Tommy Hilfiger, announced its adoption of ChatGPT Enterprise to transform its global fashion operations.
OpenAI
3 min read
Has Summary
--
The article discusses the collaboration between NVIDIA and Black Forest Labs to optimize the FLUX.2 text-to-image model for NVIDIA Blackwell Data Center GPUs.
The article discusses the NVIDIA Multi-Agent Intelligent Warehouse (MAIW), an AI command layer designed to enhance operational efficiency and supply chain intelligence in automated warehouses.
Tarik Hammadou
10 min read
Includes Code
Has Summary
--
This article provides a comprehensive tutorial on building a voice agent using NVIDIA's Nemotron models, focusing on retrieval-augmented generation (RAG) and safety guardrails.
Chris Alexiuk
8 min read
Includes Code
Has Summary
--
The article discusses Uber's transition from traditional keyword-based search using Apache Lucene to implementing semantic vector search with Amazon OpenSearch.
Hao Sun, Jiasen Xu, Smit Patel, Anand Kotriwal, Xu Zhang
11 min read
Has Summary
--
The article discusses the evolution and scaling of Uber's Delivery Search Platform, emphasizing the transition from traditional lexical search to a semantic search model that enhances user experien...
Divya Nagar, Zheng Liu, Jiasen Xu, Bo Ling, Haoyang Chen
11 min read
Has Summary
--
The article discusses the integration of NVIDIA Nemotron RAG with Microsoft SQL Server 2025, showcasing how this collaboration enables the development of scalable AI applications on enterprise data.
The article reflects on a decade of AI platform development at Pinterest, detailing the evolution from fragmented machine learning stacks to a unified AI platform that supports various models.
AutoML, Docker, Embedding, Generative AI, Java, Kubernetes, LightGBM, PySpark, Python, PyTorch, Seed, SQL, TensorFlow, Thrift, Transformer
Pinterest Engineering
22 min read
Has Summary
--
This article discusses how Uber has integrated explainability into its machine learning platform, Michelangelo, using Integrated Gradients (IG) to provide interpretable attributions for deep learni...
Hugh Chen, Eric Wang, Gaoyuan Huang, Howard Yu, Jia Li, Sally Lee
14 min read
Has Summary
--
The article discusses the development of an AI-powered log analysis solution using NVIDIA's Generative AI reference workflows.
Prashant Bhende
5 min read
Includes Code
Has Summary
--
The article discusses the optimization of large language models (LLMs) through pruning and knowledge distillation using NVIDIA TensorRT Model Optimizer.
Max Xu
10 min read
Includes Code
Has Summary
--
The article provides an in-depth exploration of the EmbeddingGemma architecture, detailing its origins, embedding generation process, and the comprehensive training methodology.
Henrique Schechter Vera, Juyeong Ji, Sahil Dua
7 min read
Includes Code
Has Summary
--
The article discusses the concept of AI sovereignty, emphasizing the importance of choice for nations in controlling AI technologies and data.
Carly Ramsey
9 min read
Includes Code
Has Summary
--
The article provides a comprehensive guide on building a Retrieval-Augmented Generation (RAG) agent using NVIDIA Nemotron, emphasizing the integration of external information to enhance text genera...
Edward Li
16 min read
Includes Code
Has Summary
--
Brian Tepera
8 min read
Includes Code
Has Summary
--
The article discusses the recent enhancements to the Gemini Batch API, which now includes support for the Gemini Embedding model and compatibility with the OpenAI SDK.
This article discusses the integration of Google's EmbeddingGemma model with Google Cloud's Dataflow to create a scalable embedding pipeline for AI applications.
Danny McCormick, Ian Ballantyne, Olivier Lacombe
5 min read
Includes Code
Has Summary
--
EmbeddingGemma is an innovative open embedding model designed for on-device AI applications, featuring 308 million parameters for efficient performance.
Min Choi, Sahil Dua, Alice Lisak
5 min read
Has Summary
--
The article discusses the evolution of data engineering at Netflix, focusing on the introduction of Media ML Data Engineering, which aims to enhance the handling of complex media data for machine l...
Netflix Technology Blog
7 min read
Has Summary
--
The article discusses the evolution of LinkedIn's edge-building system, focusing on how it leverages AI-powered recommendations to enhance user interactions.
This article explores the rise of Forward Deployed Engineering (FDE) as a strategic role in B2B tech companies, tracing its origins from Palantir to its current adoption across companies like OpenA...
Leo Mehr
13 min read
Has Summary
--
The article introduces gpt-oss, two state-of-the-art open-weight language models, gpt-oss-120b and gpt-oss-20b, which excel in reasoning tasks and are optimized for deployment on consumer hardware.
The article discusses the emerging threat of semantic prompt injections in multimodal AI systems, highlighting how adversaries can exploit visual inputs to bypass traditional security measures.
Daniel Teixeira
7 min read
Has Summary
--
The article discusses the Gemini Embedding text model and its applications in various industries, highlighting its effectiveness in enhancing AI applications through context engineering and retriev...
The article discusses the deployment of a serverless, distributed data processing architecture using Apache Spark and NVIDIA AI on Azure.
Alexander Spiridonov
9 min read
Includes Code
Has Summary
--
The article announces the general availability of the Gemini Embedding text model, gemini-embedding-001, in the Gemini API and Vertex AI.
The article discusses the advancements in multimodal retrieval-augmented generation (RAG) systems, particularly focusing on the Llama 3.2 NeMo Retriever Multimodal Embedding model.
Benedikt Schifferer
7 min read
Includes Code
Has Summary
--
The article discusses the importance of customizing embedding models for effective information retrieval, particularly in domain-specific contexts.
Nirmal Kumar Juluru
7 min read
Has Summary
--
This article discusses the importance of chunking strategies in AI retrieval systems, particularly in retrieval-augmented generation (RAG) systems.
Steve Han
13 min read
Has Summary
--
The article discusses the implementation of Offline Approximate Nearest Neighbors (ANN) at Pinterest to improve ad retrieval efficiency.
Pinterest Engineering
7 min read
Has Summary
--
The article discusses the latest updates to the Gemini API, highlighting new models and functionalities that enhance developers' ability to create applications using generative AI.
The article discusses JUDE, LinkedIn's platform for generating high-quality embeddings for job recommendations using fine-tuned Large Language Models (LLMs).
BERT, Embedding, Hugging Face, Kubernetes, Large Language Models, Mistral, PyTorch, Transfer Learning, Transformer, Transformers
Nikita Zhiltsov
13 min read
Has Summary
--
NVIDIA's cuEmbed is a high-performance, header-only CUDA library designed to accelerate embedding lookups on NVIDIA GPUs, particularly beneficial for recommendation systems.
The article introduces Keras Recommenders, a new library designed to simplify the creation of state-of-the-art recommendation systems using Keras with JAX, TensorFlow, or PyTorch.
The article discusses the new features and improvements in Gemma 3, highlighting its vision-language capabilities, architectural changes for memory efficiency, and enhanced multilingual support.
Ju-yeong Ji, Ravin Kumar
9 min read
Includes Code
Has Summary
--
The article discusses how Qodo leverages NVIDIA DGX to innovate efficient code search through AI-powered agents.
Amit Bleiweiss
7 min read
Has Summary
--
The article discusses the development of an AI-powered tool for automatic citation validation using NVIDIA NIM, aimed at improving the accuracy of citations in academic and AI-generated content.
Sebastian Haan
8 min read
Has Summary
--
The article introduces AutoRAG, a fully managed Retrieval-Augmented Generation (RAG) pipeline available in open beta on Cloudflare.
Anni Wang
11 min read
Includes Code
Has Summary
--
The article discusses the evaluation and enhancement of Retrieval-Augmented Generation (RAG) pipeline performance using synthetic data.
Vinay Raman
11 min read
Includes Code
Has Summary
--
This article discusses how Uber enhances personalized CRM communication using contextual bandit strategies, particularly focusing on the application of AI/ML techniques to optimize email content.
LJ (Lin) He, Yifeng Wu, Gaurav Jindal
13 min read
Has Summary
--
The article explores the emerging field of cryptographic watermarking for AI-generated content, discussing its importance in identifying the origins of digital artifacts.
Teresa Brooks-Mejia
24 min read
Includes Code
Has Summary
--
The article discusses the advancements in NVIDIA's NeMo Retriever, which enables accurate multimodal PDF data extraction 15 times faster than traditional methods.
Ruchika Kharwar
10 min read
Has Summary
--
The article discusses the development of Airbnb's first Embedding-Based Retrieval (EBR) search system, which aims to improve the relevance of search results for users by narrowing down the pool of ...
Huiji Gao
7 min read
Has Summary
--
This article provides an in-depth understanding of Parallel Thread Execution (PTX), the assembly language for NVIDIA's CUDA GPU computing platform.
The article discusses the introduction of the Gemini Embedding text model (gemini-embedding-exp-03-07) available through the Gemini API.
The article discusses the integration of NVIDIA ACE AI characters into games using the new In-Game Inferencing SDK (NVIGI).
The article discusses advancements in embedding-based retrieval at Pinterest's Homefeed, focusing on improvements such as feature crossing, ID embeddings, and serving corpus upgrades.
Pinterest Engineering
8 min read
Has Summary
--