Retrieval Augmented Generation Programming Tutorials & Engineering Articles

MediaTek NPU and LiteRT: Powering the next generation of on-device AI

Intermediate

The article discusses the advancements in on-device AI powered by MediaTek's Neural Processing Unit (NPU) and the introduction of the LiteRT NeuroPilot Accelerator.

Gemini, Java, JAX, Kotlin, Retrieval Augmented Generation

Lu Wang, Arian Arfaian, Luke Boyer

10 min read

Includes Code

Has Summary

Google AI Edge Gallery: Now with audio and on Google Play

Beginner

The article discusses the launch of the Google AI Edge Gallery app, which now includes audio capabilities and is available on Google Play.

Generative AI, Hugging Face, Retrieval Augmented Generation

Alice Zheng, Na Li

3 min read

Has Summary

From Fine-Tuning to Production: A Scalable Embedding Pipeline with Dataflow

Intermediate

This article discusses the integration of Google's EmbeddingGemma model with Google Cloud's Dataflow to create a scalable embedding pipeline for AI applications.

Apache, Embedding, Gemini, Google Cloud, Hugging Face, Large Language Models, Retrieval Augmented Generation

Danny McCormick, Ian Ballantyne, Olivier Lacombe

5 min read

Includes Code

Has Summary

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

Intermediate

EmbeddingGemma is an innovative open embedding model designed for on-device AI applications, featuring 308 million parameters for efficient performance.

Embedding, Gemini, Hugging Face, LangChain, Ollama, Retrieval Augmented Generation, Transformers, Vertex AI

Min Choi, Sahil Dua, Alice Lisak

5 min read

Has Summary

Anthropic

Advanced

How we built our multi-agent research system

The article discusses the development of a multi-agent research system, detailing its architecture, benefits, and the lessons learned during its transition from prototype to production.

Claude, Retrieval Augmented Generation

18 min read

Has Summary

On-device small language models with multimodality, RAG, and Function Calling

Beginner

The article discusses the expansion of Google's AI Edge platform to support on-device small language models (SLMs) with multimodal capabilities, including the introduction of the Gemma 3 and Gemma ...

Hugging Face, Retrieval Augmented Generation

Mark Sherwood, Matthew Chan, Marissa Ikonomidis

6 min read

Has Summary

Startup spotlight: building AI agents and accelerating innovation with Cohort #5

Advanced

The article highlights the innovative approaches taken by startups Lamatic AI and Skyward AI in building AI agent platforms using Cloudflare's infrastructure.

Cloudflare Workers, JavaScript, Kubernetes, Remix, Retrieval Augmented Generation, YAML

Christopher Rotas

12 min read

Includes Code

Has Summary

Workers AI gets a speed boost, batch workload support, more LoRAs, new models, and a refreshed dashboard

Advanced

The article discusses significant improvements to Cloudflare's Workers AI, including enhancements in inference speed, batch workload support, expanded LoRA model support, and a new dashboard.

GPT, REST API, Retrieval Augmented Generation

Michelle Chen

12 min read

Includes Code

Has Summary

Meta’s Llama 4 is now available on Workers AI

Advanced

Meta's Llama 4 is now available on the Cloudflare Workers AI platform, offering a powerful, multimodal generative AI model.

Artificial Intelligence, Cloudflare Workers, Haskell, Retrieval Augmented Generation, Transformer

Michelle Chen

5 min read

Has Summary

Stripe

Intermediate

Stripe’s new AI Assistant in VS Code

Stripe has introduced an AI Assistant integrated into its VS Code extension, designed to enhance developer experience by providing accurate, personalized responses based on Stripe's extensive docum...

Claude, Copilot, Retrieval Augmented Generation

Mathew Varughese

8 min read

Includes Code

Has Summary

Slack

Advanced

How we built enterprise search to be secure and private

The article discusses the development of Slack's enterprise search functionality, emphasizing its security and privacy features.

AWS, Chef, OAuth, Python, Retrieval Augmented Generation, TypeScript

Ian Hoffman

7 min read

Has Summary

Palantir

Intermediate

Requirements for AI in Production in Insurance Underwriting

The article discusses the requirements and best practices for deploying AI in production within the insurance underwriting sector.

Artificial Intelligence, Caching, CSRF, Retrieval Augmented Generation, XSS

Palantir

21 min read

Has Summary

Vertex AI RAG Engine: A developers tool

Advanced

The article discusses the Vertex AI RAG Engine, a tool designed to help developers build grounded generative AI applications by addressing challenges like hallucinations and outdated knowledge.

Embedding, Generative AI, Google Cloud, Large Language Models, Retrieval Augmented Generation, Vertex AI

Crispin Velez, Holt Skinner

6 min read

Has Summary

Meta

Advanced

Indexing code at scale with Glean

The article discusses Glean, Meta's open-source code indexing system designed to efficiently collect and manage information about source code.

Erlang, Haskell, PHP, Retrieval Augmented Generation, Rust, SQL, Thrift

Simon Marlow

14 min read

Has Summary

Advanced

How we built domain-adapted foundation GenAI models to power our platform

The article discusses the development of domain-adapted foundation GenAI models at LinkedIn, focusing on their application within the Economic Opportunity Network (EON) project.

Azure, Generative AI, GPT, GPT-4, Kubernetes, Mistral, Reinforcement Learning, Retrieval Augmented Generation, RLHF

Praveen Kumar Bodigutla

12 min read

Has Summary

Learn to build and run AI powered apps at Firebase Demo Day ‘24

Beginner

The article discusses Firebase Demo Day 2024, showcasing how to build and run AI-powered applications using Firebase products like Firebase Genkit, Vertex AI, and Firebase App Hosting.

Firebase, Gemini, Retrieval Augmented Generation, Vertex AI

Yasmin Gehman

4 min read

Has Summary

Building Vectorize, a distributed vector database, on Cloudflare’s Developer Platform

Advanced

The article discusses the development of Vectorize, a distributed vector database built on Cloudflare’s Developer Platform.

Cloudflare Workers, REST API, Retrieval Augmented Generation, Rust, SQLite

Jérôme Schneider

21 min read

Includes Code

Has Summary

NVIDIA GH200 Grace Hopper Superchip Delivers Outstanding Performance in MLPerf Inference v4.1

Intermediate

The article discusses the performance of the NVIDIA GH200 Grace Hopper Superchip in the latest MLPerf Inference v4.1 round.

GPT, Oracle, Retrieval Augmented Generation

Amr Elmeleegy

6 min read

Has Summary

Tips for Building a RAG Pipeline with NVIDIA AI LangChain AI Endpoints

Advanced

The article provides a comprehensive guide on building a Retrieval-Augmented Generation (RAG) pipeline using NVIDIA AI LangChain AI Endpoints.

Embedding, Generative AI, gRPC, HTML, Java, LangChain, Python, PyTorch, Retrieval Augmented Generation, TensorFlow

Amit Bleiweiss

13 min read

Includes Code

Has Summary

Intermediate

Musings on building a Generative AI product

The article discusses the development of a new AI-powered experience at LinkedIn, focusing on the challenges and successes encountered while building a generative AI product.

Chain of Thought, Embedding, Generative AI, JSON, Retrieval Augmented Generation, YAML

Juan Pablo Bottaro

13 min read

Has Summary

Slack

Intermediate

How We Built Slack AI To Be Secure and Private

The article discusses the development of Slack AI with a focus on ensuring security and privacy for customer data.

AWS, Chef, Retrieval Augmented Generation

Kelly Moran

9 min read

Has Summary

Intermediate

How we built Text-to-SQL at Pinterest

The article discusses Pinterest's development of a Text-to-SQL feature that utilizes Large Language Models (LLMs) to assist data users in generating SQL queries from natural language questions.

Fine-tuning, JSON, Large Language Models, Retrieval Augmented Generation, SQL, WebSocket

Pinterest Engineering

9 min read

Has Summary

Speed Up Your AI Development: NVIDIA AI Workbench Goes GA

Intermediate

NVIDIA AI Workbench is a newly available toolkit designed to streamline AI and ML development for both novice and expert developers.

Docker, Generative AI, Git, GitLab, Mistral, Retrieval Augmented Generation, Stable Diffusion

André Franklin

4 min read

Has Summary

New Workshops and Certification at NVIDIA GTC 2024

Intermediate

The article discusses the new workshops and certification opportunities available at NVIDIA GTC 2024, highlighting both in-person and virtual training sessions.

Computer Vision, Diffusion Models, Generative AI, Kubernetes, Python, Retrieval Augmented Generation

Ann Sheridan

7 min read

Has Summary

Palantir

Intermediate

Building with Palantir AIP: Logic Tools for RAG/OAG

The article discusses the integration of logic tools within Palantir's Artificial Intelligence Platform (AIP) to enhance Retrieval Augmented Generation (RAG) and Ontology Augmented Generation (OAG)...

Artificial Intelligence, Chain of Thought, Retrieval Augmented Generation

Palantir

9 min read

Has Summary

Palantir

Intermediate

Building with Palantir AIP: Semantic Search

The article discusses how to leverage Palantir AIP to build a semantic search application that uncovers insights from unstructured data within enterprises.

Apache, Apache Spark, Reinforcement Learning, Retrieval Augmented Generation, RLHF, Semantic Search

Palantir

6 min read

Includes Code

Has Summary

Mastering LLM Techniques: LLMOps

Intermediate

The article discusses the evolution of machine learning operations (MLOps) into specialized areas such as GenAIOps and LLMOps, focusing on the development and management of generative AI and large ...

ChatGPT, Embedding, Retrieval Augmented Generation, RLHF

Nik Spirin

13 min read

Has Summary

Getting Started with Large Language Models for Enterprise Solutions

Intermediate

The article discusses the application of Large Language Models (LLMs) in enterprise solutions, highlighting their capabilities in enhancing productivity across various industries.

ChatGPT, Embedding, Generative AI, Google Cloud, GPT, Large Language Models, Mistral, Retrieval Augmented Generation, RLHF, Stable Diffusion

Erik Pounds

13 min read

Has Summary

Join the First NVIDIA LLM Developer Day: Elevate Your App-Building Skills

Beginner

The NVIDIA LLM Developer Day is a virtual event aimed at developers interested in building applications utilizing Large Language Models (LLMs).

Deep Learning, Retrieval Augmented Generation

Pranjali Joshi

2 min read

Has Summary

Intermediate

How LinkedIn Is Using Embeddings to Up Its Match Game for Job Seekers

The article discusses how LinkedIn utilizes embedding-based retrieval (EBR) technology to enhance job matching for seekers.

Embedding, Retrieval Augmented Generation

Jake Mannix

11 min read

Has Summary