Large Language Models Programming Tutorials & Engineering Articles

Introducing the Developer Knowledge API and MCP Server

Intermediate

Google announces the public preview of the Developer Knowledge API and its associated Model Context Protocol (MCP) server, providing a canonical, machine-readable gateway to Google's official devel...

Firebase, Gemini, Google Cloud, Large Language Models

Jess Kuras

3 min read

Includes Code

Has Summary

Easy FunctionGemma finetuning with Tunix on Google TPUs

Advanced

This tutorial demonstrates how to fine-tune FunctionGemma, a small language model for translating natural language into API calls, using Google's Tunix library on TPUs.

Hugging Face, JAX, Large Language Models

Wei Wei

4 min read

Includes Code

Has Summary

Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate

Intermediate

The article discusses the NVIDIA Nemotron 3, a family of open models designed for agentic AI systems, emphasizing its efficiency and accuracy through innovative architectures and techniques.

Hugging Face, Large Language Models, Reinforcement Learning, Transformer

Chris Alexiuk

9 min read

Has Summary

Intermediate

Autonomous Observability at Pinterest (Part 1 of 2)

The article discusses Pinterest's approach to enhancing its observability tools by integrating AI and the Model Context Protocol (MCP).

Pinterest Engineering

12 min read

Has Summary

Slack

Advanced

Streamlining Security Investigations with Agents

Slack's Security Engineering team describes how they built an AI agent-based system to automate and streamline security investigations.

Chef, JSON, Large Language Models

Dominic Marks

12 min read

Has Summary

NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks

Intermediate

The NVIDIA Blackwell architecture has achieved the fastest training times across all MLPerf Training v5.1 benchmarks, showcasing significant advancements in AI training performance.

BERT, Deep Learning, Large Language Models, Stable Diffusion, Transformer, V

Ashraf Eassa

10 min read

Has Summary

Meta

Intermediate

Meta’s Infrastructure Evolution and the Advent of AI

The article discusses Meta's evolution in infrastructure over 21 years, highlighting the significant changes brought about by AI.

Apache, Large Language Models, MySQL, Prometheus, PyTorch

Yee Jiun Song

20 min read

Has Summary

Apigee Operator for Kubernetes and GKE Inference Gateway integration for Auth and AI/LLM policies

Intermediate

The article discusses the integration of the Apigee Operator for Kubernetes with the GKE Inference Gateway to enhance API management for AI and Large Language Models (LLMs).

Artificial Intelligence, Google Cloud, Kubernetes, Large Language Models, OpenAI API

Sanjay Pujare, Jennifer Bennett

4 min read

Includes Code

Has Summary

Building unique, per-customer defenses against advanced bot threats in the AI era

Advanced

The article discusses a new approach to bot management that leverages behavioral anomaly detection tailored for individual customers.

Cloudflare Workers, Generative AI, HTTP/2, Large Language Models

Jin-Hee Lee

13 min read

Has Summary

ADK for Java opening up to third-party language models via LangChain4j integration

Intermediate

The article discusses the integration of Google’s Agent Development Kit (ADK) for Java with the LangChain4j LLM framework, enabling developers to utilize a variety of Large Language Models (LLMs) f...

Claude, Docker, Gemini, Java, Large Language Models, Mistral, Ollama, Shell, XML

Guillaume Laforge

5 min read

Includes Code

Has Summary

Accelerate Large-Scale LLM Inference and KV Cache Offload with CPU-GPU Memory Sharing

Advanced

The article discusses how to enhance the efficiency of Large Language Models (LLMs) during inference by utilizing CPU-GPU memory sharing through NVIDIA's NVLink C2C technology.

Hugging Face, Large Language Models, Python, PyTorch

Afroze Syed

6 min read

Includes Code

Has Summary

From Fine-Tuning to Production: A Scalable Embedding Pipeline with Dataflow

Intermediate

This article discusses the integration of Google's EmbeddingGemma model with Google Cloud's Dataflow to create a scalable embedding pipeline for AI applications.

Apache, Embedding, Gemini, Google Cloud, Hugging Face, Large Language Models, Retrieval Augmented Generation

Danny McCormick, Ian Ballantyne, Olivier Lacombe

5 min read

Includes Code

Has Summary

Block unsafe prompts targeting your LLM endpoints with Firewall for AI

Advanced

The article discusses Cloudflare's introduction of unsafe content moderation integrated into its Firewall for AI, aimed at protecting Large Language Models (LLMs) from malicious prompts that could ...

Gemini, Large Language Models, Rate Limiting

Radwa Radwan

8 min read

Includes Code

Has Summary

Intermediate

Welcome to AI Week 2025

The article discusses the transformative impact of AI on various industries and introduces Cloudflare's AI Week 2025, focusing on enhancing security and control over AI technologies.

Artificial Intelligence, Large Language Models

Kenny Johnson

7 min read

Has Summary

Train a GPT2 model with JAX on TPU for free

Advanced

This article provides a comprehensive guide on how to train a GPT-2 model using JAX on TPU, highlighting the ease of leveraging Google TPUs for free.

Flax, GPT, JAX, Large Language Models, Multi-Head Attention, PyTorch, TensorFlow

Wei Wei

8 min read

Includes Code

Has Summary

Partnering with OpenAI to bring their new open models onto Cloudflare Workers AI

Intermediate

Cloudflare has partnered with OpenAI to integrate their new open-weight models into Cloudflare Workers AI, allowing developers to leverage these models for enhanced AI capabilities.

Cloudflare Workers, Large Language Models, REST API

Michelle Chen

4 min read

Includes Code

Has Summary

Shopify

Intermediate

Leveraging Multimodal LLMs for Shopify’s Global Catalogue: Recap of Expo Talk at ICLR 2025

The article discusses Shopify's Global Catalogue, which utilizes multimodal Large Language Models (LLMs) to standardize and enrich product data across its platform.

Active Learning, Fine-tuning, Gemini, Large Language Models, LLaMA

Audrey-Anne Guindon

13 min read

Has Summary

Announcing GenAI Processors: Build powerful and flexible Gemini applications

Advanced

The article introduces GenAI Processors, an open-source Python library from Google DeepMind aimed at simplifying the development of sophisticated AI applications using Large Language Models (LLMs).

Gemini, JSON, Large Language Models

Andre Elisseeff, Alexey Guseynov, Oskar Bunyan, Shrestha Basu Mallick

6 min read

Includes Code

Has Summary

The crawl before the fall… of referrals: understanding AI’s impact on content providers

Intermediate

The article explores the changing dynamics of web crawling and referral traffic due to the rise of AI and Large Language Models (LLMs).

Artificial Intelligence, Claude, HTML, Large Language Models, Mistral

David Belson

8 min read

Includes Code

Has Summary

ClickHouse

Beginner

Building an agentic app with ClickHouse MCP and CopilotKit

This article discusses how to build an agentic application using ClickHouse MCP Server and CopilotKit, focusing on creating a customizable analytics dashboard for the UK real estate market.

Claude, Large Language Models, Next.js, React, SQL

Lionel Palacin

10 min read

Includes Code

Has Summary

Advanced

JUDE: LLM-based representation learning for LinkedIn job recommendations

The article discusses JUDE, LinkedIn's platform for generating high-quality embeddings for job recommendations using fine-tuned Large Language Models (LLMs).

BERT, Embedding, Hugging Face, Kubernetes, Large Language Models, Mistral, PyTorch, Transfer Learning, Transformer, Transformers

Nikita Zhiltsov

13 min read

Has Summary

LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM

Advanced

This article serves as a comprehensive guide for benchmarking Large Language Models (LLMs) using NVIDIA's GenAI-Perf tool alongside NVIDIA NIM.

Docker, Generative AI, Hugging Face, Large Language Models, OpenAI API

Vinh Nguyen

11 min read

Includes Code

Has Summary

Advanced

Improving Pinterest Search Relevance Using Large Language Models

The article discusses the implementation of a Large Language Model (LLM)-based relevance system for Pinterest Search, detailing its technical design, model architecture, and the results from both o...

BERT, BLIP, Hugging Face, Large Language Models, Machine Learning, RoBERTa, Supervised Learning, T5

Pinterest Engineering

7 min read

Has Summary

Advanced

Journey of next generation control plane for data systems

The article discusses the evolution of LinkedIn's Nuage control plane, highlighting its transition from a self-service platform to a comprehensive control plane solution for managing data infrastru...

Java, JSON, Large Language Models, MySQL

Aashish Nagpal

21 min read

Has Summary

Netflix

Intermediate

Foundation Model for Personalized Recommendation

The article discusses Netflix's development of a Foundation Model for Personalized Recommendation, which aims to centralize member preference learning and enhance the efficiency of their recommenda...

GPT, Kong, Large Language Models, Supervised Learning, Transformer

Netflix Technology Blog

13 min read

Has Summary

Vision Language Model Prompt Engineering Guide for Image and Video Understanding

Advanced

This article provides a comprehensive guide on Vision Language Models (VLMs) and their evolution from single-image understanding to advanced video comprehension.

Fine-tuning, JSON, Large Language Models, Prompt Engineering

Shubham Agrawal

11 min read

Includes Code

Has Summary

Configurable Graph-Based Task Solving with the Marco Multi-AI Agent Framework for Chip Design

Advanced

The article discusses the Marco framework, a configurable graph-based task-solving and multi-AI agent system designed to streamline chip design processes.

Large Language Models, Transformer, V

Mark Ren

8 min read

Has Summary

NVIDIA Deep Learning Institute Releases New Generative AI Teaching Kit

Intermediate

NVIDIA has released a new Generative AI Teaching Kit aimed at enhancing education in generative AI technologies.

Deep Learning, Diffusion Models, Generative AI, GPT, Large Language Models, Transformer

Joe Bungo

7 min read

Has Summary

Beyond the Chatbot: Agentic AI with Gemma

Intermediate

The article discusses Gemma, a family of lightweight generative AI models, and introduces the concept of Agentic AI, which allows AI to make proactive decisions and utilize external tools.

Gemini, Large Language Models

Ju-yeong Ji

7 min read

Includes Code

Has Summary

Vertex AI RAG Engine: A developers tool

Advanced

The article discusses the Vertex AI RAG Engine, a tool designed to help developers build grounded generative AI applications by addressing challenges like hallucinations and outdated knowledge.

Embedding, Generative AI, Google Cloud, Large Language Models, Retrieval Augmented Generation, Vertex AI

Crispin Velez, Holt Skinner

6 min read

Has Summary

Anthropic

Intermediate

Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet

The article discusses the upgraded Claude 3.5 Sonnet model, which achieved a score of 49% on the SWE-bench Verified benchmark, surpassing the previous state-of-the-art model's score of 45%.

Claude, Large Language Models, Matplotlib, XML

15 min read

Includes Code

Has Summary

Data-Efficient Knowledge Distillation for Supervised Fine-Tuning with NVIDIA NeMo-Aligner

Intermediate

The article discusses the implementation of data-efficient knowledge distillation using NVIDIA NeMo-Aligner during supervised fine-tuning (SFT).

Caching, Large Language Models, Neural Networks

Anna Shors

5 min read

Has Summary

Uber

Intermediate

Introducing the Prompt Engineering Toolkit

The article introduces the Prompt Engineering Toolkit developed by Uber, which aims to streamline the process of creating and managing prompts for Large Language Models (LLMs).

Artificial Intelligence, Chain of Thought, LangChain, Large Language Models, Machine Learning, Prompt Engineering

Sishi Long, Hwamin Kim, Manoj Sureddi

12 min read

Has Summary

Palantir

Intermediate

Ethical AI in Defense Decision Support Systems (Defense AI Ethics, #2)

The article discusses the ethical implications and operational realities of implementing AI Decision Support Systems (AI-DSS) in military contexts.

Artificial Intelligence, Large Language Models, Machine Learning

Palantir

18 min read

Has Summary

Slack

Intermediate

Empowering Engineers with AI

The article discusses how Slack is utilizing AI-powered tools to enhance developer productivity and streamline processes.

Amazon Bedrock, AWS, Chef, Jenkins, Large Language Models

Anirudh Janga

10 min read

Has Summary

Billions and billions (of logs): scaling AI Gateway with the Cloudflare Developer Platform

Advanced

The article discusses the challenges and solutions involved in scaling the AI Gateway on the Cloudflare Developer Platform, specifically focusing on extending log storage capabilities from 30 minut...

Cloudflare Workers, JavaScript, Large Language Models, SQLite

Catarina Pires Mota

11 min read

Includes Code

Has Summary

Advanced

Ray Batch Inference at Pinterest (Part 3)

This article discusses the implementation of Ray Batch Inference at Pinterest, highlighting its advantages over previous solutions like Apache Spark and Torch Dataloader.

Apache, Apache Spark, AWS, Hugging Face, Large Language Models, LLaMA, PyTorch, Ray Tune, TensorFlow

Pinterest Engineering

11 min read

Includes Code

Has Summary

Start auditing and controlling the AI models accessing your content

Intermediate

The article discusses Cloudflare's new tools that empower site owners to audit and control how AI models access their content.

Sam Rhea

12 min read

Includes Code

Has Summary

NVIDIA Presents AI Security Expertise at Leading Cybersecurity Conferences

Intermediate

NVIDIA showcased its AI security expertise at the Black Hat USA and DEF CON conferences, focusing on the evolving landscape of AI in cybersecurity.

Deep Learning, Generative AI, Large Language Models, Machine Learning, PyTorch, XSS

Becca Lynch

8 min read

Has Summary

Deploy Diverse AI Apps with Multi-LoRA Support on RTX AI PCs and Workstations

Intermediate

The article discusses the deployment of diverse AI applications using Multi-LoRA support on NVIDIA RTX AI PCs and workstations.

Large Language Models, Stable Diffusion

Annamalai Chockalingam

9 min read

Includes Code

Has Summary

NVIDIA NVLink and NVIDIA NVSwitch Supercharge Large Language Model Inference

Intermediate

The article discusses how NVIDIA NVLink and NVSwitch enhance the performance of Large Language Model (LLM) inference by enabling efficient multi-GPU computing.

Brian Slechta

7 min read

Has Summary

Palantir

Intermediate

Thinking Outside the (Black) Box (Engineering Responsible AI, #2)

The article discusses the importance of explainability in AI, particularly focusing on Large Language Models (LLMs) and the Chain-of-Thought (CoT) prompting technique.

Generative AI, Large Language Models

Palantir

11 min read

Has Summary

Uber

Intermediate

Navigating the LLM Landscape: Uber’s Innovation with GenAI Gateway

The article discusses Uber's GenAI Gateway, a unified platform designed to streamline the integration of Large Language Models (LLMs) across various teams within the company.

Dialogflow, GPT, Java, JSON, LangChain, Large Language Models, OpenAI API, PaLM, Vertex AI

Tse-Chi Wang, Roopansh Bansal

15 min read

Has Summary

Advancing Security for Large Language Models with NVIDIA GPUs and Edgeless Systems

Advanced

The article discusses the launch of Continuum AI by Edgeless Systems, a generative AI framework that ensures data privacy through confidential computing and NVIDIA H100 GPUs.

Azure, ChatGPT, HTTPS, Large Language Models

Laura Martinez

6 min read

Has Summary

Palantir

Intermediate

Product Reliability Incident Management at Palantir

The article discusses the Product Reliability Incident Management team at Palantir, detailing their proactive and reactive approaches to managing critical incidents across their platforms.

Palantir

10 min read

Has Summary

Demystifying AI Inference Deployments for Trillion Parameter Large Language Models

Advanced

This article explores the complexities of deploying trillion-parameter large language models (LLMs) in production environments, focusing on maximizing throughput and user interactivity.

BERT, GPT, Large Language Models

Amr Elmeleegy

13 min read

Has Summary

Seamlessly Deploying a Swarm of LoRA Adapters with NVIDIA NIM

Advanced

The article discusses the deployment of LoRA (Low-Rank Adaptation) fine-tuned models using NVIDIA NIM, highlighting the advantages of customizing large language models (LLMs) for specific tasks.

Hugging Face, Large Language Models, LLaMA

Shashank Verma

11 min read

Includes Code

Has Summary

A Simple Guide to Deploying Generative AI with NVIDIA NIM

Advanced

The article provides a comprehensive guide on deploying generative AI using NVIDIA NIM microservices, highlighting its ease of use for enterprise developers in both on-premises and cloud environmen...

Generative AI, Haystack, Hugging Face, LangChain, Large Language Models, LlamaIndex, Python

Hayden Wolff

6 min read

Includes Code

Has Summary