Ollama Programming Tutorials & Engineering Articles

Open Source AI Tool Upgrades Speed Up LLM and Diffusion Models on NVIDIA RTX PCs

Advanced

The article discusses how recent upgrades to open source AI tools enhance the performance of small language models (SLMs) and diffusion models on NVIDIA RTX PCs.

Diffusion Models, GPT, Ollama, PyTorch

Annamalai Chockalingam

7 min read

Has Summary

Getting Started with Edge AI on NVIDIA Jetson: LLMs, VLMs, and Foundation Models for Robotics

Intermediate

The article discusses the implementation of Edge AI on the NVIDIA Jetson platform, focusing on the use of Large Language Models (LLMs), Vision Language Models (VLMs), and Foundation Models in robot...

Hugging Face, Ollama, WebRTC

Chitoku Yato

9 min read

Includes Code

Has Summary

NVIDIA-Accelerated Mistral 3 Open Models Deliver Efficiency, Accuracy at Any Scale

Advanced

The NVIDIA-accelerated Mistral 3 open model family offers developers and enterprises industry-leading accuracy, efficiency, and customization capabilities.

Docker, Hugging Face, Mistral, Ollama

Anu Srivastava

6 min read

Has Summary

ADK for Java opening up to third-party language models via LangChain4j integration

Intermediate

The article discusses the integration of Google’s Agent Development Kit (ADK) for Java with the LangChain4j LLM framework, enabling developers to utilize a variety of Large Language Models (LLMs) f...

Claude, Docker, Gemini, Java, Large Language Models, Mistral, Ollama, Shell, XML

Guillaume Laforge

5 min read

Includes Code

Has Summary

Announcing Genkit Go 1.0 and Enhanced AI-Assisted Development

Intermediate

The article announces the release of Genkit Go 1.0, a stable, production-ready open-source AI development framework for the Go ecosystem.

Claude, Firebase, Gemini, JavaScript, JSON, Ollama, Shell, Vertex AI

Chris Gill, Cameron Balahan

7 min read

Includes Code

Has Summary

Introducing EmbeddingGemma: The Best-in-Class Open Model for On-Device Embeddings

Intermediate

EmbeddingGemma is an innovative open embedding model designed for on-device AI applications, featuring 308 million parameters for efficient performance.

Embedding, Gemini, Hugging Face, LangChain, Ollama, Retrieval Augmented Generation, Transformers, Vertex AI

Min Choi, Sahil Dua, Alice Lisak

5 min read

Has Summary

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

Intermediate

The article introduces Gemma 3 270M, a compact AI model designed for hyper-efficient task-specific fine-tuning.

Docker, Google Cloud, Hugging Face, JAX, Keras, Ollama, Transformers, Vertex AI

Olivier Lacombe, Kathleen Kenealy, Kat Black, Ravin Kumar, Francesco Visin, Jiageng Zhang

5 min read

Has Summary

NVIDIA Accelerates OpenAI gpt-oss Models Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72

Intermediate

NVIDIA has optimized OpenAI's gpt-oss models for accelerated inference performance on the NVIDIA GB200 NVL72 system, achieving up to 1.5 million tokens per second (TPS).

Docker, Hugging Face, Ollama, Python, Transformer, Transformers

Anu Srivastava

6 min read

Includes Code

Has Summary

Introducing gpt-oss

Advanced

The article introduces gpt-oss, two state-of-the-art open-weight language models, gpt-oss-120b and gpt-oss-20b, which excel in reasoning tasks and are optimized for deployment on consumer hardware.

Apache, AWS, Azure, Embedding, GPT, Hugging Face, Ollama, PyTorch, Rust, Transformer, Vercel, Whisper

OpenAI

15 min read

Has Summary

Run Google DeepMind’s Gemma 3n on NVIDIA Jetson and RTX

Intermediate

The article discusses the general availability of Google DeepMind's Gemma 3n on NVIDIA RTX and Jetson platforms, highlighting its capabilities in multi-modal on-device deployment, including audio, ...

Fine-tuning, Hugging Face, Ollama

Anu Srivastava

4 min read

Includes Code

Has Summary

Introducing Gemma 3n: The developer guide

Intermediate

The article introduces Gemma 3n, a mobile-first architecture designed for on-device AI, highlighting its multimodal capabilities and architectural innovations.

Docker, Gemini, GPT, Hugging Face, Ollama, Transformer, Transformers, Vertex AI

Omar Sanseviero, Ian Ballantyne

9 min read

Includes Code

Has Summary

Integrate and Deploy Tongyi Qwen3 Models into Production Applications with NVIDIA

Advanced

The article discusses the integration and deployment of Alibaba's Tongyi Qwen3 models into production applications using NVIDIA technologies.

Hugging Face, Ollama, OpenAI API, PyTorch

Ankit Patel

6 min read

Includes Code

Has Summary

Gemma 3 QAT Models: Bringing state-of-the-Art AI to consumer GPUs

Intermediate

The article discusses the launch of Gemma 3, a state-of-the-art AI model optimized for consumer GPUs through Quantization-Aware Training (QAT).

Hugging Face, Ollama

Edouard YVINEC, Phil Culliton

6 min read

Has Summary

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

Advanced

NVIDIA has announced world-record inference performance for the DeepSeek-R1 model using the Blackwell architecture, achieving over 250 tokens per second per user and a maximum throughput of over 30...

CLIP, Hugging Face, JAX, Ollama, Python, PyTorch, T5, TensorFlow, Transformer

Ashraf Eassa

13 min read

Has Summary

Introducing Gemma 3: The Developer Guide

Intermediate

Gemma 3 is the latest version of the Gemma open-model family, boasting enhanced capabilities such as multimodality, longer context windows, and improved reasoning.

Hugging Face, JAX, Ollama, Reinforcement Learning, RLHF, Transformers, Vertex AI

Omar Sanseviero, Philipp Schmid

5 min read

Includes Code

Has Summary

Safer and Multimodal: Responsible AI with Gemma

Beginner

The article discusses the launch of ShieldGemma 2, a safety content classifier model built on Gemma 3, aimed at detecting harmful content in both synthetic and natural images.

Hugging Face, JAX, Keras, Ollama, Transformers

Dana Kurniawan, Wenjun Zeng, Ryan Mullins

3 min read

Has Summary

NVIDIA JetPack 6.2 Brings Super Mode to NVIDIA Jetson Orin Nano and Jetson Orin NX Modules

Beginner

NVIDIA JetPack 6.2 introduces Super Mode for the Jetson Orin Nano and Jetson Orin NX modules, significantly enhancing generative AI performance.

CLIP, Hugging Face, Ollama, Transformers

Shashank Maheshwari

11 min read

Includes Code

Has Summary

NVIDIA Jetson Orin Nano Developer Kit Gets a “Super” Boost

Intermediate

The article discusses the enhancements made to the NVIDIA Jetson Orin Nano Developer Kit, now renamed the Jetson Orin Nano Super Developer Kit, which offers a performance boost of up to 1.

Generative AI, Hugging Face, Ollama, Transformer, Transformers

Suhas Hariharapura Sheshadri

10 min read

Includes Code

Has Summary

Accelerating LLMs with llama.cpp on NVIDIA RTX Systems

Intermediate

The article discusses how llama.cpp, an efficient framework for large language model (LLM) inference, can be accelerated on NVIDIA RTX systems.

Hugging Face, Ollama

Annamalai Chockalingam

5 min read

Has Summary

Smaller, Safer, More Transparent: Advancing Responsible AI with Gemma

Intermediate

The article discusses the advancements in responsible AI through the introduction of Gemma 2, which includes models with 27 billion and 9 billion parameters, emphasizing safety and accessibility.

Generative AI, Google Cloud, GPT, Hugging Face, JAX, Keras, Kubernetes, Ollama, Vertex AI

Neel Nanda, Tom Lieberum, Ludovic Peran, Kathleen Kenealy

6 min read

Has Summary

Transforming Telco Network Operations Centers with NVIDIA NeMo Retriever and NVIDIA NIM

Intermediate

The article discusses how Infosys leverages NVIDIA NIM and NeMo Retriever to enhance network operations centers (NOCs) for telecom companies.

Embedding, LangChain, Mistral, Ollama, React, Vertex AI

Balamurugan Natarajan

7 min read

Has Summary

Introducing Genkit for Go: Build scalable AI-powered apps in Go

Intermediate

Genkit for Go is an open-source framework designed to help developers build scalable AI-powered applications using the Go programming language.

Firebase, Gemini, Golang, Google Cloud, Mistral, Ollama, SQL, Vertex AI, YAML

Chris Gill, Cameron Balahan

7 min read

Includes Code

Has Summary

Supercharge Generative AI Development with Firebase Genkit, Optimized by NVIDIA RTX GPUs

Intermediate

The article discusses Firebase Genkit, an open-source framework introduced at Google I/O 2024, designed for developers to integrate generative AI into web and mobile applications using models like ...

Docker, Firebase, Gemini, Generative AI, JavaScript, Node.js, Ollama, TypeScript

Ankit Patel

3 min read

Includes Code

Has Summary

Google I/O 2024 recap: Making AI accessible and helpful for every developer

Intermediate

The article recaps the Google I/O 2024 event, highlighting advancements in AI technologies aimed at making AI accessible for developers.

Caching, Dart, Firebase, Gemini, Generative AI, Google Cloud, JAX, Keras, Kotlin, Ollama, PostgreSQL, PyTorch, TensorFlow, WebAssembly

Jeanine Banks

8 min read

Has Summary

Picture This: Open Source AI for Image Description

Intermediate

The article discusses the development of an open-source AI image description service using large language models (LLMs) like LLaVA and tools such as Ollama and PocketBase.

Apache, AWS, Dart, Docker, Firebase, Gemini, Golang, GPT, GPT-4, JavaScript, Ollama, SQLite

Nolan Darilek

12 min read

Includes Code

Has Summary

GPUs on Fly.io are available to everyone!

Intermediate

Fly.io has announced the availability of GPU instances for everyone, enabling users to leverage powerful GPUs for applications like large language models, text transcription, and image generation.

Large Language Models, Ollama

Xe Iaso

2 min read

Includes Code

Has Summary

How Yoko Li makes towns, tamagoes, and tools for local AI

Intermediate

The article discusses Yoko Li's innovative work in AI, focusing on her projects like AI Town and AI Tamago, which utilize emergent behavior and large language models.

JSON, LLaMA, Midjourney, Ollama, Reinforcement Learning

Xe Iaso

10 min read

Has Summary

Fly.io has GPUs now

Beginner

Fly.io has announced the availability of GPUs, enabling users to perform AI workloads closer to their users at the edge. The article discusses the capabilities of Fly.

Kubernetes, Large Language Models, Ollama

Xe Iaso

6 min read

Includes Code

Has Summary

What are these "GPUs" really?

Advanced

The article explores the nature and capabilities of Graphics Processing Units (GPUs), particularly in the context of AI/ML workloads.

Artificial Intelligence, Crystal, GPT, Large Language Models, Machine Learning, Mistral, Ollama, Stable Diffusion, Whisper

Xe Iaso

13 min read

Has Summary

Scaling Large Language Models to zero with Ollama

Advanced

The article discusses how to scale large language models to zero using Ollama on Fly.io, emphasizing the benefits of self-hosting AI tools and the efficient use of GPU resources.

JavaScript, JSON, Large Language Models, Ollama, Serverless, Stable Diffusion, Whisper

Xe Iaso

11 min read

Includes Code

Has Summary