You Should Write An Agent

Thomas Ptacek

There are big ideas in computing that are easy to get your head around. The AWS S3 API. It’s the most important storage technology of the last 20 years, and it’s like boiling water. Other technologies, you need to get your feet on the pedals first.

Fly.io

•

Thomas Ptacek

•12 min read•intermediate•

--

•View Original

AWSAWS S3ClaudeJSONPrompt EngineeringSQLSQLite

Overview

Thomas Ptacek argues that every developer should build an LLM agent to truly understand the technology, demonstrating through progressive Python code examples that a functional agent with tool use can be built in surprisingly few lines of code. The article demystifies agents by showing they're essentially loops around stateless LLM API calls with context management, and makes the case that context engineering is a legitimate programming problem worth exploring.

What You'll Learn

1

How to build a functional LLM agent from scratch using the OpenAI API in under 50 lines of Python

2

How LLM context windows work mechanically as arrays of strings replayed with each API call

3

How to implement tool calling in an agent loop so the LLM can autonomously invoke external commands

4

Why context engineering is a real programming problem involving token allocation, sub-agents, and context segregation

5

Why MCP is unnecessary when building your own agent and how it limits architectural flexibility

Prerequisites & Requirements

Basic understanding of Python programming
Familiarity with HTTP APIs and JSON
OpenAI API access and Python SDK
General awareness of what LLMs are and how they generate text

Key Questions Answered

How do you build an LLM agent from scratch without a framework?

You create a context array (list of message strings), write a function that calls the LLM API passing that context, append each user input and assistant response to the array, and loop. For tool use, you define tools as JSON schemas, check if the LLM response requests a tool call, execute the tool, append the result to context, and call the LLM again. The entire implementation is under 50 lines of Python.

What is an LLM context window and how does it actually work in code?

A context window is simply a list of strings (messages) that you maintain in your application code. Since LLMs are stateless, every API call includes the full conversation history. You append user messages and assistant responses to this array, replaying everything with each call. The 'conversation' is an illusion your code creates—the LLM has no memory between calls.

What is the difference between an LLM chatbot and an LLM agent?

According to Simon Willison's definition cited in the article, an agent is an LLM running in a loop that uses tools. A simple chatbot just passes messages back and forth. An agent adds tool definitions to each LLM call, handles tool call responses by executing the tools and feeding results back, and loops until the LLM produces a final text response rather than another tool call.

How does tool calling work in the OpenAI API for agents?

You pass a JSON tool definition (name, description, parameter schema) with each API call. When the LLM decides a tool is needed, it returns a special function_call response with arguments instead of text. Your code executes the actual function, wraps the result in a function_call_output message with the matching call_id, appends both to context, and calls the LLM again to process the result.

Why is MCP unnecessary for building LLM agents?

MCP is just a plugin interface for tools in applications you don't control, like Claude Code or Cursor. When building your own agent, you directly define tools as JSON schemas and implement their handlers—saving you nothing while removing your ability to control agent architecture, context segregation, and security boundaries. MCP saves at most a couple dozen lines of code while constraining architectural flexibility.

What is context engineering and why does it matter for LLM agents?

Context engineering is the programming problem of managing the fixed token budget in an LLM context window. Every input, output, tool description, and tool result consumes tokens, and past a threshold the system degrades nondeterministically. Solutions include creating sub-agents with separate context arrays, giving each different tools, having agents summarize each other for compression, and building tree structures of agent interactions.

How do sub-agents work in LLM agent architectures?

Sub-agents are trivially implemented as new context arrays with separate calls to the LLM model. Each sub-agent can have different tools and system prompts. They can communicate by passing summaries between contexts, aggregating results, or feeding outputs through the LLM for compression. This segregation helps manage token limits and enables specialized agent behavior without overloading a single context window.

Can an LLM agent autonomously decide which tools to use and how many times?

Yes. The article demonstrates this with a ping tool example: when asked to 'describe connectivity to Google,' the agent autonomously decided to ping google.com, www.google.com, and 8.8.8.8 without any explicit loop or instruction to check multiple endpoints. The developer only gave the LLM permission to ping; the LLM figured out the strategy of testing multiple Google properties on its own.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

AI/ML API

Openai Responses API

Core LLM API used for all agent examples, handling chat completions and tool calling

Programming Language

Python

Implementation language for all agent code examples

AI/ML Model

Gpt-5

LLM model used in the agent examples

Database

Sqlite

Mentioned as an option for persisting agent contexts

Programming Language

Go

Referenced as an alternative implementation language with a linked example repository

Protocol

Mcp

Discussed critically as an unnecessary plugin interface for agent tools

Developer Tool

Claude Code

Referenced as an example of a coding agent that could be replicated by building your own

Data Format

JSON

Used for tool definitions and as a potential interchange format between sub-agents

Key Actionable Insights

1
Build your own LLM agent rather than relying solely on Claude Code or Cursor. The agent loop itself is trivially simple—just a context array, an API call function, and a tool-handling loop. Building it yourself gives you full control over architecture, security boundaries, and context management that pre-built tools abstract away.
The article demonstrates the entire agent can be built in under 50 lines of Python, making the barrier to entry extremely low for any developer with API access.

2
Implement segregated contexts with specific tools for each context rather than cramming all tools into a single context window. This approach solves both the token budget problem (too many tool descriptions eating into available space) and security concerns (limiting what each agent context can access).
The article notes that early adopters became bearish on tools because one context window with many tool descriptions left insufficient token space. Segregated contexts are trivial to implement—just create additional context arrays.

3
Skip MCP and implement tools directly as JSON schemas in your own agent code. MCP only saves a couple dozen lines of code while removing your ability to control agent architecture, and many security horror stories stem from dragging a single-context coding agent into inappropriate use cases via MCP plugins.
This applies when building purpose-built agents for specific tasks like vulnerability scanning, customer service, or data analysis—anywhere you control both the agent and the tools.

4
Treat context engineering as a real programming problem: manage your token budget by summarizing sub-agent outputs, using the LLM itself for on-the-fly compression, and building tree structures of agent interactions. The context window is a fixed resource that degrades nondeterministically when overloaded.
Context engineering would be a mid-December Advent of Code problem—it's genuine programming involving data structure management, not the mystical 'prompt engineering' of crafting personality descriptions.

5
Let the LLM decide strategy rather than writing explicit control flow for every scenario. The ping example shows the agent autonomously chose to test multiple Google properties without any explicit loop. Balance explicit control against LLM-driven exploration—too explicit and you lose emergent problem-solving, too loose and results become unpredictable.
This design tension—titrating the right amount of nondeterminism—is one of the key open engineering problems in agent design that you can only develop intuition for by building agents yourself.

6
Use ground truth verification to prevent agents from lying to themselves about having solved a problem and early-exiting their loops. Connect agent outputs to verifiable results (test suites, actual command outputs, database queries) rather than trusting the LLM's self-assessment.
This is identified as one of the key open problems in agent design. The article's ping example naturally provides ground truth through actual network responses, illustrating the pattern.

Common Pitfalls

1

Cramming too many tool descriptions into a single context window, which consumes tokens and leaves insufficient space for the actual work. Early adopters of agents became bearish on tools entirely because of this problem, not realizing the solution was architectural rather than abandoning tools.

The fix is to use segregated contexts with specific tools assigned to each, which is trivially implemented as separate context arrays with their own LLM calls.

2

Using MCP to drag a single-context-window coding agent into tasks it wasn't designed for, such as customer service queries. This creates security vulnerabilities while saving only a couple dozen lines of code, and removes your ability to architect proper context segregation and tool access controls.

Building your own agent gives you direct control over security boundaries, tool access, and context management—things MCP abstracts away at the cost of architectural flexibility.

3

Treating LLM conversations as stateful when the LLM itself is a stateless black box. Developers who don't understand that every API call replays the full context array will struggle with debugging, context management, and cost optimization.

The 'conversation' is an illusion cast by your code—understanding this is fundamental to making good architectural decisions about context management and sub-agent design.

4

Making agent control flow too explicit, writing loops that feed each file individually or check for every specific vulnerability category. This kills the agent's emergent ability to problem-solve and strategize, as demonstrated by the ping example where the LLM autonomously decided to check multiple Google endpoints.

The key design tension is titrating just the right amount of nondeterminism—too explicit and the agent never surprises you, too loose and it surprises you to death.

5

Not connecting agents to ground truth for verification, which allows the LLM to falsely conclude it has solved a problem and exit its loop early. Without external validation, agents can generate plausible-sounding but incorrect results.

Connect agent outputs to verifiable sources—test results, command outputs, database queries—rather than relying on the LLM's self-assessment of its work.

Related Concepts

Llm Agent Architecture

Context Window Management

Tool Calling Apis

Sub-agent Orchestration

Token Budget Optimization

Prompt Engineering Vs Context Engineering

Mcp (model Context Protocol)

Agent Security And Context Segregation

Nondeterminism In Agent Design

Vulnerability Scanning Agents

Coding Agents

Stochastic Parrots Debate

Multi-agent Communication Patterns