Unlocking the Codex harness: how we built the App Server

Celia Chen

By Celia Chen, Member of the Technical Staff

OpenAI

14 min read • intermediate

JSON, Kotlin, PostgreSQL, Rust, Swift, TypeScript, WebSocket

Overview

This article explains how OpenAI built the Codex App Server, a bidirectional JSON-RPC API that serves as the critical link between the Codex harness (agent loop and logic) and various client surfaces including the web app, CLI, IDE extensions, and the macOS desktop app. It covers the architecture, protocol design with conversation primitives (items, turns, threads), and integration patterns for local apps, web runtime, and the TUI.

What You'll Learn

1. How the Codex App Server architecture exposes the Codex harness to clients via a bidirectional JSON-RPC protocol

2. How to design conversation primitives (items, turns, threads) for agent-based interaction protocols

3. When to choose between App Server, MCP server, Codex Exec, and Codex SDK integration methods

4. How different client surfaces (IDEs, web apps, CLI) integrate with a shared agent harness via stdio and JSON-RPC

5. Why backward-compatible protocol design matters for evolving agent APIs across multiple client release cycles

Prerequisites & Requirements

Understanding of JSON-RPC or similar request/response protocols
Familiarity with agent loops and LLM-based coding assistants
Basic understanding of stdio-based inter-process communication (optional)
Experience with IDE extension development or client-server architectures (optional)
Familiarity with OpenAI Codex CLI (optional)

Key Questions Answered

What is the Codex App Server and why was it built?

The Codex App Server is a bidirectional JSON-RPC API that exposes the Codex harness (agent loop, thread management, tool execution) to client applications. It began as a practical way to reuse the Codex harness in the VS Code extension without re-implementing the agent loop, and evolved into a stable protocol that powers all Codex surfaces, including the web app, CLI, IDE extensions, and desktop app.

What are the main components of the Codex App Server architecture?

The App Server process has four main components: the stdio reader (handles incoming JSON-RPC messages), the Codex message processor (translates between client requests and core operations), the thread manager (spins up one core session per thread), and core threads (individual agent sessions). The message processor listens to Codex core's internal event stream and transforms low-level events into stable, UI-ready JSON-RPC notifications.
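
The translation role of the message processor can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Rust implementation: the internal event shapes (`type`, `thread_id`, `item_id`, `delta`), the `token_count` event, and the notification method names are all assumptions made for the example. The generated schema is the authority on real names.

```python
from typing import Optional

# Hypothetical mapping from internal core event types to client-facing
# notification methods. Every name here is an illustrative assumption.
EVENT_TO_METHOD = {
    "agent_message_started": "item/started",
    "agent_message_delta": "item/agentMessage/delta",
    "agent_message_completed": "item/completed",
}


def translate(core_event: dict) -> Optional[dict]:
    """Turn one low-level core event into a notification, or None to drop it."""
    method = EVENT_TO_METHOD.get(core_event["type"])
    if method is None:
        return None  # internal-only event; not part of the client surface
    params = {
        "threadId": core_event["thread_id"],
        "itemId": core_event["item_id"],
    }
    if "delta" in core_event:
        params["delta"] = core_event["delta"]
    # A notification has a method and params but no id, so no reply is expected.
    return {"method": method, "params": params}
```

Keeping the mapping in one table is one way to let the internal event stream change without touching the client-facing surface: only the table needs updating.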

What are the conversation primitives in the Codex App Server protocol?

The protocol uses three core primitives: Items (atomic units of input/output with lifecycle events: started, delta, completed), Turns (one unit of agent work initiated by user input containing a sequence of items), and Threads (durable containers for ongoing sessions that can be created, resumed, forked, and archived with persisted history for reconnection).
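
To make the three primitives concrete, here is a minimal Python sketch of how a client might model them and fold item lifecycle events into local state. The field names (`phase`, `itemId`, `kind`, `text`) and the `agentMessage` kind are hypothetical and chosen for illustration; they are not taken from the protocol definition.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    id: str
    kind: str              # e.g. "agentMessage"; kind names are illustrative
    text: str = ""
    status: str = "started"


@dataclass
class Turn:
    id: str
    items: dict = field(default_factory=dict)   # item id -> Item


@dataclass
class Thread:
    id: str
    turns: list = field(default_factory=list)   # ordered history of turns


def apply_event(turn: Turn, event: dict) -> Turn:
    """Fold one lifecycle event (started, delta, completed) into a turn."""
    phase, item_id = event["phase"], event["itemId"]
    if phase == "started":
        turn.items[item_id] = Item(id=item_id, kind=event["kind"])
    elif phase == "delta":
        turn.items[item_id].text += event["text"]   # stream incremental output
    elif phase == "completed":
        item = turn.items[item_id]
        item.text = event.get("text", item.text)    # final text wins if present
        item.status = "completed"
    else:
        raise ValueError(f"unknown lifecycle phase: {phase}")
    return turn
```

With this shape, a UI can render an item as soon as `started` arrives, append on each `delta`, and treat `completed` as the final state.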

How do local IDE integrations connect to the Codex App Server?

Local clients bundle or fetch a platform-specific App Server binary, launch it as a long-running child process, and maintain a bidirectional stdio channel for JSON-RPC communication. Some partners like Xcode decouple release cycles by keeping the client stable while pointing to newer App Server binaries, allowing them to adopt server-side improvements without waiting for a client release.
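
The stdio channel described above carries one JSON message per line (JSONL). The following Python sketch shows that framing plus a simple way to tell requests, notifications, and responses apart. It rests on assumptions: that messages omit the `jsonrpc` version field in the "lite" variant, and that running the binary with the `app-server` subcommand and no further arguments starts the server on stdio. `launch_app_server` is a hypothetical helper and is not executed here.

```python
import json
import subprocess


def encode_message(message: dict) -> bytes:
    """Serialize one message as a single newline-terminated JSONL line."""
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_line(line: bytes) -> dict:
    """Parse one JSONL line back into a message."""
    return json.loads(line.decode("utf-8"))


def classify(message: dict) -> str:
    """Requests have id and method, notifications only method, responses only id."""
    if "method" in message:
        return "request" if "id" in message else "notification"
    return "response"


def launch_app_server(binary_path: str) -> subprocess.Popen:
    # Hypothetical launch: assumes `<binary> app-server` speaks JSONL over stdio.
    return subprocess.Popen(
        [binary_path, "app-server"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

Because the channel is bidirectional, a client's read loop has to handle all three message kinds: the server can send its own requests as well as notifications and responses.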

How does Codex Web handle ephemeral browser sessions with the App Server?

Codex Web runs the harness in a container environment: a worker provisions a container with the workspace, launches the App Server binary inside it, and maintains a JSON-RPC channel over stdio. The web app in the browser talks to the backend over HTTP and SSE. State and progress are kept server-side, so work continues even if the browser tab closes, and new sessions can reconnect and catch up without rebuilding state.
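
One way to picture the reconnect-and-catch-up behavior is a server-side, append-only event log that a new browser session replays from the last event it saw. The sketch below is an assumption about how such a log could look, not a description of the Codex Web backend. `TaskEventLog` and `to_sse_frame` are hypothetical names, and the `id:` field follows the standard SSE frame format that browsers echo back as `Last-Event-ID` on reconnect.

```python
import json


class TaskEventLog:
    """Server-side, append-only record of events for one task."""

    def __init__(self):
        self._events = []

    def append(self, payload: dict) -> int:
        """Store an event and return its 1-based sequence number."""
        self._events.append(payload)
        return len(self._events)

    def replay(self, last_seen: int = 0):
        """Yield (seq, payload) for every event after `last_seen`."""
        for seq in range(last_seen + 1, len(self._events) + 1):
            yield seq, self._events[seq - 1]


def to_sse_frame(seq: int, payload: dict) -> str:
    """Format one event as a Server-Sent Events frame with an id for resumption."""
    return f"id: {seq}\ndata: {json.dumps(payload)}\n\n"
```

A fresh tab calls `replay()` with no argument to rebuild the full view, while a reconnecting tab passes the last sequence number it received and gets only what it missed.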

Why did OpenAI choose JSON-RPC over MCP for the Codex App Server?

OpenAI initially experimented with exposing Codex as an MCP server, but maintaining MCP semantics in a way that made sense for VS Code proved difficult. Interaction patterns such as workspace exploration, streaming progress, and emitting diffs needed session semantics that did not map cleanly onto MCP endpoints, so the team introduced a custom JSON-RPC protocol that mirrored the TUI loop.

How does the approval flow work in the Codex App Server protocol?

When the agent needs user input like an approval before executing an action, the server sends a server-initiated request (item/commandExecution/requestApproval) with a reason description. This pauses the turn until the client responds with either 'allow' or 'deny'. In VS Code, this renders as a permission prompt asking if the user wants to allow the command to run.
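
On the client side, handling this server-initiated request could look like the Python sketch below. The method name and the 'allow'/'deny' values come from the description above; the payload field names (`params.reason`, `result.decision`) and the `ask_user` callback are hypothetical. The error code -32601 is the standard JSON-RPC "Method not found" code.

```python
APPROVAL_METHOD = "item/commandExecution/requestApproval"


def handle_server_request(request: dict, ask_user) -> dict:
    """Answer a server-initiated request; the turn stays paused until we reply."""
    if request.get("method") != APPROVAL_METHOD:
        # Reject requests this client does not understand.
        return {
            "id": request["id"],
            "error": {"code": -32601, "message": "Method not found"},
        }
    # Field names below are assumptions for illustration.
    reason = request.get("params", {}).get("reason", "")
    decision = "allow" if ask_user(reason) else "deny"
    # The response must echo the request id so the server can resume the right turn.
    return {"id": request["id"], "result": {"decision": decision}}
```

In an IDE, `ask_user` would render the permission prompt and return the user's choice; in a test or headless setup it can be a fixed policy function.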

How can developers generate client bindings for the Codex App Server?

For TypeScript, developers can generate type definitions directly from the Rust protocol by running 'codex app-server generate-ts'. For other languages, a JSON Schema bundle can be generated with 'codex app-server generate-json-schema' and fed into any code generator. Teams have implemented clients in Go, Python, TypeScript, Swift, and Kotlin.

Technologies & Tools

Protocol

JSON-RPC

Bidirectional communication protocol between clients and the Codex App Server, using a 'JSON-RPC lite' variant framed as JSONL over stdio

Language

Rust

Implementation language for Codex core and the App Server, with protocol types defined in Rust that can be exported to JSON Schema or TypeScript

Language

TypeScript

Client binding generation from Rust protocol types; Codex SDK is a TypeScript library for programmatic agent control

IDE

VS Code

First IDE integration using the App Server protocol, bundling a platform-specific Codex binary

Protocol

MCP

Alternative integration method via 'codex mcp-server' for MCP-based workflows, with more limited Codex functionality

Protocol

SSE

Server-Sent Events used in Codex Web to stream task events from the backend to the browser

Language

Go

One of the languages used by partner teams to implement App Server client bindings

Language

Swift

Used for Xcode integration client implementation

Language

Kotlin

Used for JetBrains IDE integration client implementation

Language

Python

One of the languages used to implement App Server client bindings

Key Actionable Insights

1. Use the App Server's JSON-RPC protocol as the primary integration method when embedding Codex into your product, as it provides the full Codex harness functionality including thread management, tool execution, approval flows, and streaming events. This is the first-class integration method OpenAI will maintain going forward.
The App Server protocol is designed to be backward compatible, meaning older clients can safely communicate with newer server versions without breaking changes.

2. Design agent interaction protocols around three clear primitives: items (atomic I/O units), turns (user-initiated work units), and threads (durable session containers). This separation makes the interaction stream easy to integrate and resilient across different UI surfaces, from terminal to web to IDE.
Each item has an explicit lifecycle (started, delta, completed) that lets clients start rendering immediately, stream incremental updates, and finalize cleanly, enabling rich real-time UIs.

3. Decouple your App Server binary version from client release cycles to enable faster iteration. Partners like Xcode keep the client stable and allow it to point to newer App Server binaries, enabling server-side improvements and bug fixes without waiting for a client release.
This works because the JSON-RPC surface is designed for backward compatibility, so older clients can talk to newer servers safely.

4. When building web-based agent interfaces, keep state and progress on the server side rather than in the browser. This ensures that long-running tasks continue even if the browser tab closes or the network drops, and new sessions can reconnect and catch up without rebuilding state.
Codex Web uses containers with the App Server binary running inside, communicating via HTTP and SSE to the browser, making the client-side UI lightweight.

5. Use the 'codex app-server generate-ts' or 'codex app-server generate-json-schema' commands to auto-generate client bindings rather than hand-coding JSON-RPC types. Teams have successfully built integrations in Go, Python, TypeScript, Swift, and Kotlin using these generated definitions.
Many teams quickly reached a working integration by feeding the JSON Schema and documentation directly to Codex itself to generate the client code.

6. Consider using Codex as an MCP server (via 'codex mcp-server') if you already have an MCP-based workflow and need Codex as a callable tool, but be aware that Codex-specific interactions relying on richer session semantics, such as diff updates, may not map cleanly through MCP endpoints.
For the full agent experience including sign-in, model discovery, configuration management, and streaming events, the App Server protocol is the recommended approach over MCP.

Common Pitfalls

1. Attempting to use MCP semantics for rich IDE-grade agent interactions. OpenAI initially tried exposing Codex as an MCP server for VS Code but found that maintaining MCP semantics in a way that made sense was difficult, especially for streaming progress, workspace exploration, and diff emission.

The MCP protocol converges on the common subset of capabilities, making richer provider-specific interactions harder to represent. Use the native App Server protocol for full Codex harness functionality.

2. Treating agent interaction as simple request/response. One user request can unfold into a structured sequence of actions including incremental progress updates, tool executions, approval requests, and artifacts like diffs. Designing for just request/response will miss critical agent behaviors.

The App Server uses a streaming event model with items, turns, and threads to represent the full agent lifecycle, including bidirectional communication where the server can pause a turn to request client approval.

3. Storing agent state in the browser for web-based integrations. Web sessions are ephemeral (tabs close and networks drop), so the browser cannot be the source of truth for long-running agent tasks. If state is only maintained client-side, tasks will be lost on disconnection.

Keep state and progress on the server side. The streaming protocol and saved thread sessions enable new sessions to reconnect, pick up where they left off, and catch up without rebuilding state.

4. Tightly coupling App Server binary versions to client release cycles. If your client and server must be released together, you cannot adopt server-side improvements (like better auto-compaction or bug fixes) without waiting for a full client release.

Design for version decoupling by leveraging the App Server's backward-compatible JSON-RPC surface, allowing older clients to communicate safely with newer servers.
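
Backward compatibility also depends on clients being tolerant of things they do not recognize. The Python sketch below shows one common pattern, offered as an assumption and not as how any Codex client is actually written: dispatch notifications by method, ignore methods the client predates, and have each handler read only the fields it knows. `NotificationRouter` and the example method names are hypothetical.

```python
class NotificationRouter:
    """Dispatch notifications by method, skipping anything this client predates."""

    def __init__(self):
        self._handlers = {}
        self.ignored = []   # unrecognized methods, kept for diagnostics

    def on(self, method: str, handler) -> None:
        """Register a handler that receives the notification's params dict."""
        self._handlers[method] = handler

    def dispatch(self, message: dict) -> bool:
        """Return True if handled, False if the method was unknown and skipped."""
        method = message.get("method")
        handler = self._handlers.get(method)
        if handler is None:
            self.ignored.append(method)   # a newer server may emit new methods
            return False
        handler(message.get("params", {}))
        return True
```

An older client built this way keeps working when a newer server adds notification types or extra fields, which is what makes pointing a stable client at a newer binary practical.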

Related Concepts

Agent Loop Design

JSON-RPC Protocol

MCP (Model Context Protocol)

Bidirectional Streaming Protocols

Stdio-based IPC

IDE Extension Architecture

Server-Sent Events (SSE)

Protocol Versioning And Backward Compatibility

Sandboxed Tool Execution

Thread Persistence And Session Management

Cross-provider Agent Harness Protocols

OpenAI Agents SDK