Introducing GPT-5.3-Codex

OpenAI

Expanding Codex across the full spectrum of professional work on a computer.

11 min read • intermediate

Overview

OpenAI introduces GPT-5.3-Codex, its most capable agentic coding model. It combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2, and runs 25% faster. The model extends beyond code generation to the full spectrum of professional computer work, including debugging, deploying, monitoring, writing PRDs, creating presentations, and analyzing data. Notably, GPT-5.3-Codex is the first model that was instrumental in creating itself: the Codex team used early versions to debug training, manage deployment, and diagnose evaluations.

What You'll Learn

1

What GPT-5.3-Codex can do beyond code generation, including research, tool use, and complex execution tasks

2

How GPT-5.3-Codex performs on key industry benchmarks like SWE-Bench Pro, Terminal-Bench, OSWorld, and GDPval

3

How OpenAI used Codex to accelerate the training and deployment of GPT-5.3-Codex itself

4

What cybersecurity safeguards OpenAI deployed for a model classified as 'High capability' for cybersecurity tasks

5

How interactive steering allows real-time collaboration with the model during long-running tasks

Prerequisites & Requirements

Familiarity with AI coding assistants and agentic AI concepts
Understanding of software engineering workflows (debugging, deploying, testing) (optional)
Paid ChatGPT plan for access to Codex app, CLI, IDE extension, or web

Key Questions Answered

What is GPT-5.3-Codex and how does it differ from GPT-5.2-Codex?

GPT-5.3-Codex is OpenAI's most capable agentic coding model that combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 in a single model that is 25% faster. It can handle long-running tasks involving research, tool use, and complex execution, and supports real-time interactive steering during task execution.

How does GPT-5.3-Codex perform on SWE-Bench Pro and other coding benchmarks?

GPT-5.3-Codex achieves state-of-the-art performance with 56.8% on SWE-Bench Pro (Public), 77.3% on Terminal-Bench 2.0, 64.7% on OSWorld-Verified, and 70.9% on GDPval. It achieves these results with fewer tokens than any prior model. The Terminal-Bench 2.0 and OSWorld-Verified scores are gains of 13.3 and 26.5 percentage points over GPT-5.2-Codex's 64.0% and 38.2% respectively.

Can GPT-5.3-Codex do more than just write code?

Yes, GPT-5.3-Codex supports the full software lifecycle including debugging, deploying, monitoring, writing PRDs, editing copy, user research, tests, and metrics. Its agentic capabilities extend beyond software to creating slide decks, analyzing data in spreadsheets, building presentations, and completing productivity tasks in visual desktop environments as demonstrated by its strong OSWorld performance.

How did OpenAI use GPT-5.3-Codex to build itself?

GPT-5.3-Codex is the first model instrumental in creating itself. The research team used it to monitor and debug the training run, track patterns during training, and analyze interaction quality. The engineering team used it to optimize the harness, identify context rendering bugs, root cause low cache hit rates, dynamically scale GPU clusters, and keep latency stable during launch.

What cybersecurity concerns does OpenAI have about GPT-5.3-Codex?

GPT-5.3-Codex is the first model OpenAI classifies as 'High capability' for cybersecurity tasks under its Preparedness Framework, and the first directly trained to identify software vulnerabilities. Although there is no definitive evidence that it can automate cyber attacks end-to-end, OpenAI deployed comprehensive safeguards including safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines.

What is the Trusted Access for Cyber program OpenAI launched with GPT-5.3-Codex?

Trusted Access for Cyber is a pilot program launched alongside GPT-5.3-Codex to accelerate cyber defense research. It is part of OpenAI's broader cybersecurity strategy, which includes expanding the private beta of Aardvark (its security research agent), partnering with open-source maintainers for free codebase scanning, and committing $10M in API credits for cyber defense, especially for open source and critical infrastructure.

Where is GPT-5.3-Codex available and how can developers access it?

GPT-5.3-Codex is available with paid ChatGPT plans across all Codex surfaces: the app, CLI, IDE extension, and web. OpenAI is working to safely enable API access soon. The model runs on NVIDIA GB200 NVL72 systems and is 25% faster than its predecessor thanks to infrastructure and inference stack improvements.

How does interactive steering work with GPT-5.3-Codex in the Codex app?

GPT-5.3-Codex provides frequent updates during task execution so users stay informed of key decisions and progress. Instead of waiting for final output, users can interact in real time: asking questions, discussing approaches, and steering toward solutions. The feature can be enabled in Settings > General > Follow-up behavior within the Codex app.

Key Statistics & Figures

SWE-Bench Pro (Public)

56.8%

GPT-5.3-Codex with xhigh reasoning effort, state-of-the-art performance

Terminal-Bench 2.0

77.3%

GPT-5.3-Codex vs 64.0% for GPT-5.2-Codex

OSWorld-Verified

64.7%

GPT-5.3-Codex vs 38.2% for GPT-5.2-Codex; humans score ~72%

GDPval (wins or ties)

70.9%

Measures performance on knowledge work tasks across 44 occupations

Cybersecurity Capture The Flag Challenges

77.6%

GPT-5.3-Codex vs 67.4% for GPT-5.2-Codex

SWE-Lancer IC Diamond

81.4%

GPT-5.3-Codex vs 76.0% for GPT-5.2-Codex

Speed improvement over predecessor

25%

Thanks to improvements in infrastructure and inference stack

Cybersecurity Grant Program commitment

$10M in API credits

For accelerating cyber defense, especially open source and critical infrastructure
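The head-to-head figures above can be restated as percentage-point and relative gains. The sketch below is plain arithmetic on the numbers quoted in this section; the dictionary layout and the function names (`gains`, `summarize`, `human_gap`) are illustrative choices rather than any OpenAI tooling, and the human OSWorld score is the approximate ~72% quoted above.

```python
# Scores in percent, copied from the statistics above:
# benchmark -> (GPT-5.3-Codex, GPT-5.2-Codex)
SCORES = {
    "Terminal-Bench 2.0": (77.3, 64.0),
    "OSWorld-Verified": (64.7, 38.2),
    "Cybersecurity CTF": (77.6, 67.4),
    "SWE-Lancer IC Diamond": (81.4, 76.0),
}

# Approximate human score on OSWorld quoted above (~72%).
HUMAN_OSWORLD = 72.0


def gains(new: float, old: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return points, relative


def summarize(scores: dict) -> dict:
    """Map each benchmark name to its (points, relative) gain."""
    return {name: gains(new, old) for name, (new, old) in scores.items()}


def human_gap(model_score: float, human_score: float = HUMAN_OSWORLD) -> float:
    """Percentage points by which the model trails the human score."""
    return round(human_score - model_score, 1)


if __name__ == "__main__":
    for name, (points, relative) in summarize(SCORES).items():
        print(f"{name}: +{points} points ({relative}% relative)")
    gap = human_gap(SCORES["OSWorld-Verified"][0])
    print(f"OSWorld gap to humans: {gap} points")
```

Percentage points and relative gains answer different questions: the OSWorld-Verified jump is 26.5 points, but relative to the 38.2% baseline it is a much larger proportional change, so it helps to say which one is meant when quoting these results.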

Technologies & Tools

AI Model

GPT-5.3-Codex

Primary model announced, combining coding and reasoning capabilities

AI Model

GPT-5.2-Codex

Predecessor model used for comparison benchmarks

AI Model

GPT-5.2

Base reasoning model whose capabilities were merged into GPT-5.3-Codex

Hardware

NVIDIA GB200 NVL72

GPU systems used for co-designing, training, and serving GPT-5.3-Codex

Development Tool

Codex App

Primary interface for interacting with GPT-5.3-Codex (app, CLI, IDE extension, web)

Security Tool

Aardvark

OpenAI's security research agent, the first offering in its Codex Security products

Framework

Next.js

Mentioned as an open-source project in which a security researcher using Codex found vulnerabilities

Benchmark

SWE-Bench Pro

Multi-language software engineering evaluation spanning four programming languages

Benchmark

Terminal-Bench 2.0

Measures terminal skills needed by coding agents

Benchmark

OSWorld

Agentic computer-use benchmark for visual desktop productivity tasks

Benchmark

GDPval

Evaluation measuring performance on knowledge work tasks across 44 occupations
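The GDPval figure in this summary is reported as "wins or ties". As a rough illustration of how such a rate is computed, the sketch below assumes each task receives one of three labels (`win`, `tie`, `loss`) from a comparison against a reference answer. That labeling scheme, the function name `win_or_tie_rate`, and the sample outcomes are all hypothetical; the actual GDPval grading protocol is not described in this text.

```python
from collections import Counter


def win_or_tie_rate(outcomes: list) -> float:
    """Percentage of graded tasks labeled 'win' or 'tie'.

    `outcomes` is a list of strings, each 'win', 'tie', or 'loss'
    (a hypothetical labeling used only for this illustration).
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes to score")
    return 100 * (counts["win"] + counts["tie"]) / total


if __name__ == "__main__":
    # Made-up outcomes for four tasks, not real GDPval data.
    sample = ["win", "tie", "loss", "win"]
    print(f"wins or ties: {win_or_tie_rate(sample):.1f}%")
```

Counting ties as successes means a score such as 70.9% reads as "at least as good as the reference on about seven tasks in ten", not "strictly better".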

Key Actionable Insights

1
GPT-5.3-Codex can be used as an interactive collaborator for long-running tasks, not just a one-shot code generator. Enable steering in the Codex app settings to ask questions, discuss approaches, and redirect the model while it works, similar to working with a colleague.
This is available via Settings > General > Follow-up behavior in the Codex app, and is especially useful for complex multi-step tasks where course correction is needed.

2
The model achieves strong results with fewer tokens than any prior model on coding benchmarks, meaning users can accomplish more within their token budgets. This efficiency gain, combined with the 25% speed improvement, makes it practical for iterative development workflows.
Token efficiency is particularly valuable for teams with usage limits or cost constraints on AI-assisted development.

3
GPT-5.3-Codex can handle professional knowledge work beyond coding, including creating presentations, spreadsheets, training documents, and data analysis. Teams should consider it for the full software lifecycle (PRDs, copy editing, user research, metrics analysis), not just code generation.
The model scored 70.9% on GDPval which measures performance across 44 occupations on well-specified knowledge work tasks.

4
OpenAI's internal teams found that using Codex to build data pipelines, visualize results, and co-analyze data points surfaced richer insights than standard dashboarding tools. Development teams should explore using GPT-5.3-Codex for data analysis and custom tooling alongside coding tasks.
A data scientist at OpenAI co-analyzed thousands of data points with Codex and got concise summaries of key insights in under three minutes.

5
Security-conscious organizations should apply for OpenAI's expanded $10M Cybersecurity Grant Program for API credits to use the most capable models for defensive security research, especially for open source software and critical infrastructure systems.
OpenAI is also launching Aardvark as a security research agent and partnering with open-source maintainers for free codebase scanning, as demonstrated by recent Next.js vulnerability discoveries.
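The 25% speed figure in the second insight can be read in two ways, and the text here does not say which is meant. The sketch below works through both readings for a hypothetical task that took 100 seconds on the predecessor; the function names and the 100-second baseline are assumptions made purely for illustration.

```python
def time_if_throughput_up(baseline_s: float, pct: float) -> float:
    """Reading A: the model does pct% more work per second.

    The same task then takes baseline / (1 + pct/100) seconds.
    """
    return baseline_s / (1 + pct / 100)


def time_if_latency_down(baseline_s: float, pct: float) -> float:
    """Reading B: the same task takes pct% less wall-clock time."""
    return baseline_s * (1 - pct / 100)


if __name__ == "__main__":
    baseline = 100.0  # hypothetical task duration in seconds
    print(f"Reading A: {time_if_throughput_up(baseline, 25):.1f} s")
    print(f"Reading B: {time_if_latency_down(baseline, 25):.1f} s")
```

Under reading A the hypothetical task drops to 80 seconds, under reading B to 75 seconds, so the choice of interpretation matters when estimating how many iterations fit into a working session.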

Common Pitfalls

1

Treating GPT-5.3-Codex as only a code generation tool rather than a full-spectrum professional work agent. The model's capabilities extend to research, tool use, data analysis, presentations, and complex execution tasks, and using it only for writing code misses the broader value.

The article emphasizes this is a transition from 'an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer.'

2

Not enabling interactive steering when working on complex, long-running tasks. Without steering enabled, users miss the opportunity to redirect the model mid-task, potentially wasting computation on approaches that don't align with their intent.

Enable via Settings > General > Follow-up behavior in the Codex app to interact with the model in real time.

3

Assuming the model's cybersecurity capabilities are only beneficial. GPT-5.3-Codex is the first model classified as 'High capability' for cybersecurity under OpenAI's Preparedness Framework and the first trained to identify software vulnerabilities, which makes it inherently dual-use.

OpenAI has deployed comprehensive mitigations including safety training, automated monitoring, and trusted access controls, and launched the Trusted Access for Cyber pilot program.

Related Concepts

Agentic AI

AI-Assisted Software Engineering

SWE-Bench Pro Benchmarks

AI Cybersecurity Capabilities

Computer Use Agents

Self-improving AI Models

AI Coding Assistants

Knowledge Work Automation

AI Safety And Preparedness Frameworks

Dual-use AI Technology

LLM Inference Optimization

Interactive AI Collaboration