Introducing GPT-5.3-Codex

Expanding Codex across the full spectrum of professional work on a computer.

OpenAI

Overview

OpenAI introduces GPT-5.3-Codex, its most capable agentic coding model, which combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 while running 25% faster. The model extends beyond code generation to the full spectrum of professional computer work, including debugging, deploying, monitoring, writing PRDs, creating presentations, and analyzing data. OpenAI also describes GPT-5.3-Codex as its first model that was instrumental in creating itself: the Codex team used early versions to debug training, manage deployment, and diagnose evaluations.

What You'll Learn

1. What GPT-5.3-Codex can do beyond code generation, including research, tool use, and complex execution tasks
2. How GPT-5.3-Codex performs on key industry benchmarks like SWE-Bench Pro, Terminal-Bench, OSWorld, and GDPval
3. How OpenAI used Codex to accelerate the training and deployment of GPT-5.3-Codex itself
4. What cybersecurity safeguards OpenAI deployed for a model classified as 'High capability' for cybersecurity tasks
5. How interactive steering allows real-time collaboration with the model during long-running tasks

Prerequisites & Requirements

  • Familiarity with AI coding assistants and agentic AI concepts
  • Understanding of software engineering workflows (debugging, deploying, testing) (optional)
  • Paid ChatGPT plan for access to Codex app, CLI, IDE extension, or web

Key Questions Answered

What is GPT-5.3-Codex and how does it differ from GPT-5.2-Codex?
GPT-5.3-Codex is OpenAI's most capable agentic coding model that combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 in a single model that is 25% faster. It can handle long-running tasks involving research, tool use, and complex execution, and supports real-time interactive steering during task execution.
How does GPT-5.3-Codex perform on SWE-Bench Pro and other coding benchmarks?
GPT-5.3-Codex achieves state-of-the-art performance with 56.8% on SWE-Bench Pro (Public), 77.3% on Terminal-Bench 2.0, 64.7% on OSWorld-Verified, and 70.9% on GDPval. It achieves these results with fewer tokens than any prior model. The Terminal-Bench 2.0 and OSWorld-Verified scores are gains of 13.3 and 26.5 percentage points over GPT-5.2-Codex's 64.0% and 38.2%, respectively.
Can GPT-5.3-Codex do more than just write code?
Yes. GPT-5.3-Codex supports the full software lifecycle, covering debugging, deployment, monitoring, PRDs, copy editing, user research, tests, and metrics. Its agentic capabilities extend beyond software to creating slide decks, analyzing data in spreadsheets, and completing productivity tasks in visual desktop environments, as demonstrated by its strong OSWorld performance.
How did OpenAI use GPT-5.3-Codex to build itself?
OpenAI describes GPT-5.3-Codex as its first model that was instrumental in creating itself. The research team used it to monitor and debug the training run, track patterns during training, and analyze interaction quality. The engineering team used it to optimize the harness, identify context rendering bugs, find the root cause of low cache hit rates, dynamically scale GPU clusters, and keep latency stable during launch.
What cybersecurity concerns does OpenAI have about GPT-5.3-Codex?
GPT-5.3-Codex is the first model OpenAI classifies as 'High capability' for cybersecurity tasks under its Preparedness Framework, and the first it has directly trained to identify software vulnerabilities. Although there is no definitive evidence that the model can automate cyber attacks end-to-end, OpenAI deployed comprehensive safeguards including safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines.
What is the Trusted Access for Cyber program OpenAI launched with GPT-5.3-Codex?
Trusted Access for Cyber is a pilot program launched alongside GPT-5.3-Codex to accelerate cyber defense research. It is part of OpenAI's broader cybersecurity strategy that includes expanding the private beta of Aardvark (their security research agent), partnering with open-source maintainers for free codebase scanning, and committing $10M in API credits for cyber defense, especially for open source and critical infrastructure.
Where is GPT-5.3-Codex available and how can developers access it?
GPT-5.3-Codex is available with paid ChatGPT plans across all Codex surfaces: the app, CLI, IDE extension, and web. OpenAI is working to safely enable API access soon. The model runs on NVIDIA GB200 NVL72 systems and is 25% faster than its predecessor thanks to infrastructure and inference stack improvements.
How does interactive steering work with GPT-5.3-Codex in the Codex app?
GPT-5.3-Codex provides frequent updates during task execution so users stay informed of key decisions and progress. Instead of waiting for final output, users can interact in real time by asking questions, discussing approaches, and steering toward solutions. The feature can be enabled in Settings > General > Follow-up behavior within the Codex app.

Key Statistics & Figures

  • SWE-Bench Pro (Public): 56.8% (GPT-5.3-Codex with xhigh reasoning effort; state-of-the-art performance)
  • Terminal-Bench 2.0: 77.3% (GPT-5.3-Codex vs 64.0% for GPT-5.2-Codex)
  • OSWorld-Verified: 64.7% (GPT-5.3-Codex vs 38.2% for GPT-5.2-Codex; humans score ~72%)
  • GDPval (wins or ties): 70.9% (measures performance on knowledge work tasks across 44 occupations)
  • Cybersecurity Capture The Flag Challenges: 77.6% (GPT-5.3-Codex vs 67.4% for GPT-5.2-Codex)
  • SWE-Lancer IC Diamond: 81.4% (GPT-5.3-Codex vs 76.0% for GPT-5.2-Codex)
  • Speed improvement over predecessor: 25% (thanks to improvements in infrastructure and inference stack)
  • Cybersecurity Grant Program commitment: $10M in API credits (for accelerating cyber defense, especially open source and critical infrastructure)
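
The head-to-head figures above can be checked with a little arithmetic. The following is a minimal sketch that recomputes the percentage-point and relative gains from the scores listed in this section. The dictionary layout, the short "Cybersecurity CTF" label, and the function names are illustrative assumptions rather than any OpenAI tooling, and the human OSWorld figure is the approximate ~72% quoted above.

```python
# Scores in percent, copied from the statistics listed above as
# (GPT-5.3-Codex, GPT-5.2-Codex). The layout is illustrative only.
SCORES = {
    "Terminal-Bench 2.0": (77.3, 64.0),
    "OSWorld-Verified": (64.7, 38.2),
    "Cybersecurity CTF": (77.6, 67.4),
    "SWE-Lancer IC Diamond": (81.4, 76.0),
}

# Approximate human score on OSWorld-Verified quoted above (~72%).
HUMAN_OSWORLD = 72.0


def point_gain(new: float, old: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Improvement relative to the older score, in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    for name, (new, old) in SCORES.items():
        print(f"{name}: +{point_gain(new, old)} points "
              f"({relative_gain(new, old)}% relative)")
    gap = round(HUMAN_OSWORLD - SCORES["OSWorld-Verified"][0], 1)
    print(f"OSWorld-Verified gap to the human score: {gap} points")
```

Under these figures, the OSWorld-Verified jump is 26.5 points (roughly a 69% relative improvement), which still leaves a gap of about 7.3 points to the ~72% human score.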

Technologies & Tools

  • GPT-5.3-Codex (AI model): Primary model announced, combining coding and reasoning capabilities
  • GPT-5.2-Codex (AI model): Predecessor model used for comparison benchmarks
  • GPT-5.2 (AI model): Base reasoning model whose capabilities were merged into GPT-5.3-Codex
  • NVIDIA GB200 NVL72 (hardware): GPU systems used for co-designing, training, and serving GPT-5.3-Codex
  • Codex App (development tool): Primary interface for interacting with GPT-5.3-Codex (app, CLI, IDE extension, web)
  • Aardvark (security tool): OpenAI's security research agent, first offering in Codex Security products
  • Next.js (framework): Mentioned as an open-source project in which a Codex-assisted security researcher found vulnerabilities
  • SWE-Bench Pro (benchmark): Multi-language software engineering evaluation spanning four programming languages
  • Terminal-Bench 2.0 (benchmark): Measures terminal skills needed by coding agents
  • OSWorld (benchmark): Agentic computer-use benchmark for visual desktop productivity tasks
  • GDPval (benchmark): Evaluation measuring performance on knowledge work tasks across 44 occupations

Key Actionable Insights

1
GPT-5.3-Codex can be used as an interactive collaborator for long-running tasks, not just a one-shot code generator. Enable steering in the Codex app settings to ask questions, discuss approaches, and redirect the model while it works, similar to working with a colleague.
This is available via Settings > General > Follow-up behavior in the Codex app, and is especially useful for complex multi-step tasks where course correction is needed.
2
The model achieves strong results with fewer tokens than any prior model on coding benchmarks, meaning users can accomplish more within their token budgets. This efficiency gain, combined with the 25% speed improvement, makes it practical for iterative development workflows.
Token efficiency is particularly valuable for teams with usage limits or cost constraints on AI-assisted development.
3
GPT-5.3-Codex can handle professional knowledge work beyond coding, including creating presentations, spreadsheets, training documents, and data analysis. Teams should consider it across the full software lifecycle (PRDs, copy editing, user research, metrics analysis), not just for code generation.
The model scored 70.9% on GDPval which measures performance across 44 occupations on well-specified knowledge work tasks.
4
OpenAI's internal teams found that using Codex to build data pipelines, visualize results, and co-analyze data points yielded richer insights than standard dashboarding tools. Development teams should explore using GPT-5.3-Codex for data analysis and custom tooling alongside coding tasks.
A data scientist at OpenAI co-analyzed thousands of data points with Codex and got concise summaries of key insights in under three minutes.
5
Security-conscious organizations should consider applying to OpenAI's expanded Cybersecurity Grant Program, which commits $10M in API credits for defensive security research with its most capable models, especially for open source software and critical infrastructure systems.
OpenAI is also launching Aardvark as a security research agent and partnering with open-source maintainers for free codebase scanning, as demonstrated by recent Next.js vulnerability discoveries.

Common Pitfalls

1
Treating GPT-5.3-Codex as only a code generation tool rather than a full-spectrum professional work agent. The model's capabilities extend to research, tool use, data analysis, presentations, and complex execution tasks, and using it only for writing code misses the broader value.
The article emphasizes this is a transition from 'an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer.'
2
Not enabling interactive steering when working on complex, long-running tasks. Without steering enabled, users miss the opportunity to redirect the model mid-task, potentially wasting computation on approaches that don't align with their intent.
Enable via Settings > General > Follow-up behavior in the Codex app to interact with the model in real time.
3
Assuming the model's cybersecurity capabilities are only beneficial. GPT-5.3-Codex is the first model classified as 'High capability' for cybersecurity under OpenAI's Preparedness Framework and the first trained to identify software vulnerabilities, which makes it inherently dual-use.
OpenAI has deployed comprehensive mitigations including safety training, automated monitoring, and trusted access controls, and launched the Trusted Access for Cyber pilot program.

Related Concepts

Agentic AI
AI-Assisted Software Engineering
SWE-Bench Pro Benchmarks
AI Cybersecurity Capabilities
Computer Use Agents
Self-improving AI Models
AI Coding Assistants
Knowledge Work Automation
AI Safety And Preparedness Frameworks
Dual-use AI Technology
LLM Inference Optimization
Interactive AI Collaboration