On April 23, OpenAI launched GPT-5.5, its flagship language model. According to the official press release, it surpasses GPT-5.4 in agentic coding, data processing, scientific research, and computer control.
The model is currently available to ChatGPT users on the Plus, Pro, Business, and Enterprise plans, as well as in the Codex environment.
What Has Changed
The key difference from previous versions is that the model takes on multi-step tasks without pulling the user back in at every turn. You can assign a vague task with several conditions, and GPT-5.5 will plan the steps, use the necessary tools, verify the result, and continue working until completion. Previously, models would stop at certain stages and wait for clarifications.
This shows up in coding and day-to-day work alike: data analysis, document drafting, spreadsheet management, and navigating application interfaces. The model sees the screen, clicks, types, and switches between tools without being prompted at each step.
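The plan, act, verify, continue pattern described above can be sketched in a few lines. This is only an illustration of the general idea, not OpenAI's implementation: `Step`, `AgentRun`, `run_agent`, the `tools` dictionary, and the `verify` callback are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    tool: str      # name of the (hypothetical) tool to call
    argument: str  # input passed to that tool


@dataclass
class AgentRun:
    log: list[str] = field(default_factory=list)
    done: bool = False


def run_agent(plan: list[Step],
              tools: dict[str, Callable[[str], str]],
              verify: Callable[[str], bool],
              max_retries: int = 2) -> AgentRun:
    """Run every planned step, retrying on failed verification
    instead of stopping to ask the user."""
    run = AgentRun()
    for step in plan:
        # One initial attempt plus up to max_retries retries.
        for attempt in range(1, max_retries + 2):
            result = tools[step.tool](step.argument)
            if verify(result):
                run.log.append(f"{step.tool}: ok after {attempt} attempt(s)")
                break
            run.log.append(f"{step.tool}: failed attempt {attempt}")
        else:
            # Every attempt failed verification: stop, leaving done=False.
            return run
    run.done = True
    return run
```

A caller would pass a list of `Step` objects and a dictionary of plain functions. The loop hands control back only when the whole plan has passed verification or a step has exhausted its retries, which is the behavioral difference the article describes.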
Benchmarks
On Terminal-Bench 2.0, which tests complex multi-stage tasks in a command-line environment, GPT-5.5 scores 82.7%, compared with 75.1% for GPT-5.4 and 69.4% for Claude Opus 4.7. On OSWorld-Verified, which evaluates the ability to operate in real computer environments, it scores 78.7%, again ahead of Claude Opus 4.7.
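To put the Terminal-Bench 2.0 figures in perspective, here is a short calculation using only the three scores quoted above. The helper names `gap_in_points` and `failure_rate_reduction` are invented for this example, and expressing the gap as a reduction in failure rate is just one possible reading, not a metric from the press release.

```python
# Terminal-Bench 2.0 scores as quoted in the article (percent).
scores = {"GPT-5.5": 82.7, "GPT-5.4": 75.1, "Claude Opus 4.7": 69.4}


def gap_in_points(scores: dict[str, float], leader: str) -> dict[str, float]:
    """Lead of `leader` over every other model, in percentage points."""
    return {
        name: round(scores[leader] - value, 1)
        for name, value in scores.items()
        if name != leader
    }


def failure_rate_reduction(new: float, old: float) -> float:
    """Relative drop in failure rate (100 - score) from old to new, in percent."""
    return round((new - old) / (100 - old) * 100, 1)


print(gap_in_points(scores, "GPT-5.5"))
print(failure_rate_reduction(scores["GPT-5.5"], scores["GPT-5.4"]))
```

The lead over GPT-5.4 is 7.6 percentage points, and the lead over Claude Opus 4.7 is 13.3 points. Measured against the 24.9% of tasks GPT-5.4 failed, that 7.6-point gain corresponds to a failure rate roughly 30% lower.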
Early testers described the model as the first to demonstrate "genuine conceptual clarity" when navigating large codebases. It independently locates where edits are needed and tracks the consequences of changes across the entire project.
Potential in Science
An internal version of GPT-5.5 helped discover a new mathematical proof in Ramsey theory, an area of combinatorics where new results rarely appear and require serious expertise. The proof was later verified in Lean.
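For readers unfamiliar with the field, the sketch below checks the classic textbook fact of Ramsey theory, R(3,3) = 6: any two-coloring of the edges between six points contains a single-colored triangle, while five points are not enough. This is not the result attributed to GPT-5.5, which the article does not specify; it is only a small illustration of the kind of statement the field deals with, and the function names are invented for this example.

```python
from itertools import combinations, product


def has_mono_triangle(n: int, coloring: dict[tuple[int, int], int]) -> bool:
    """True if some triangle on vertices 0..n-1 has all three edges the same color."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_coloring_has_mono_triangle(n: int) -> bool:
    """Brute-force check over all 2-colorings of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):
        coloring = dict(zip(edges, colors))
        if not has_mono_triangle(n, coloring):
            return False  # found a coloring that avoids single-colored triangles
    return True


print(every_coloring_has_mono_triangle(5))  # False
print(every_coloring_has_mono_triangle(6))  # True
```

Six points already require checking 2^15 = 32,768 colorings, and the count doubles with every added edge. That growth is one reason new results in the area tend to need proofs rather than exhaustive search.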
Cybersecurity
OpenAI assigned a High rating to the cybersecurity capabilities of GPT-5.5 under its own Preparedness Framework. This is higher than the rating given to GPT-5.4 but still below the Critical level.
The company has introduced stricter filters for potentially dangerous requests and acknowledges that these filters may initially frustrate some users.
