Introducing GPT-5.4
Key Points
- OpenAI has launched GPT-5.4, its new frontier model designed for professional work, integrating advances in reasoning, coding, and agentic workflows.
- GPT-5.4 features significant improvements in native computer-use capabilities, visual perception, and factual accuracy, alongside enhanced token efficiency and support for up to 1M tokens of context.
- The model achieves state-of-the-art performance across various benchmarks, including knowledge work (GDPval), computer use (OSWorld-Verified), coding (SWE-Bench Pro), and web browsing (BrowseComp), and is available in ChatGPT, the API, and Codex.
This announcement presents OpenAI's GPT-5.4, a new frontier model designed for professional work that integrates advances in reasoning, coding, and agentic workflows. GPT-5.4 is available in ChatGPT (as GPT-5.4 Thinking and GPT-5.4 Pro), in the API (as gpt-5.4 and gpt-5.4-pro), and in Codex, succeeding GPT-5.3-Codex and GPT-5.2.
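As a rough sketch, a request to the two API models named above could be assembled as follows. The payload shape mirrors the familiar OpenAI Chat Completions format; treat the field names here as assumptions for illustration, not confirmed documentation for these models.

```python
# Minimal sketch of building a request payload for the new API models.
# Field names follow the familiar Chat Completions shape (an assumption).

def build_request(model: str, prompt: str, max_output_tokens: int = 1024) -> dict:
    """Assemble a JSON-serializable request body for a chat model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_output_tokens,
    }

# The announcement names two API models: gpt-5.4 and gpt-5.4-pro.
request = build_request("gpt-5.4", "Summarize this quarterly report.")
print(request["model"])  # gpt-5.4
```

The body would then be sent to the usual chat endpoint with an API key; swapping in "gpt-5.4-pro" selects the maximum-performance variant.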
Core Methodological Advancements and Technical Details:
- Enhanced Reasoning and Knowledge Work:
- Native Computer-Use and Advanced Vision Capabilities:
- Desktop Navigation: On OSWorld-Verified, which measures a model's ability to navigate a desktop environment using screenshots and keyboard/mouse actions, GPT-5.4 achieves a 75.0% success rate, surpassing both GPT-5.2's 47.3% and the human baseline of 72.4%.
- Web Interaction: For browser-based tasks, GPT-5.4 shows improved performance on WebArena-Verified (67.3% with DOM- and screenshot-driven interaction) and Online-Mind2Web (92.8% using screenshot-based observations).
- Visual Perception: These computer-use capabilities are underpinned by improved general visual perception. On MMMU-Pro, a test of visual understanding and reasoning, GPT-5.4 scores 81.2% without tools and 82.1% with tools, improving on GPT-5.2. Document parsing is also stronger: on OmniDocBench, GPT-5.4's average error (normalized edit distance) drops to 0.109 from GPT-5.2's 0.140. The model now supports higher image input detail levels: "original" (up to 10.24 million pixels or a 6000-pixel maximum dimension) and "high" (up to 2.56 million pixels or a 2048-pixel maximum dimension), improving localization, image understanding, and click accuracy for agents.
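The detail-level limits above read as two caps per level: one on total pixels, one on the longest side. A small helper (hypothetical; the cap values come from the text, but the uniform-downscale rule is an assumption) shows how an oversized input would be fit under both caps:

```python
import math

# Caps stated in the announcement: total pixels and max dimension per level.
DETAIL_LIMITS = {
    "original": {"max_pixels": 10_240_000, "max_dim": 6000},
    "high": {"max_pixels": 2_560_000, "max_dim": 2048},
}

def fit_to_detail(width: int, height: int, detail: str) -> tuple[int, int]:
    """Uniformly scale (width, height) down to satisfy both caps for `detail`.

    The resizing rule is illustrative, not a documented preprocessing step.
    """
    limits = DETAIL_LIMITS[detail]
    scale = min(
        1.0,                                                  # never upscale
        math.sqrt(limits["max_pixels"] / (width * height)),   # pixel-count cap
        limits["max_dim"] / max(width, height),               # longest-side cap
    )
    return (int(width * scale), int(height * scale))

# A 4096x4096 image exceeds the "high" pixel cap and is scaled down.
print(fit_to_detail(4096, 4096, "high"))  # (1600, 1600)
```

For the 4096x4096 case, the pixel-count cap is the binding one: sqrt(2,560,000 / 16,777,216) = 0.390625, giving exactly 1600 pixels per side (2.56 million pixels total).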
- Advanced Coding and Developer Workflows:
- Sophisticated Tool Use and Agentic Orchestration:
- Efficiency and Steerability:
Safety and Deployment:
Classified as "High cyber capability" under OpenAI's Preparedness Framework, GPT-5.4 ships with an expanded cyber safety stack, including monitoring systems, trusted access controls, and asynchronous blocking. Safety research on Chain-of-Thought (CoT) monitorability indicates that GPT-5.4 Thinking has low CoT controllability: it appears unable to intentionally obfuscate its reasoning, which supports CoT monitoring as an effective safety tool.
Pricing and Availability:
GPT-5.4 is priced higher than GPT-5.2 to reflect its enhanced capabilities ($15/M output tokens for gpt-5.4 vs. $14/M for gpt-5.2), but its greater token efficiency aims to reduce overall per-task cost. For maximum performance on complex tasks, GPT-5.4 Pro (gpt-5.4-pro) is available. The model is gradually rolling out across OpenAI's platforms.
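Whether the higher per-token price still lowers task cost comes down to token efficiency. A back-of-the-envelope comparison (only the per-million output prices come from the text; the token counts are invented for illustration):

```python
# Per-million-token output prices from the announcement (USD assumed).
PRICE_PER_M = {"gpt-5.4": 15.0, "gpt-5.2": 14.0}

def output_cost(model: str, output_tokens: int) -> float:
    """Output-token cost in dollars for a single task."""
    return PRICE_PER_M[model] / 1_000_000 * output_tokens

# Hypothetical task: if GPT-5.4's token efficiency lets it finish in
# 120k output tokens where GPT-5.2 needed 200k, the pricier model is
# still cheaper per task.
print(output_cost("gpt-5.4", 120_000))  # 1.8
print(output_cost("gpt-5.2", 200_000))  # 2.8
```

In this invented scenario, a roughly 40% reduction in output tokens more than offsets the ~7% price increase.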