Introducing GPT-5.3-Codex-Spark

2026.02.13
· Service · by 권준호
#AI #Codex #LLM #OpenAI #Real-time coding

Key Points

  1. OpenAI introduces GPT-5.3-Codex-Spark, an ultra-fast model optimized for real-time, interactive coding within the Codex app, designed to deliver over 1,000 tokens per second for near-instant responses.
  2. This model runs on Cerebras' Wafer Scale Engine 3, a purpose-built AI accelerator that significantly reduces end-to-end latency and improves responsiveness, complementing existing GPU infrastructure.
  3. Available as a research preview for ChatGPT Pro users, Codex-Spark is the first in a family of ultra-fast models aimed at blending real-time collaboration with longer-horizon agentic capabilities for software development.

OpenAI has introduced GPT-5.3-Codex-Spark, an ultra-fast real-time coding model that is a smaller, speed-optimized version of GPT-5.3-Codex. It is the first milestone in OpenAI's partnership with Cerebras and is designed to deliver over 1,000 tokens per second while maintaining high capability on practical coding tasks.

The core methodology for achieving this near-instantaneous performance combines model-level optimizations with significant infrastructure and hardware advancements. GPT-5.3-Codex-Spark is tuned specifically for speed and interactive work, which gives it a lightweight default working style: it makes minimal, targeted edits and does not run tests automatically unless explicitly asked.

Technically, the model's speed is realized through several synergistic components:

  1. Hardware Acceleration: Codex-Spark runs on the Cerebras Wafer Scale Engine 3 (WSE-3), a purpose-built AI accelerator designed for high-speed inference. This collaboration establishes a low-latency serving tier within OpenAI's production stack, complementing the general-purpose, cost-effective capabilities of GPUs. The WSE-3 allows for efficient, parallel processing optimized for the specific architectural needs of Codex-Spark to minimize inference latency.
  2. End-to-End Latency Improvements: Beyond model and hardware specifics, OpenAI implemented comprehensive optimizations across the entire request-response pipeline. This included streamlining response streaming between client and server, rewriting critical components of the inference stack, and reworking session initialization to accelerate the appearance of the first visible token.
  3. Network and API Optimizations: A significant improvement involved the introduction of a persistent WebSocket connection, which drastically reduces connection overhead for subsequent requests. Concurrently, targeted optimizations within the Responses API were implemented. These combined efforts resulted in an 80% reduction in overhead per client/server roundtrip, a 30% reduction in per-token overhead, and a 50% reduction in time-to-first-token. The WebSocket path, enabled by default for Codex-Spark, is slated for broader deployment across all models.
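The reported reductions can be illustrated with a toy latency model. The baseline numbers below are hypothetical, chosen only for illustration; just the percentage reductions (80% round-trip overhead, 30% per-token overhead, 50% time-to-first-token) come from the announcement:

```python
def request_time(roundtrip_overhead_ms, per_token_overhead_ms, ttft_ms, num_tokens):
    """Toy end-to-end latency model: connection overhead + time to
    first token + per-token overhead for each streamed token."""
    return roundtrip_overhead_ms + ttft_ms + per_token_overhead_ms * num_tokens

# Hypothetical baseline values (illustrative only, not OpenAI's measurements).
baseline = request_time(roundtrip_overhead_ms=200, per_token_overhead_ms=1.0,
                        ttft_ms=400, num_tokens=500)

# Apply the announced reductions: 80% less round-trip overhead (persistent
# WebSocket), 30% less per-token overhead, 50% lower time-to-first-token.
optimized = request_time(roundtrip_overhead_ms=200 * 0.2,
                         per_token_overhead_ms=1.0 * 0.7,
                         ttft_ms=400 * 0.5, num_tokens=500)

print(baseline, optimized)  # 1100.0 vs 590.0 ms in this toy example
```

Even with these made-up baselines, the sketch shows why the persistent connection matters most for short, frequent interactive requests, where fixed round-trip overhead dominates total latency.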

In terms of capability, despite being a small model optimized for fast inference, Codex-Spark demonstrates strong performance on agentic software engineering benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, completing tasks in a fraction of the time taken by its larger counterpart, GPT-5.3-Codex. Duration estimates for these benchmarks account for output generation time (output tokens ÷ sampling speed), prefill time (prefill tokens ÷ prefill speed), total tool execution time, and total network overhead.
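The duration estimate described above can be written as a small helper. The function and parameter names are my own; the token counts and speeds in the example are hypothetical:

```python
def estimate_duration(output_tokens, sampling_speed,
                      prefill_tokens, prefill_speed,
                      tool_time_s, network_overhead_s):
    """Estimated wall-clock benchmark duration in seconds:
    output generation + prefill + tool execution + network overhead."""
    return (output_tokens / sampling_speed
            + prefill_tokens / prefill_speed
            + tool_time_s + network_overhead_s)

# Hypothetical task: 10,000 output tokens, 50,000 prefill tokens at
# 20,000 tok/s, 30 s of tool execution, 5 s of network overhead.
fast = estimate_duration(10_000, 1_000, 50_000, 20_000, 30, 5)  # ~1,000 tok/s sampling
slow = estimate_duration(10_000, 100, 50_000, 20_000, 30, 5)    # ~100 tok/s sampling

print(fast, slow)  # 47.5 vs 137.5 seconds
```

Note how, in this sketch, a 10× faster sampling speed shrinks only the generation term; tool execution and network overhead become the dominant costs once sampling is fast enough.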

Codex-Spark features a 128k context window and is text-only at launch. It is rolling out as a research preview for ChatGPT Pro users via the Codex app, CLI, and VS Code extension, with separate rate limits due to its specialized hardware requirements. A limited API release is also available for design partners.

OpenAI's long-term vision for Codex includes blending the real-time, interactive capabilities of Codex-Spark with the long-running, autonomous task execution strengths of larger models, allowing for flexible sub-agent delegation or parallel task distribution. Safety evaluations, including cyber-relevant training, have been conducted, with the model deemed not to exceed Preparedness Framework thresholds for high capability in cybersecurity or biology.
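The envisioned blend of real-time and longer-horizon models could look roughly like the following dispatcher sketch. The model names match those in the article, but the routing logic, thresholds, and `Task` structure are entirely hypothetical, not an actual OpenAI API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    interactive: bool         # does the user expect near-instant feedback?
    estimated_minutes: float  # rough horizon of the task

def pick_model(task: Task) -> str:
    """Hypothetical dispatcher: route short, interactive work to the
    fast model and long-running autonomous work to the larger model."""
    if task.interactive and task.estimated_minutes < 5:
        return "gpt-5.3-codex-spark"
    return "gpt-5.3-codex"

print(pick_model(Task("rename a variable", True, 0.5)))   # gpt-5.3-codex-spark
print(pick_model(Task("migrate test suite", False, 90)))  # gpt-5.3-codex
```

In a real system the larger model might itself call `pick_model`-style logic to delegate quick sub-edits to Spark in parallel, which is the sub-agent pattern the article alludes to.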