On February 12, 2026, OpenAI rolled out GPT-5.3-Codex-Spark, positioning it as its first model built specifically for real-time coding inside Codex. It is a smaller sibling of GPT-5.3-Codex, tuned for near-instant code edits and rapid iteration, and served on ultra-low-latency hardware that can generate more than 1,000 tokens per second.
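To make that throughput figure concrete, here is a rough back-of-envelope sketch. The edit sizes below are illustrative assumptions, not figures from OpenAI, and the calculation ignores time-to-first-token and network overhead:

```python
# Back-of-envelope sketch: what ">1,000 tokens per second" means for a
# typical interactive edit. Token counts are illustrative assumptions.

def streaming_time_seconds(output_tokens: int, tokens_per_second: float = 1000.0) -> float:
    """Time to stream a completion at a given decode rate, ignoring
    time-to-first-token and network overhead."""
    return output_tokens / tokens_per_second

for label, tokens in [
    ("small targeted edit", 150),
    ("medium refactor diff", 600),
    ("large rewrite", 2000),
]:
    print(f"{label:>22}: ~{streaming_time_seconds(tokens):.2f}s at 1,000 tok/s")
```

At that rate, even a fairly large rewrite streams in about two seconds, which is the kind of latency that makes tight edit-and-check loops feel interactive.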
The rollout targets developers who want tight feedback loops: making narrow changes, reshaping logic, refining UI, and immediately seeing results without waiting on long runs. By default, Codex-Spark keeps its working style lightweight: it favors minimal, targeted edits and does not run tests unless explicitly asked. At launch, the model is text-only with a 128k-token context window, and it is governed by separate rate limits that do not count toward standard limits. OpenAI says users may see temporary queuing during peak demand as capacity ramps.
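A 128k-token window is roomy but finite, so context budgeting still matters for multi-file edits. Here is a minimal sketch of such a check; it assumes tiktoken's o200k_base encoding as a stand-in tokenizer (OpenAI has not published Codex-Spark's tokenizer) and an arbitrary output reserve:

```python
# Minimal sketch for budgeting against a 128k-token context window.
# Assumptions: "o200k_base" as a stand-in tokenizer and an illustrative
# reserve of headroom for the model's reply.
import tiktoken

CONTEXT_WINDOW = 128_000
OUTPUT_RESERVE = 4_000  # illustrative headroom for the model's output

enc = tiktoken.get_encoding("o200k_base")

def fits_in_context(prompt: str, files: list[str]) -> bool:
    """Return True if the prompt plus file contents fit within the window."""
    used = len(enc.encode(prompt)) + sum(len(enc.encode(f)) for f in files)
    return used + OUTPUT_RESERVE <= CONTEXT_WINDOW
```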
"GPT-5.3-Codex-Spark is now in research preview. You can just build things—faster."
— OpenAI (@OpenAI), February 12, 2026
Availability starts with ChatGPT Pro users via the latest Codex app, CLI, and VS Code extension, with API access limited to a small set of design partners testing product integrations. OpenAI frames this as an early-access step while it hardens the end-to-end experience and expands datacenter capacity, with broader access planned over the coming weeks.
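For the design partners with API access, a call might look like the following sketch using the OpenAI Python SDK. The model identifier "gpt-5.3-codex-spark" is an assumption; OpenAI has not published an API model name, and general API access is not yet open:

```python
# Hypothetical sketch of a design-partner API call. The model id below is
# an assumed placeholder, not a confirmed identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed identifier, see note above
    messages=[
        {
            "role": "user",
            "content": (
                "Rename `get_user` to `fetch_user` in this file and update "
                "call sites. Do not run tests.\n\n" + open("app.py").read()
            ),
        },
    ],
    stream=True,  # stream tokens so edits appear as they are generated
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming matters here: at 1,000+ tokens per second, the first usable lines of a diff appear almost immediately rather than after the full response completes.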
Under the hood, OpenAI says “model speed” was only part of the problem, so it also reworked the full request-response pipeline. Changes include a persistent WebSocket path enabled by default for Codex-Spark, plus optimizations that cut per-roundtrip overhead by 80%, per-token overhead by 30%, and time-to-first-token by 50%. OpenAI says this lower-latency path will become the default for other models soon.
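The intuition behind the persistent WebSocket path is that connection setup (TCP and TLS handshakes) is paid once per session instead of once per request. The sketch below illustrates the pattern in generic terms; the endpoint URL and message format are assumptions for illustration, not OpenAI's actual protocol:

```python
# Illustration of why a persistent WebSocket cuts per-request overhead:
# one connect() for the whole session, every request reuses the open
# connection. Endpoint and payload shape are assumed, not OpenAI's.
import asyncio
import json
import websockets

async def run_session(requests: list[str]) -> None:
    # Handshake happens once here, not once per request.
    async with websockets.connect("wss://example.invalid/codex") as ws:
        for prompt in requests:
            await ws.send(json.dumps({"prompt": prompt}))
            reply = json.loads(await ws.recv())
            print(reply)

asyncio.run(run_session([
    "fix the off-by-one in paginate()",
    "tighten the docstring",
]))
```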
The hardware story is the headline: Codex-Spark runs on Cerebras Wafer-Scale Engine 3 as a latency-first serving tier, marking the first milestone in the OpenAI–Cerebras partnership announced in January. Cerebras leadership describes the preview as a way to discover new usage patterns unlocked by fast inference, while OpenAI’s compute team highlights wafer-scale inference as an added capability alongside its GPU fleet for latency-sensitive workflows.
OpenAI also emphasizes the model's safety posture: Codex-Spark inherits the same safety training as its mainline models, including cyber-relevant training, and was evaluated through OpenAI's standard deployment process. The company says it does not consider it plausible that Codex-Spark reaches its Preparedness Framework threshold for high capability in cybersecurity or biology.
Strategically, Codex-Spark is meant to complement GPT-5.3-Codex’s longer-horizon “work for hours” mode. OpenAI’s direction is a two-mode Codex: fast, real-time collaboration for rapid iteration, and longer-horizon reasoning and execution when deeper work is needed, with a roadmap toward blending both modes in one workflow.