
OpenAI bypasses Nvidia with an unusually fast coding model running on plate-sized chips

by admin

However, 1,000 tokens per second is relatively conservative for Cerebras. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI’s open-weight gpt-oss-120B model, suggesting Codex-Spark’s lower throughput likely reflects the overhead of a larger or more complex model.
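To put those throughput figures in rough perspective, here is a back-of-the-envelope sketch; the 2,000-token response size is an illustrative assumption, not a figure from OpenAI or Cerebras:

```python
# Rough time to generate one response at the throughputs reported above.
# response_tokens is an assumed, illustrative response length.
response_tokens = 2_000

throughputs = {                            # tokens per second
    "Codex-Spark (reported)": 1_000,
    "Llama 3.1 70B on Cerebras": 2_100,
    "gpt-oss-120B on Cerebras": 3_000,
}

for name, tps in throughputs.items():
    seconds = response_tokens / tps
    print(f"{name}: {seconds:.1f} s for {response_tokens} tokens")
```

Even at the "conservative" 1,000 tokens per second, a sizable response lands in a couple of seconds rather than the better part of a minute.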

AI coding agents have enjoyed a breakout year, with tools like OpenAI’s Codex and Anthropic’s Claude Code becoming significantly more useful for rapidly creating prototypes, user interfaces, and boilerplate. OpenAI, Google, and Anthropic have all been racing to release more capable coding agents, and speed has emerged as a key differentiator: a model that generates code faster lets developers iterate more quickly.

Facing stiff competition from Anthropic, OpenAI has been rapidly evolving its Codex line, releasing GPT-5.2 in December after CEO Sam Altman circulated an internal “code red” memo about Google’s competitive threat, then shipping GPT-5.3-Codex just days ago.

Branching out from Nvidia

Spark’s underlying hardware story could be more important than its benchmark figures. The model runs on Cerebras’ Wafer Scale Engine 3, a dinner-plate–sized chip that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to emerge from that collaboration.

Over the past year OpenAI has been deliberately reducing its dependence on Nvidia. The company signed a large multi-year deal with AMD in October 2025, struck a $38 billion cloud computing agreement with Amazon in November, and has been designing its own custom AI chip intended for fabrication by TSMC.

Meanwhile, a proposed $100 billion infrastructure deal with Nvidia has stalled so far, though Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI grew dissatisfied with the inference speed of some Nvidia chips, and inference is exactly the sort of workload Codex-Spark was designed to tackle.

No matter which chip is powering it, speed matters, even if it can come at the expense of accuracy. For developers who spend their days in a code editor waiting for AI suggestions, 1,000 tokens per second can feel less like carefully guiding a jigsaw and more like running a rip saw. Just be mindful of what you’re cutting.
