
Despite the buzz about these agents being co-workers, in our experience they're generally more useful when treated as tools that enhance existing abilities rather than as the autonomous teammates the marketing implies. They can churn out strong drafts quickly but still demand continual human correction and oversight.
The Frontier launch arrived just three days after OpenAI released a macOS desktop app for Codex, its AI coding tool, which OpenAI executives described as a “command center for agents.” The Codex app allows developers to run multiple agent threads concurrently, each operating on an isolated copy of the codebase via Git worktrees.
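Git worktrees let multiple working directories share a single repository, so each agent can check out its own branch without a full clone. The sketch below illustrates that general pattern rather than OpenAI's actual implementation; the repository path, branch names, and agent labels are placeholders.

```python
# Hypothetical sketch of the worktree pattern (not OpenAI's code): each agent
# thread gets its own branch and working directory, all backed by one repo.
import subprocess
from pathlib import Path

def create_agent_worktree(repo: Path, agent_id: str) -> Path:
    """Create an isolated working copy for one agent via `git worktree add`."""
    worktree_path = repo.parent / f"agent-{agent_id}"
    branch = f"agent/{agent_id}"
    # -b creates a fresh branch, so agents never edit each other's checkouts.
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree_path)],
        check=True,
    )
    return worktree_path

if __name__ == "__main__":
    repo = Path("my-project")  # assumed: an existing Git repository
    for agent_id in ("refactor", "tests", "docs"):
        print("agent workspace:", create_agent_worktree(repo, agent_id))
```

Because worktrees share one object database, creating another working copy is far cheaper than cloning the repository again, which is presumably what makes the approach practical for running many agent threads against the same codebase.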
OpenAI also released GPT-5.3-Codex, the new model that powers the Codex app, on Thursday. OpenAI says the Codex team used early versions of GPT-5.3-Codex to debug the model's own training runs, manage deployments, and interpret test results, echoing claims the company made to Ars Technica in December.
“Our team was amazed at how much Codex accelerated its own development,” the company wrote. On the agentic coding benchmark Terminal-Bench 2.0, GPT-5.3-Codex scored 77.3%, roughly 12 percentage points higher than Anthropic’s newly released Opus 4.6.
The common thread across these products is a change in the user’s role. Rather than simply typing a prompt and waiting for one reply, the developer or knowledge worker shifts into a supervisory role—dispatching tasks, tracking progress, and stepping in when an agent requires direction.
In that vision, developers and knowledge workers effectively become middle managers of AI: not writing the code or doing the analysis themselves, but delegating tasks, reviewing output, and hoping the agents beneath them don’t quietly break things. Whether this approach will prevail—or whether it’s a good idea—is still widely debated.