OpenAI Just Shipped GPT-5.4, and It's Not What You Expected


Hey there! 👋

Welcome back to SavvyMonk, your daily dose of AI and tech news that actually matters.

OpenAI released GPT-5.4 on Thursday, March 5th. This one didn't ship quietly. It landed the same week as the Pentagon deal controversy, the QuitGPT boycott, and a separate "less cringe" update to ChatGPT's everyday model.

Let's get into it.

Become An AI Expert In Just 5 Minutes

If you’re a decision maker at your company, you need to be on the bleeding edge of, well, everything. But before you go signing up for seminars, conferences, lunch ‘n learns, and all that jazz, just know there’s a far better (and simpler) way: Subscribing to The Deep View.

This daily newsletter condenses everything you need to know about the latest and greatest AI developments into a 5-minute read. Squeeze it into your morning coffee break and before you know it, you’ll be an expert too.

Subscribe right here. It’s totally free, wildly informative, and trusted by 600,000+ readers at Google, Meta, Microsoft, and beyond.

TODAY'S DEEP DIVE

OpenAI Just Shipped GPT-5.4

Let's start with what GPT-5.4 actually is, because the framing matters.

OpenAI is calling it "our most capable and efficient frontier model for professional work." It ships as GPT-5.4 Thinking in ChatGPT (replacing GPT-5.2 Thinking), and it's available right now for Plus, Team, and Pro users. There's also a GPT-5.4 Pro tier for Enterprise and Pro plans.

It rolled out across ChatGPT, the API, and Codex simultaneously.

What's New (With Numbers)

Fewer hallucinations, by a measurable margin: On a set of prompts where users had flagged factual errors, GPT-5.4's individual claims are 33% less likely to be false and full responses are 18% less likely to contain errors, compared to GPT-5.2.

Professional-grade task performance: On GDPval, OpenAI's benchmark measuring performance across 44 real-world occupations, GPT-5.4 matched or exceeded industry professionals 83% of the time. GPT-5.2 scored 70.9%. That's a meaningful jump.
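The numbers above mix two kinds of comparison: the hallucination figures are relative reductions ("33% less likely"), while GDPval is reported as absolute scores. Here's a quick sketch of the arithmetic, using the GDPval scores from this section and a purely hypothetical 6% baseline error rate to show what a 33% relative cut means in practice:

```python
def pp_change(old: float, new: float) -> float:
    """Absolute change, in percentage points."""
    return new - old


def relative_change(old: float, new: float) -> float:
    """Change expressed as a percentage of the old value."""
    return (new - old) / old * 100


def apply_relative_reduction(baseline: float, reduction_pct: float) -> float:
    """Rate left after cutting `baseline` by `reduction_pct` percent (relative, not absolute)."""
    return baseline * (1 - reduction_pct / 100)


# GDPval scores reported above: GPT-5.2 = 70.9%, GPT-5.4 = 83%.
print(f"{pp_change(70.9, 83.0):.1f} percentage points")      # 12.1 percentage points
print(f"{relative_change(70.9, 83.0):.1f}% relative gain")   # 17.1% relative gain

# Hypothetical baseline: if 6% of GPT-5.2's claims were false,
# a 33% relative reduction would leave roughly 4% false.
print(f"{apply_relative_reduction(6.0, 33):.2f}%")           # 4.02%
```

The same distinction applies to the OSWorld and BrowseComp results below, where the gains are stated as score changes rather than relative reductions.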

Native computer use: This is the first general-purpose OpenAI model with built-in computer-use capabilities. It can write code to control computers via tools like Playwright, and it can issue mouse and keyboard commands in response to screenshots. On OSWorld-Verified (a benchmark for desktop navigation tasks), GPT-5.4 scored 75%, up from 47.3% on GPT-5.2, and above the human baseline of 72.4%.
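Screenshot-driven computer use boils down to an observe-act loop. The sketch below is a minimal, hypothetical version: `take_screenshot`, `propose_action`, and `execute` are placeholders you would wire to a real screen-capture library, the model API, and an input driver, and the `Action` vocabulary is an assumption, not OpenAI's actual action schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str      # "click", "type", or "done" (hypothetical action vocabulary)
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[bytes, list[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> list[Action]:
    """Observe-act loop: capture the screen, ask the model for one action, perform it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):  # hard cap so a confused agent can't loop forever
        screenshot = take_screenshot()
        action = propose_action(screenshot, history)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return history


# Demo with a scripted "model" and a fake screen; no real UI is touched.
scripted = iter([Action("click", x=120, y=40), Action("type", text="quarterly report"), Action("done")])
performed: list[Action] = []
log = run_agent(
    take_screenshot=lambda: b"fake-png-bytes",
    propose_action=lambda shot, hist: next(scripted),
    execute=performed.append,
)
print([a.kind for a in log])  # ['click', 'type', 'done']
```

Passing the action history back to the model on every step is a design choice in this sketch; it lets the model notice when a click didn't have the intended effect.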

1-million-token context window: Available through Codex and the API. That's a large jump from GPT-5's original 400K. The standard context window is 272K tokens, and usage above that threshold is billed at 2x.

Tool Search: A new system for managing tool definitions in the API. Instead of loading every tool definition into the prompt, the model can now look up tool definitions as needed. On 250 tasks from Scale's MCP Atlas benchmark with 36 MCP servers enabled, this reduced total token usage by 47% while maintaining the same accuracy.
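To make the Tool Search idea concrete, here is a toy comparison of the two patterns: sending every tool definition with each request versus fetching only the definitions that match what the model is looking for. The tool names, the word-overlap matching, and the 4-characters-per-token estimate are all assumptions for illustration; the source doesn't describe how OpenAI ranks tools, and this is not its API.

```python
import json

# Hypothetical tool catalog; a real app might have dozens of these across MCP servers.
TOOLS = {
    "get_weather": {
        "description": "Get the current weather for a city",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    },
    "send_email": {
        "description": "Send an email to a recipient",
        "parameters": {"type": "object", "properties": {"to": {"type": "string"}, "body": {"type": "string"}}, "required": ["to", "body"]},
    },
    "create_invoice": {
        "description": "Create an invoice for a customer",
        "parameters": {"type": "object", "properties": {"customer_id": {"type": "string"}, "amount": {"type": "number"}}, "required": ["customer_id", "amount"]},
    },
    "search_crm": {
        "description": "Look up a customer record in the CRM",
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
    },
}


def approx_tokens(obj) -> int:
    """Rough estimate: about 4 characters of JSON per token."""
    return len(json.dumps(obj)) // 4


def eager_tools() -> list[dict]:
    """Old pattern: every full definition rides along with every request."""
    return [{"name": name, **spec} for name, spec in TOOLS.items()]


def search_tools(query: str, limit: int = 2) -> list[dict]:
    """Deferred pattern: return full definitions only for tools that match the query.

    Word overlap between the query and each tool's name/description stands in
    for whatever ranking a real tool-search system uses.
    """
    query_words = set(query.lower().split())
    scored = []
    for name, spec in TOOLS.items():
        tool_words = set(f"{name.replace('_', ' ')} {spec['description']}".lower().split())
        overlap = len(query_words & tool_words)
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)
    return [{"name": name, **TOOLS[name]} for _, name in scored[:limit]]


loaded = search_tools("email invoice")
print([tool["name"] for tool in loaded])  # ['send_email', 'create_invoice']
print("eager:", approx_tokens(eager_tools()), "on demand:", approx_tokens(loaded))
```

With four tools the savings are small; the gap grows with the size of the catalog, which is why the benefit shows up most in setups with many MCP servers enabled.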

Improved agentic browsing: On BrowseComp (a benchmark for finding hard-to-locate information on the web), GPT-5.4 improved by 17 percentage points over GPT-5.2. GPT-5.4 Pro hit 89.3%, which OpenAI says is a new state of the art.

Better at real work: On an internal investment banking benchmark, GPT-5.4 Thinking scored 87.5% compared to 68.4% for GPT-5.2. Harvey's legal team reports a 91% score on BigLaw Bench for complex transactional analysis. Mercor's CEO called it the best model they've tested on their APEX-Agents benchmark for professional services.

Pricing

For context on what this costs in the API:

GPT-5.4: $2.50 per 1M input tokens / $15 per 1M output tokens
GPT-5.4 Pro: $30 per 1M input tokens / $180 per 1M output tokens
Batch and Flex modes: half-rate. Priority processing: 2x rate.

That makes GPT-5.4 one of the pricier models on the market. But its token efficiency (fewer tokens needed per problem) can partially offset the sticker price for many workflows.
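A back-of-the-envelope cost calculator using the list prices above. The token counts are hypothetical, and the sketch ignores the 2x long-context pricing above 272K tokens.

```python
# List prices from this section, in USD per 1M tokens.
PRICES = {
    "gpt-5.4": {"input": 2.50, "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

# Batch and Flex run at half rate; Priority at 2x (as stated above).
MODE_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}


def request_cost(model: str, input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Dollar cost of one request. Does not model the long-context surcharge."""
    price = PRICES[model]
    base = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return base * MODE_MULTIPLIER[mode]


# Hypothetical request: 20K tokens in, 4K tokens out.
print(f"${request_cost('gpt-5.4', 20_000, 4_000):.3f}")            # $0.110
print(f"${request_cost('gpt-5.4', 20_000, 4_000, 'batch'):.3f}")   # $0.055
print(f"${request_cost('gpt-5.4-pro', 20_000, 4_000):.3f}")        # $1.320

# Token efficiency in action: the same task answered in 4K instead of 6K output tokens.
saved = request_cost("gpt-5.4", 20_000, 6_000) - request_cost("gpt-5.4", 20_000, 4_000)
print(f"${saved:.3f} saved per request")                            # $0.030 saved per request
```

Output tokens cost six times as much as input tokens here, so trimming output is where efficiency gains pay off most.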

The Safety Card is Worth Reading

OpenAI published a system card alongside GPT-5.4 Thinking, and there are a few things to note.

GPT-5.4 Thinking is the first general-purpose model that OpenAI has classified as "High capability" in cybersecurity under its Preparedness Framework. That's a step up from the approach used for GPT-5.2 Thinking, and it triggers additional safeguards, including expanded monitoring, trusted access controls, and request-level blocking for certain customers on Zero Data Retention surfaces.

On chain-of-thought safety, OpenAI introduced a new open-source evaluation called CoT controllability. The finding: GPT-5.4 Thinking has little ability to deliberately hide or obfuscate its reasoning, which OpenAI frames as a positive for monitoring. In plain terms, the model struggles to be sneaky about what it's thinking, and OpenAI counts that as a feature.

One regression to note: GPT-5.4 was slightly less robust to prompt injection attacks delivered through email connectors, even as it improved in other prompt injection categories. Safety trade-offs still show up with every release.

The Elephant in the Room

GPT-5.4 launched into what might be OpenAI's biggest PR crisis.

On February 28, OpenAI announced a contract with the Department of Defense to deploy its models on classified networks. This came hours after rival Anthropic's negotiations with the Pentagon broke down: Anthropic had demanded binding restrictions against mass surveillance and autonomous weapons, and the Pentagon refused. Defense Secretary Pete Hegseth then designated Anthropic a "supply chain risk," an action that had never before been publicly applied to an American company.

The backlash was immediate. A campaign called QuitGPT organized a boycott and claims more than 2.5 million users have taken action by canceling subscriptions, deleting the app, or pledging to switch.

Claude overtook ChatGPT as the top free app in the U.S. App Store. Protesters gathered outside OpenAI's San Francisco headquarters.

CEO Sam Altman admitted the rollout was rushed. "We shouldn't have rushed to get this out on Friday," he posted. "It just looked opportunistic and sloppy."

OpenAI subsequently amended the contract to add language stating the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals. But many legal experts remain skeptical of the language, and the full contract text hasn't been released publicly.

This matters for the GPT-5.4 release because it's impossible to separate the product from the context. OpenAI shipped its most capable model yet during its most contested week. Whether that's confidence or deflection depends on your read of the situation.

Meanwhile, GPT-5.3 Instant Fixed the Cringe

Two days before GPT-5.4 dropped, OpenAI released GPT-5.3 Instant, an update to the default everyday ChatGPT model. This one targeted something that doesn't show up in benchmarks: tone.

Users had been complaining loudly that GPT-5.2 Instant sounded overbearing. It would open responses with things like "First of all, you're not broken" or "Stop. Take a breath." It over-caveated, over-moralized, and refused questions it could safely answer.

GPT-5.3 Instant dialed that back. OpenAI said it reduces hallucinations by 26.8% when using web search, improves conversational flow, and eliminates unnecessary refusals. But the system card also showed measurable safety regressions in certain content categories, a pattern worth watching.

GPT-5.2 Instant will remain accessible under legacy options until June 3, 2026.

What This Means If You're Building Something

If you're using the API, the key model identifiers are gpt-5.4 and gpt-5.4-pro. If you were pinned to GPT-5.2, now is the time to test.

Three things to evaluate:

Token efficiency: GPT-5.4 uses fewer tokens to solve the same problems. If you're running high-volume workflows, this can meaningfully reduce costs even at the higher per-token price.
Tool search: If your application uses many tools or MCP connectors, the new Tool Search system can cut token consumption substantially. Worth migrating to if you're currently stuffing all tool definitions into every prompt.
Computer use: If you're building agents that interact with software, GPT-5.4's native computer-use capabilities are a step change. It supports both code-based automation (Playwright) and direct screenshot-based interaction (mouse/keyboard). An OSWorld score of 75%, above the human baseline, suggests it may be production-ready for many use cases, though it's worth validating on your own workflows first.
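Before moving a pinned GPT-5.2 integration over, a small side-by-side eval is a cheap way to check both accuracy and token usage on your own prompts. This is a sketch, not a benchmark: `call_model` is a hypothetical hook you'd implement with your API client, `"gpt-5.2"` is an assumed identifier for the old model, and the substring check is a deliberately crude grader.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    model: str
    accuracy: float    # fraction of cases whose answer contains the expected text
    avg_tokens: float  # mean tokens reported per call


def compare_models(
    models: list[str],
    cases: list[tuple[str, str]],
    call_model: Callable[[str, str], tuple[str, int]],
) -> list[EvalResult]:
    """Run every (prompt, expected) case against each model and summarize."""
    results = []
    for model in models:
        correct, tokens = 0, 0
        for prompt, expected in cases:
            answer, used = call_model(model, prompt)
            correct += int(expected.lower() in answer.lower())
            tokens += used
        results.append(EvalResult(model, correct / len(cases), tokens / len(cases)))
    return results


def fake_call(model: str, prompt: str) -> tuple[str, int]:
    """Hypothetical stand-in for a real API call; returns (answer, tokens used)."""
    if model == "gpt-5.4":
        return "Paris", 120
    return "The capital of France is Paris.", 180


cases = [("What is the capital of France?", "paris"), ("Name France's capital city.", "paris")]
for result in compare_models(["gpt-5.2", "gpt-5.4"], cases, fake_call):
    print(result)
```

In practice you'd replace `fake_call` with a function that sends the prompt through your API client and reads the token count from the response's usage data, then run it over a representative sample of real prompts rather than two.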

For ChatGPT users, GPT-5.4 Thinking replaces GPT-5.2 Thinking. You'll also notice it can now show you an upfront plan of its thinking, so you can course-correct before it finishes generating. GPT-5.2 Thinking moves to Legacy Models and will be retired on June 5, 2026.

The Bottom Line

GPT-5.4 is a substantial release. It's not a quiet refinement; it's a new class of model that combines reasoning, coding (inherited from GPT-5.3-Codex), and computer use in a single package.

But the context around this launch is just as significant as the model itself. OpenAI is shipping its most powerful tools during a week when public trust in the company hit a low point. The technical capabilities are impressive. The question of who those capabilities serve, and under what terms, is now a front-page debate.

The model is good. Whether the company can hold onto its user base long enough for that to matter is a different story.

AI PROMPT OF THE DAY

Category: Tutorial Style

“Act as an expert AI image prompt engineer. I'll give you a concept, like 'a cozy coffee shop in autumn', and you'll return three progressively detailed prompts: (1) a simple 10-word prompt, (2) a 30-word prompt with lighting, camera angle, and mood, and (3) a 60-word cinematic prompt with color palette, lens type, depth of field, and post-processing style. Explain what each added detail changes in the output.”

ONE LAST THING

OpenAI shipped its most capable model during its most controversial week. GPT-5.4 is technically impressive by every measure. And yet the conversation around it has barely been about the model itself. When the product story gets drowned out by the trust story, does that tell us something about where the AI industry is heading? Hit reply, I read every response.

See you in the next newsletter.

— Vivek

P.S. Know someone building with AI or following the OpenAI release cycle closely? Forward this. They can subscribe at https://savvymonk.beehiiv.com/