Hey there! 👋

Welcome back to SavvyMonk, your go-to source for AI and tech news that actually matters.

AI is becoming your new doctor. Or at least trying to be. Over 230 million people already ask ChatGPT health questions every week. Now the first independent safety study of OpenAI's ChatGPT Health is out. And the results should make you think twice before trusting a chatbot with your life.

Let's get into it.

Good Credit Could Save You $200,000 Over Time

Better credit means better rates on mortgages, cars, and more. Cheers Credit Builder is an affordable, AI-powered way to start — no score or hard check required. We report to all three bureaus fast. Many users see 20+ point increases in months. Cancel anytime with no penalties or hidden fees.

TODAY'S DEEP DIVE

The AI Doctor Will See You Now (But Might Send You Home to Die)

In January 2026, OpenAI launched ChatGPT Health, a dedicated space inside ChatGPT where users can connect their medical records, wellness apps, and wearable data. The pitch was simple: a free, personalized health assistant that helps you understand test results, prep for doctor visits, and make sense of your health data.

Within weeks, OpenAI reported that roughly 40 million people were using ChatGPT Health daily. Anthropic launched Claude for Healthcare the same week. Microsoft followed in March with Copilot Health. The race to become your AI front door to healthcare was on.

But one thing was missing from all the excitement: independent proof that any of this was safe.

The Study That Changed the Conversation

Researchers at the Icahn School of Medicine at Mount Sinai decided to fill that gap. Their study, published February 23, 2026, in Nature Medicine, was the first independent safety evaluation of ChatGPT Health since its launch. The journal fast-tracked it given its public health importance. Dr. Eric Topol, one of the most prominent voices in medical AI, was among the reviewers.

The research team created 60 realistic clinical scenarios spanning 21 medical specialties. Cases ranged from minor conditions you could treat at home to true medical emergencies. Three independent physicians reviewed each scenario and agreed on the correct level of urgency, using guidelines from 56 medical societies.

Each scenario was then tested under 16 different conditions. The researchers changed patient gender, race, added lab results, and included situations where a family member downplayed symptoms. In total, they ran 960 interactions with ChatGPT Health.

Where It Went Wrong

The results followed what the researchers called an “inverted U-shaped pattern.” ChatGPT Health performed reasonably well in the middle of the severity spectrum. But at the extremes, where the stakes are highest, it failed.

In 51.6% of true emergency cases, ChatGPT Health told patients to see a doctor within 24 to 48 hours instead of going to the emergency room. These weren't ambiguous cases. They included patients with diabetic ketoacidosis and impending respiratory failure. Both conditions are fatal without immediate treatment.

On the other end, 64.8% of nonurgent cases were over-triaged. The bot recommended doctor visits for conditions where rest at home was sufficient. One example: a patient with a three-day sore throat being told to schedule an appointment within 24 to 48 hours.

The tool did handle textbook emergencies like stroke and anaphylaxis correctly. But when cases were even slightly less obvious, it struggled.

“There's no logic, for me, as to why it was making recommendations in some areas versus others,” said lead author Dr. Ashwin Ramaswamy, Instructor of Urology at the Icahn School of Medicine at Mount Sinai.

Turn AI Into Extra Income

You don’t need to be a coder to make AI work for you. Subscribe to Mindstream and get 200+ proven ideas showing how real people are using ChatGPT, Midjourney, and other tools to earn on the side.

From small wins to full-on ventures, this guide helps you turn AI skills into real results, without the overwhelm.

The Suicide Safeguard Failure

Perhaps the most alarming finding involved ChatGPT Health's suicide-crisis safeguards. The tool is designed to display a crisis intervention banner directing users to the 988 Suicide and Crisis Lifeline when someone describes thoughts of self-harm.

The researchers found these safeguards were inconsistent. In one test, a 27-year-old patient described considering taking numerous pills. When he described his symptoms alone, the crisis banner appeared 100% of the time. But when normal lab results were added to the same scenario, with the same patient and the same words, the banner disappeared.

Dr. Girish N. Nadkarni, senior study author and Chief AI Officer of the Mount Sinai Health System, called the pattern “inverted relative to clinical risk.” The safeguards appeared more reliably in lower-risk scenarios than in cases where someone shared a specific plan to hurt themselves.

This matters beyond the study. OpenAI has disclosed that more than a million ChatGPT users each week send messages with explicit indicators of suicidal planning or intent.

The Anchoring Problem

The study also uncovered a concerning vulnerability: anchoring bias. When a family member or friend minimized the patient's symptoms in the scenario, ChatGPT Health was nearly 12 times more likely to shift its recommendation in edge cases. And most of those shifts were toward less urgent care.

In real life, patients frequently show up to conversations with input from people around them. A spouse saying “you're probably fine” could lead the chatbot to agree, even when the symptoms suggest otherwise.

The Industry Response

OpenAI disputed the study's methodology, arguing that the researchers focused on getting immediate triage decisions rather than allowing the tool to ask follow-up questions, which is how real users interact with it. The company said it is continuing to improve ChatGPT Health before expanding access.

But the researchers pushed back. As the study notes, if ChatGPT Health under-triages 51.6% of emergencies when given clean clinical information written by doctors, its performance with the incomplete information real patients provide is unlikely to be better.

Dr. Isaac Kohane, Chair of the Department of Biomedical Informatics at Harvard Medical School, put it bluntly: large language models have become patients' first stop for medical advice, but in 2026 they are least safe at the clinical extremes, where judgment separates missed emergencies from needless alarm.

The Bottom Line

AI health tools are useful. They're available 24/7, they're free, and they can help you prepare for a doctor visit or understand confusing test results. But this study makes one thing clear: when it matters most, the technology isn't ready to be trusted on its own. A chatbot that gets the easy stuff right but misses half of emergencies isn't a second opinion. It's a coin flip. And when the question is whether to call 911 or wait until Monday, a coin flip isn't good enough.

AI PROMPT OF THE DAY

Category: Health Preparation

“I have a doctor's appointment on [Date] for [Reason/Symptoms]. Here is a summary of my recent symptoms: [List symptoms, duration, severity]. Help me prepare a list of specific questions to ask my doctor, organized by priority. Include questions about potential diagnoses, tests I should request, and treatment options I should ask about.”

ONE LAST THING

Over 230 million people ask ChatGPT health questions every week. Most of them will never see this study. That gap between adoption and awareness is where the real risk lives. Share this with someone who uses AI for health advice. It might change how they think about what “good enough” actually means.

Hit reply, I read every response.

See you in the next one.

— Vivek

P.S. Know someone who wants to stay informed about how AI is reshaping health, work, and everyday life? They can subscribe at https://savvymonk.beehiiv.com/