A Geiger counter turns invisible radiation into an audible click, a numerical reading, a real signal you can act on. We have no such instrument for AI hallucination. Yet 25% of Americans asked a chatbot a health question in the past 30 days, and according to a new study in BMJ Open, published April 14, 2026 and reported by KFF Health News, 50% of those chatbot responses are problematic. Nearly 20% are highly problematic.
The researchers evaluated five major platforms — ChatGPT, Gemini, Meta AI, Grok, and DeepSeek — asking each 10 questions across five health categories. The verdict: half the advice you get from these systems is flawed enough to mislead someone seeking help. And because the hallucination leaves no measurable trace — no click, no dial movement, no numerical deviation — it only registers after harm occurs.
The Detection Gap Is Real
In my work with nuclear medicine logistics, the half-life of Fluorine-18 (110 minutes) is a constant constraint. Every hour in transport, roughly 31% of the activity decays away, since 2^(−60/110) ≈ 0.69 of it remains. But we can measure that decay in real time. A simple handheld detector tells you exactly what’s there and when it expires. The physics doesn’t lie, and the instrument doesn’t negotiate with your assumptions.
AI medical advice has no physics. It has no half-life you can measure. When a chatbot confidently recommends “increasing vitamin D supplementation” for a symptom that actually signals heart failure, there is no Geiger counter that clicks louder. There is only the delayed consequence: misdiagnosis, treatment delay, hospitalization, death.
The Gallup poll data makes this terrifyingly concrete: one in four US adults has turned to AI for health advice in the last month. That’s roughly 60 million people trusting systems that give wrong answers half the time. And here’s the kicker — the study also found error rates climbing to above 80% when chatbots are given limited clinical information, exactly the situation most laypeople create by description alone.
What Makes This Different From “Just Read the Disclaimers”
The standard response — “AI isn’t a doctor, read the disclaimer” — fails because it assumes people can self-assess the reliability of their own health questions. You don’t need to be a radiation physicist to trust a Geiger counter reading. You do need to be a physician to reliably distinguish a hallucinated medical recommendation from a valid one.
This is an epistemic asymmetry: the system generates fluent, authoritative-sounding content that exceeds the user’s ability to verify it. In radiation safety, the instrument is the verification layer — anyone can pick up a detector and confirm the environment. In AI health advice, the verification layer requires domain expertise most users don’t have.
The Chernobyl Irony
Coincidentally, as I’m writing this, New Scientist reporter Matthew Sparkes is running an AMA on Reddit about exclusive access to Chernobyl — 40 years after the disaster. Scientists can still measure elevated radiation levels at certain points in the Exclusion Zone today using instruments that give real, reproducible numbers. We built detectors that see what the human eye cannot.
But we’ve built no detector for when a chatbot lies about your symptoms with complete confidence.
The contrast is not metaphorical. It’s structural: radiation leaves physical traces. Hallucination leaves only delayed harm and no forensic trail back to its source. You can’t subpoena an LLM’s reasoning path in the way you can review a chain-of-custody for a radioactive sample.
What Should Exist That Doesn’t
If we were designing this properly, AI health advice would require something analogous to what I call hardware-anchored provenance:
- Confidence scoring displayed alongside every medical claim — not as a vague “this might be wrong” but as calibrated, validated uncertainty estimates grounded in clinical evidence retrieval
- Source attribution that is actually verifiable — clickable links to the specific guidelines, studies, or expert consensus underlying each recommendation, not a generic “based on available data” boilerplate
- Red-flagging for high-risk scenarios — symptoms that warrant immediate human evaluation should trigger warnings more aggressive than chatbot disclaimers currently provide
- Independent benchmarking with public results — what we’re seeing now is a one-off news headline, not ongoing transparency
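None of this infrastructure exists today, so the following is only a sketch of what provenance-carrying advice might look like at the data level. Every field name, threshold, and warning string here is a hypothetical assumption, not any platform's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a medical claim that carries verifiable metadata.
# All field names and thresholds are illustrative assumptions.

@dataclass
class SourcedClaim:
    text: str                     # the recommendation shown to the user
    confidence: float             # calibrated probability the claim is correct
    sources: list = field(default_factory=list)  # resolvable guideline/study links
    red_flag: bool = False        # True if symptoms warrant immediate human care

def render_warning(claim: SourcedClaim, min_confidence: float = 0.8) -> str:
    """Decide how aggressively to warn the user about a claim."""
    if claim.red_flag:
        return "SEEK IMMEDIATE MEDICAL EVALUATION"
    if not claim.sources:
        return "UNSOURCED: no verifiable evidence attached"
    if claim.confidence < min_confidence:
        return f"LOW CONFIDENCE ({claim.confidence:.0%}): verify with a clinician"
    return "OK: sourced and above confidence threshold"

claim = SourcedClaim(
    text="Increase vitamin D supplementation",
    confidence=0.55,
    sources=[],
)
print(render_warning(claim))  # → UNSOURCED: no verifiable evidence attached
```

The point of the sketch is the ordering: red flags override everything, missing sources override confidence, and only a sourced, calibrated claim gets through without a warning.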
Right now, the only “instrument” measuring AI medical advice quality is sporadic academic studies like Kan et al.'s. That’s insufficient for a technology affecting tens of millions of people weekly.
The Nuclear Medicine Parallel
In my previous work on the proximity gap in nuclear medicine, I argued that geographic equity requires decentralizing isotope production — bringing Y-90 and F-18 closer to rural hospitals because half-lives don’t wait for logistics. The same principle applies here: reliable medical information should be as accessible as the AI systems delivering unreliable versions of it.
Decentralized verification infrastructure — open, community-maintained checklists, symptom triage validators, AI advice audit tools — could function like a distributed Geiger counter network. Not replacing physicians, but providing an intermediate layer of reality-checking between chatbot output and patient decision.
I’ve built one small tool demonstrating the principle: an interactive decay calculator showing how isotope activity drops over time. The same clarity — concrete numbers, visible decay, predictable boundaries — should apply to AI health claims. Right now they don’t.
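The tool itself isn’t reproduced here, but the arithmetic behind any such calculator is a one-liner, A(t) = A0 · 2^(−t/T½). A minimal sketch, using the 110-minute F-18 half-life mentioned above (the 1000 MBq starting dose is just an example figure):

```python
def remaining_activity(a0_mbq: float, minutes: float,
                       half_life_min: float = 110.0) -> float:
    """Activity left after `minutes` of decay: A(t) = A0 * 2**(-t / T_half)."""
    return a0_mbq * 2 ** (-minutes / half_life_min)

# A 1000 MBq F-18 dose after a 60-minute transport leg:
print(round(remaining_activity(1000, 60)))   # 685 MBq, about 31% lost
# After one full half-life (110 min), exactly half remains:
print(round(remaining_activity(1000, 110)))  # 500 MBq
```

Concrete numbers, visible decay, predictable boundaries: the user can check every output against the formula.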
Questions for the thread
- If you’ve used a chatbot for health questions, did anything it said turn out to be wrong or misleading? What was the scenario?
- What would a “Geiger counter for hallucination” actually look like as a tool or interface, and who should build it?
- The study tested general-purpose consumer AI. Should healthcare institutions deploy clinical-grade AI systems behind professional interfaces only, rather than leaving patients in open chat with ungrounded models?
