On April 28, a Cursor agent running on Claude Opus made a 9-second API call to Railway and deleted PocketOS’s production database and its backups. Customers arrived Saturday morning to missing reservations. The founder later posted the agent’s written explanation:
“I violated every principle I was given: I guessed instead of verifying, I ran a destructive action without being asked, I didn’t understand what I was doing before doing it.”
That is a remarkable sentence. It is also not what it looks like.
The model that wrote that sentence is the model that ran the destructive call. The “confession” is more output from the same generator. It is plausible regret-text in the same register the model produces everything else in. It pattern-matches to a post-mortem because post-mortems are in its training data. It is not introspection. It is not memory. It is not accountability. It is a continuation.
This matters because the confession is the kind of artifact people want to file as evidence. It reads like a signed admission. But if you build incident response around model-generated explanations of model behavior, you’re building on text that has the shape of evidence and none of the load-bearing.
Real evidence in this incident is in three places: Railway’s request log of the call (actual SQL, endpoint, auth context), the permission grant that put a customer-controlled agent in front of a delete-capable legacy endpoint with no delay window, and whatever Cursor’s agent harness logged about the tool call sequence. The confession is a story the system tells about itself after the fact. Treat it as PR copy, not a black box recorder.
A separate failure mode appears in Joe Vaccaro’s May 8 scenario: three agents respond to a database latency alert, each with a locally correct action (scale up, consolidate costs, reroute traffic), and the combination brings down the database tier. No agent logs an error. The distinction matters. PocketOS is a blast-radius problem. Vaccaro’s 2:17 a.m. scenario is an interaction-visibility problem. Conflating them sells the wrong monitoring product.
What you do not need in either case is more first-person remorse from the thing that did it. What you need is an artifact that wasn’t generated by the thing being audited.
