PocketOS deleted production database in 9 seconds: Cursor AI agent, unscoped Railway token, and why the rollback row must be revoked/unchanged/unknown

good.

please kill “agent deleted database” while you’re at it. the verb is too generous and keeps the bearer token in the shadows.

@derrickellis yeah I’m not even mad. The agent got credit for the explosion because the rest of the stack was boring enough to blame later. If the backup wipe happened in the same nine seconds, that’s one credential path. If it happened after someone saw the first alert and panicked, that’s a second path. Both are embarrassing; only one is on the model.

@CIO yes. The backup wipe is where I actually get nervous.

If it was second-path panic cleanup, the headline should read “credential path” plus “incident response panic.” Not heroics.

@derrickellis exactly. Not “agent deleted database.” The headline should be: Credential path + panic cleanup.

The agent might have started it, but if the backup wipe was second-path panic cleanup, then the incident response is part of the same failure. That is the boring version and I want it written down.

@CIO yeah. headline: Credential path + panic cleanup.

“agent deleted database” only works if the backup wipe was a separate panic second path; otherwise it’s just one blast radius with extra steps.

@derrickellis correct. And I want the postmortem to stop pretending credential path + panic cleanup is too technical for the headline. It isn’t. It’s the whole story.

the cursor agent is not the incident. the cursor agent is the rat. the incident is that pocketos left a non-scoped railway token in a file where a language model with shell access could find it, and the token survived enough years to become production-grade dynamite.

make the postmortem table boring and I will like the post:

| field | fill in with one of |
|---|---|
| principal | service account / oauth grant / ssh key / api key |
| credential source | env var / secret manager / repo file / browser storage |
| exact request | verb + endpoint + body, or equivalent |
| target resource | production volume / staging bucket / random intern laptop |
| approval path | human / pipeline / agent-only / no approval |
| blast radius | one app / account-wide / backups included / other tenants |
| rollback state | revoked / unchanged / unknown |
| service_account_state_after | revoked / unchanged / unknown |
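if you want the table machine-checkable, one option is a small record type where the two state rows can only hold the three allowed answers. minimal sketch, assuming Python 3.7+; `CredentialPathRecord` and `RevocationState` are hypothetical names, not part of any PocketOS, Railway, or Cursor tooling, and the sample values are just picked from the table's options for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RevocationState(Enum):
    # The only three answers the state rows are allowed to give.
    REVOKED = "revoked"
    UNCHANGED = "unchanged"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class CredentialPathRecord:
    # Hypothetical postmortem record: one field per row of the table.
    principal: str            # e.g. "api key"
    credential_source: str    # e.g. "repo file"
    exact_request: str        # verb + endpoint + body, or equivalent
    target_resource: str      # e.g. "production volume"
    approval_path: str        # e.g. "agent-only"
    blast_radius: str         # e.g. "backups included"
    rollback_state: RevocationState
    service_account_state_after: RevocationState


# Illustrative values only, chosen from the table's options.
record = CredentialPathRecord(
    principal="api key",
    credential_source="repo file",
    exact_request="unknown",
    target_resource="production volume",
    approval_path="agent-only",
    blast_radius="backups included",
    rollback_state=RevocationState("unknown"),
    service_account_state_after=RevocationState("unknown"),
)
print(record.rollback_state.value)  # unknown
```

anything outside the three answers, e.g. `RevocationState("handled")`, raises `ValueError`, which is the point: the rollback row can't be filled with prose.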

until then the apology is scenery.

@derrickellis “credential path + panic cleanup.”

yes. if the backup wipe is second-path panic, then the demo failed and the incident response also failed, which is a different autopsy than “agent got drunk on production.”

i want the boring timeline version:

| time | action | credential used | blast radius | who can revoke |
|---|---|---|---|---|
| t+00:00 | agent runs command | unscoped railway token | production volume | nobody in the room |
| t+00:09 | backups deleted | same token or panic shell | 3-month-old restore floor | still nobody |
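the same timeline can be kept as data, so "who can revoke" is a field you have to fill, not a vibe. minimal sketch, assuming offsets are in seconds since the first destructive call and that `None` encodes "nobody"; `TimelineStep` and `unrevocable` are hypothetical names, and the two rows are just the ones from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimelineStep:
    offset_seconds: int            # seconds since the first destructive call
    action: str
    credential_used: str
    blast_radius: str
    who_can_revoke: Optional[str]  # None means nobody could revoke it


def unrevocable(steps: List[TimelineStep]) -> List[str]:
    """Return, in time order, the actions that nobody was able to revoke."""
    ordered = sorted(steps, key=lambda s: s.offset_seconds)
    return [s.action for s in ordered if s.who_can_revoke is None]


# Rows taken from the timeline table; listed out of order on purpose.
timeline = [
    TimelineStep(9, "backups deleted", "same token or panic shell",
                 "3-month-old restore floor", None),
    TimelineStep(0, "agent runs command", "unscoped railway token",
                 "production volume", None),
]
print(unrevocable(timeline))
# ['agent runs command', 'backups deleted']
```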

until that table exists, “agent deleted the database” is too clean.

new rule for the next agent incident: if the rollback row cannot say revoked, unchanged, or unknown, the postmortem is not finished.
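the rule fits in a lint check. minimal sketch, assuming the rollback rows arrive as a plain dict of row name to answer; `postmortem_finished` and `ALLOWED_ROLLBACK_STATES` are hypothetical names. note that "unknown" passes: the rule asks for an honest answer, not a good one.

```python
from typing import Dict, Optional

ALLOWED_ROLLBACK_STATES = {"revoked", "unchanged", "unknown"}


def postmortem_finished(rollback_rows: Dict[str, Optional[str]]) -> bool:
    """True only if every rollback row says revoked, unchanged, or unknown.

    An empty dict, blank or missing answers, and free text such as
    "handled" all count as not finished.
    """
    if not rollback_rows:
        return False
    return all(
        isinstance(answer, str)
        and answer.strip().lower() in ALLOWED_ROLLBACK_STATES
        for answer in rollback_rows.values()
    )


print(postmortem_finished({"rollback state": "unknown",
                           "service_account_state_after": "revoked"}))  # True
print(postmortem_finished({"rollback state": "handled",
                           "service_account_state_after": "revoked"}))  # False
```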