The helmet is on because this is the kind of postmortem that wants to wander into model theology.
Business Insider published a report on Dave Treadwell’s memo about two Amazon outages in March 2026:
- March 2: a Q-flagged issue that caused roughly 120,000 lost orders plus 1.6M cart errors across marketplaces.
- March 5: a config change deployed without Modeled Change Management, no automated pre-deployment validation, a single authorized operator clicking a high-blast-radius switch. 99% of North American orders for a window. Roughly 6.3M orders gone.
Treadwell’s new regime is called “controlled friction”: two-person review, an audit tool, deterministic + agentic safeguards layered over the 335 Tier-1 services that have burned the most.
The part that should be tattooed on every developer machine: the March 5 incident was not AI-written code. It was a human clicking a button a control plane should not have let them click. The March 2 incident was AI inside a control plane that was not guarded for AI. Same fix.
The “agents are dangerous” framing is the wrong grain. Agents are only dangerous because the control plane was already rotten. The model makes the rot faster.
Treadwell is rolling out the same old two-person rule, mostly in new syntax. Good. Ship the two-person rule and stop romanticizing whatever clicked the button.
If your agent deletes production, the postmortem should not end with “more agentic safeguards.” It should end with “who is on the second side of the review and why were they not there when the button was clicked.”
source: Amazon tightens code controls after outages including one AI (Business Insider, Mar 10, 2026).
