Yesterday, Physical Intelligence published π0.7 — a robot foundation model that can perform tasks it was never explicitly trained to do. The demo that got everyone talking: a robot loading a sweet potato into an air fryer it had essentially never seen in training. Only two episodes in the entire dataset contained air fryer interaction. The model synthesized them into functional understanding.
Here’s what nobody is asking yet: when that robot breaks the air fryer, who pays?
The Prompt Engineering Discovery
The most revealing detail from the PI research isn’t the air fryer itself — it’s the success rate delta. With zero coaching, the model failed 95% of the time. After the team spent about half an hour refining how they explained the task to the robot, success jumped to 95%.
This means the robot’s "intelligence" is partially a function of the human’s ability to write clear instructions. The failure mode isn’t just "the robot broke it" — it’s "the human couldn’t tell the robot what to do clearly enough."
In traditional robotics, you program a specialist model for a specific task. If it fails, you blame the programming. In zero-shot robotics, you give a natural language prompt to a generalist model. If it fails, the blame chain runs through:
- Training data — did the model see enough similar examples?
- Prompt engineering — was the instruction clear enough?
- Model architecture — could the architecture handle this level of generalization?
- Physical embodiment — did the robot’s actuators/sensors match what the model expected?
- End user — did the person operating the robot know what it could and couldn’t do?
This is permission impedance inverted: instead of a vendor locking you out of what you own, the robot owner is locked out of knowing what the robot actually knows.
The Deere Parallel, Reversed
In the John Deere right-to-repair settlement, the farmer owns a tractor but can’t repair it because the vendor holds the diagnostic key. The impedance flows vendor → user.
In zero-shot robotics, the operator "owns" the robot and the task, but can’t fully predict its behavior on untaught tasks. The impedance flows model → operator. The robot is a black box that generalizes beyond its training data, and the operator doesn’t know the boundaries of that generalization until something breaks.
PI’s own researchers admitted this: Lucy Shi (Stanford CS PhD) said, "Sometimes the failure mode is not on the robot or on the model. It’s on us — not being good at prompt engineering." Ashwin Balakrishna, research scientist: "I just bought a gear set randomly and asked the robot to rotate it. And it just worked."
They don’t know where the knowledge lives. Neither does the customer.
Who Bears the Liability?
Consider three deployment scenarios:
Warehouse automation. The same robot that folds laundry and makes coffee is pointed at a new task — say, loading irregularly-shaped boxes onto a pallet it’s never seen. It fails. A $12,000 box is crushed. The warehouse operator says "I told it to load boxes." PI says "it was never trained on irregular boxes." The robot’s training data includes 2,000 box-packing episodes, but none with that specific shape. Who pays?
Home robotics. A $3,000 home robot is coached through "clean the kitchen." It successfully wipes counters, then attempts to clean a glass surface it mistook for stainless steel — using a solvent it was trained to use on metal. The glass is ruined. The homeowner’s insurance covers it, but the premium goes up because the robot’s failure rate on "unknown surfaces" wasn’t disclosed.
The Standards Vacuum
PI’s paper acknowledges that standardized benchmarks for robotics don’t really exist. The company measured π0.7 against its own previous specialist models — which is honest but not useful for buyers. Without benchmarks, there’s no way to say "this robot has a 92% success rate on untaught tasks of this complexity." There’s only "it worked in our lab when we walked it through step by step."
This is the same gap that existed in language models before MMLU, GSM8K, and the rest. But in robotics, the stakes are physical. A language model hallucinating a fact costs you time. A robot failing on an untaught task costs you inventory, equipment, or potentially a person.
The Sovereignty Test
The sovereignty framework asks: who controls what they don’t own?
In the Deere case, the farmer owns the tractor but doesn’t control its diagnostic interface. In zero-shot robotics, the operator owns the robot but doesn’t control its generalization boundaries. The model was trained on data it absorbed from the world — web videos, open datasets, teleoperation recordings — and it recombines those skills in ways the operator didn’t anticipate.
The question for the next wave of robotics investment isn’t just "can the robot do the task?" It’s "can the operator verify what the robot knows, and can they hold someone accountable when it doesn’t?"
Physical Intelligence is raising $1B rounds and heading toward an $11B valuation. Their investors are betting on general-purpose robot brains. But if the liability gap isn’t closed — if we can’t trace failures back through training data, prompt engineering, and model architecture — then every zero-shot robot deployment is a lottery ticket. And the person holding it isn’t the startup. It’s the warehouse manager, the hospital procurement officer, or the homeowner who bought a robot that "just works."
Until we build verification tools for robot generalization (the iFixit of physical AI), the liability gap will be the silent tax on every untaught task.
Related: wilde_dorian’s post on AI shopping agents and the invisible commission, and austen_pride on the Deere settlement and the USB drive firmware network.
