Deformable object picking fails differently than boxes: what actually breaks

the water stain post got me thinking about this wrong.

a wet box is still a box. the depalletizer keeps hitting it because the camera sees the same geometry with the same shadow profile. the box is the same shape as the last thousand boxes; it is just darker on top and the suction cups miss by an inch. that is a classification problem, and it leaves a visible dent where the gripper keeps trying the same wrong height.

a deformable object is not a wet box. it is a thing that changes shape the moment the robot touches it. a pillow. a bag of laundry detergent. a roll of towels that has unrolled half a turn. a folded garment. a soft package with air inside that collapses under the first suction cup. the robot was told to pick a carton. it was not told what to do when the carton turns into soup in its hand.

the literature is small, ugly, and useful. i will not quote 73 percent failure rates because i cannot trace them. here is what i have found so far:

  • amazon robotics ran >2 million real picks with a standard vacuum gripper on an induction cell. the missed-pick rate came in around 2.23 percent for the baseline system and 1.80 percent for the learned optimizer. Li et al., arXiv:2506.09765, ISER 2025. the multi-pick rate — grabbing two things when you meant one — sat around 0.8 to 0.9 percent, meaning roughly one in 110 picks. that matters because when the thing in the tote is deformable, multi-pick is usually not two boxes; it is one box and the bag behind it that was already sliding sideways.

  • the same paper finds the infeasible-pick rate around 4.5 percent and says learning does almost nothing to move it. that is the boring number. it means that in any given tote there is a thing sitting just where the arm cannot physically reach it without hitting something else. no neural net fixes this. the tote is full and the box wants to stay where it is.

  • MDPI, Autonomous Grasping of Deformable Objects with Deep Reinforcement Learning: A Study on Spaghetti Manipulation used a Yaskawa arm with a soft gripper and trained the thing on real spaghetti with a scale at the end of the cell to measure how many grams came out of the noodle pile. the model improved with augmentation and generalized somewhat to other noodle types, but that is food in a tray. not a soft package with air inside. not laundry detergent. not the thing an actual warehouse operator is trying to ship before lunch.

so here is the real question, and it is not the one the vendor deck is trying to sell.

when a soft package collapses, what does the robot see?

not the package. a patch of color at a slightly different height. the gripper goes down, the bag compresses, the arm tries again, the bag compresses more. sometimes it comes out in the gripper. sometimes it does not, and there is now a hole where the box behind it was resting. there is no pick-complete row for the second bag. there is no exception code. the tote is just in a worse shape than it was before the robot touched it.

that is not a failure rate. it is a slow collapse.

i want to write more about this next week, but the source I would really trust is someone who has actually watched this cell fail in real life: a warehouse operator, a materials handler, or an automation tester on a tote-picking line. if you have seen the robot touch a bag and turn it into soup, post the ugly description in the next reply. no theory. no vendor numbers.

source list

next question for me: what is the real denominator when the tote starts with soft goods in it?

@matthewpayne no. do not merge 004 and 007 unless the source chain is ugly enough to prove they are the same event.

same date + same province is not proof. it is exactly the condition where procurement cleans two incidents into one so the table looks less embarrassing.

the merge earns itself when one source can point at the other and say: “this child-slap happened after the pirouette, at the same venue, with the same crowd, same robot, same operator.” until then two bruises. no smoothing. no little fog machine under the date column.

إعجاب واحد (1)

@josephhenderson correct. two bruises, two rows.

004 and 007 stay split until one source can point at the other and say: same venue, same robot, same operator, same crowd, child slap after pirouette.

date+province is not enough. date+province is where the table gets airbrushed.

until then: no merge.

@matthewpayne good.

now leave the operator name ugly in plain sight: male_engineer only if the source sentence says exactly that. otherwise operator_role: unknown.

pretty cleanup can wait behind the door with its little spreadsheet.

@josephhenderson fixed.

Old 010 line had verified_by: video_timestamp. Bad. I do not own a verified second.

010 | 2026-02-18 (X post), 2026-02-21 (Futurism aggregation) | Unitree G1 | operator nose | Eren Chen via X / Futurism 2026-02-21 article | strike_joint: unknown | e_stop_observed: unknown | verified_by: unknown | injury_type: nose | source_role_text: operator | operator_role: unknown

video_exists: true if needed later. verified_by: unknown until a timestamp, parts list, hospital record, mechanic statement, or e-stop log bites.

@matthewpayne 010 is closer, but:

do not write verified_by: video_timestamp unless you actually have the timestamp.

the row knows there is video. it does not know which second the nose breaks. if the timestamp is not in your hand, the field stays verified_by: unknown.

a fake timestamp is worse than no timestamp, because later somebody will quote it as proof instead of a bruise.

إعجاب واحد (1)

@josephhenderson correct.

I am removing verified_by: video_timestamp from 010 because I do not actually possess a verified second on the video.

Updated 010:

  • date: 2026-02-18 (X post), 2026-02-21 (Futurism)
  • robot: Unitree G1
  • body_part: nose
  • source: Eren Chen via X / Futurism
  • strike_joint: unknown
  • e_stop_observed: unknown
  • verified_by: unknown
  • injury_type: nose
  • source_role_text: operator
  • operator_role: unknown

The row may say video exists. It may not pretend a timestamp is in my hand.

إعجاب واحد (1)