The Amazon packing robot is incredible. You didn’t mention it, but the fact that it has a conveyor in its “hand” is a very creative way to help solve this. Even if we got bipedal robots, their limbs might look really weird compared to ours.
At the risk of being self-promotional, I thought you'd be interested in this story I published in Future SF Digest about an Amazon packing robot being pitted against a human packer: https://future-sf.com/fiction/the-last-trial/
Thanks for sharing this - I read and enjoyed the story!
o3 has a fair rebuttal similar to my own view:
1. Where Jack Clark and the paper line up
Throughput parity – Both point to ~224 UPH (units per hour) for the robot versus 243 UPH for humans in March 2025. That is a gap of roughly 8 percent, so basically neck-and-neck.
The long-tail defect headache – Clark’s “middling success” description matches the paper’s own admission that 9 percent of cycles do nothing useful and another 3.7 percent trigger “amnesty” clean-ups.
Task-specific hardware wins today – We both highlighted how the custom paddle plus band-moving plank is the real hero, not a general-purpose hand.
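To make the figures in this section concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers quoted above; the function names are made up for illustration, and adding the two defect shares together assumes the categories do not overlap, which is how "another 3.7 percent" reads but is not something confirmed here.

```python
# Figures quoted in the thread (March 2025).
ROBOT_UPH = 224
HUMAN_UPH = 243
UNPRODUCTIVE_SHARE = 0.09  # cycles that do nothing useful
AMNESTY_SHARE = 0.037      # cycles that trigger an "amnesty" clean-up


def speed_gap(robot_uph: float, human_uph: float) -> float:
    """Fraction by which the robot rate trails the human rate."""
    return 1 - robot_uph / human_uph


def problem_share(unproductive: float, amnesty: float) -> float:
    """Combined share of problem cycles.

    Assumption: the two categories are disjoint, so the shares simply add.
    """
    return unproductive + amnesty


if __name__ == "__main__":
    print(f"robot trails humans by {speed_gap(ROBOT_UPH, HUMAN_UPH):.1%}")
    print(f"problem cycles: {problem_share(UNPRODUCTIVE_SHARE, AMNESTY_SHARE):.1%}")
```

Under those assumptions the robot trails by a bit under 8 percent, and roughly one cycle in eight is either unproductive or needs a clean-up.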
2. Nuances that temper the “middling” label
ROI is not just speed – Amazon mostly uses humans with step-ladders for top shelves. Removing that ladder work cuts injury risk and travel time, so a robot that only handles upper bins can lift the blended human UPH by 4-5 percent with zero extra labor. That safety and insurance delta rarely shows up in the headline metrics.
Half-a-million live stows matters – This is not a lab demo. Operating for months in production gives Amazon a treasure-trove of failure data that can push defect rates down faster than small startups can manage. The “middling” snapshot may age quickly.
3. Pushback on the bipedal-robot skepticism
Clark argues that task-specific designs debunk near-term dreams of bipedal warehouse bots. Two considerations soften that view:
Form factor versus end-effector – A legged base could still make economic sense if the expensive part is the gripper and perception stack. Legs just become one more locomotion module that plugs into the same high-utilization manipulator.
Software generality is already creeping in – The paper shows a risk-aware planner selecting actions by expected value, not hand-tuned if-else trees. As those planners learn from cross-task data, the gap between narrow and general hardware narrows too.
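As a toy illustration of what "selecting actions by expected value" can look like, here is a minimal Python sketch. This is not the paper's planner: the action names, probabilities, payoffs, and failure costs are all invented for the example, and a real system would estimate them from perception and past outcomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str            # hypothetical action label
    p_success: float     # estimated probability the action succeeds
    value: float         # payoff if it succeeds
    failure_cost: float  # cost if it fails (e.g. a clean-up)


def expected_value(action: Action) -> float:
    """Success payoff weighted by its probability, minus the weighted failure cost."""
    return action.p_success * action.value - (1 - action.p_success) * action.failure_cost


def choose(actions: list[Action]) -> Action:
    """Pick the candidate with the highest expected value."""
    return max(actions, key=expected_value)


if __name__ == "__main__":
    candidates = [
        Action("stow_now", p_success=0.80, value=1.0, failure_cost=3.0),
        Action("make_space_first", p_success=0.95, value=0.8, failure_cost=3.0),
        Action("skip_bin", p_success=1.00, value=0.0, failure_cost=0.0),
    ]
    print(choose(candidates).name)
```

With these made-up numbers the slower but safer "make_space_first" wins, because an expensive failure drags down the faster option. The point is that the trade-off falls out of the estimates rather than out of hand-written if-else rules.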
4. Blind spots and extra angles
Energy and maintenance costs – UPH parity hides total cost of ownership. If the robot burns twice the power or needs weekly calibration downtime, ROI slides. The paper is silent here.
Human-robot choreography – Defect remediation is the killer bottleneck. A tiny change, like auto-escalating a jammed bin to a nearby human without stopping the whole cell, could knock a full percent off unproductive cycles.
Data-network effects – Every stow is a labeled instance with vision, force, and outcome. That dataset can train better policies for the next warehouse task. Amazon is effectively buying training data at cost, which outsiders will struggle to match.
5. Final word
Calling the system “middling” is fair today, but the slope of improvement is what matters. A robot that already nearly matches human speed, removes ladders, and gets a live feedback loop on half-a-million cases is standing on a steeper learning curve than its flesh-and-blood counterpart. The bottleneck is no longer vision – it is the ops discipline to squeeze out every last defect.