Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Key Points
- Mobile ALOHA presents a novel system for learning complex bimanual mobile manipulation tasks that necessitate whole-body control, extending beyond typical tabletop robotics.
- This system is built upon a low-cost, whole-body teleoperation platform, which augments the ALOHA system with a mobile base and an enhanced teleoperation interface for data collection.
- By employing supervised behavior cloning and co-training with existing datasets, Mobile ALOHA increases success rates by up to 90%, enabling autonomous execution of intricate mobile manipulation tasks like cooking or organizing.
Mobile ALOHA presents a novel system for enabling bimanual mobile manipulation through low-cost whole-body teleoperation and imitation learning. The core problem addressed is the limitation of traditional robotic manipulation, which often focuses on tabletop tasks, lacking the mobility and dexterity required for more generalized real-world scenarios.
The proposed solution, Mobile ALOHA, augments the existing ALOHA teleoperation system by integrating a mobile base and a sophisticated whole-body teleoperation interface. This augmentation allows for the collection of rich human demonstration data that encompasses both dexterous bimanual manipulation and full-body mobility. The system is designed to be low-cost, making it accessible for broader research and application.
For learning autonomous mobile manipulation skills, the methodology employs imitation learning from human demonstrations. Specifically, supervised behavior cloning is utilized. In this approach, a policy is learned to map observed states to robot actions by training on expert demonstrations. The training objective for behavior cloning typically involves minimizing a loss function, such as the mean squared error (MSE) between the policy's predicted actions and the expert's recorded actions:

$$\mathcal{L}(\theta) = \mathbb{E}_{(s, a) \sim \mathcal{D}} \left[ \lVert \pi_\theta(s) - a \rVert^2 \right],$$

where $\mathcal{D}$ is the dataset of expert state-action pairs and $\pi_\theta$ is the learned policy.
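To make the behavior-cloning objective concrete, here is a minimal sketch that fits a linear policy to synthetic "expert" state-action pairs by gradient descent on the MSE loss. The dimensions, learning rate, and the linear policy class are illustrative assumptions for demonstration only; the actual Mobile ALOHA policies are neural networks trained on real teleoperation data.

```python
import numpy as np

# Illustrative shapes (not the paper's): 8-dim observations, 4-dim actions.
rng = np.random.default_rng(0)
state_dim, action_dim, n_demos = 8, 4, 256

# Synthetic "expert" demonstrations: actions are a fixed linear map of states.
W_expert = rng.normal(size=(state_dim, action_dim))
states = rng.normal(size=(n_demos, state_dim))
actions = states @ W_expert

# Behavior cloning: fit a linear policy pi(s) = s @ W by minimizing the MSE
# between predicted and expert actions over the demonstration dataset.
W = np.zeros((state_dim, action_dim))
lr = 0.01
for _ in range(2000):
    pred = states @ W
    grad = (2.0 / n_demos) * states.T @ (pred - actions)  # d(MSE)/dW
    W -= lr * grad

mse = float(np.mean((states @ W - actions) ** 2))
print(f"final MSE: {mse:.6f}")
```

With a realizable expert and full-rank data, the loss drives to near zero, illustrating how supervised regression on demonstrations recovers the expert mapping.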
A key technical contribution is the finding that "co-training" with existing static ALOHA datasets significantly boosts performance on mobile manipulation tasks. Co-training here means training on a larger, pre-existing dataset of demonstrations for static, bimanual tasks alongside the newly collected mobile manipulation data. With only 50 demonstrations per task, this combined training strategy increased success rates by up to 90%. This indicates that knowledge from more constrained, yet relevant, manipulation data can transfer effectively to more complex mobile scenarios, likely by providing a broader understanding of dexterous manipulation primitives.
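One simple way to realize such co-training is to mix the two datasets when sampling each training batch. The sketch below assumes a 50/50 mixing ratio and the dataset sizes shown, both of which are illustrative assumptions rather than the paper's exact recipe.

```python
import random

def cotraining_batch(mobile_data, static_data, batch_size, static_ratio=0.5):
    """Sample a training batch mixing mobile demos with static ALOHA demos.

    static_ratio is an illustrative mixing weight (assumed, not from the paper).
    """
    n_static = int(batch_size * static_ratio)
    batch = random.choices(static_data, k=n_static)          # static ALOHA data
    batch += random.choices(mobile_data, k=batch_size - n_static)  # mobile data
    random.shuffle(batch)
    return batch

# Hypothetical datasets: 50 mobile demos per task, a larger static corpus.
mobile = [("mobile", i) for i in range(50)]
static = [("static", i) for i in range(500)]
batch = cotraining_batch(mobile, static, batch_size=16)
```

Sampling at a fixed ratio keeps the smaller mobile dataset from being drowned out while still exposing the policy to the broader static manipulation data.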
Mobile ALOHA successfully demonstrates the ability to autonomously complete a range of complex tasks requiring both mobility and bimanual dexterity, including "sautéing and serving a piece of shrimp," "opening a two-door wall cabinet to store heavy cooking pots," "calling and entering an elevator," and "lightly rinsing a used pan using a kitchen faucet." The paper also acknowledges the iterative development process, noting "funny robot failures" encountered while co-developing the hardware and software/AI, which highlights the practical challenges of building robotic systems.