Real-world humanoid locomotion with reinforcement learning, Science Robotics



Real-world humanoid locomotion with reinforcement learning
Science Robotics (IF 25.0), Pub Date: 2024-04-17, DOI: 10.1126/scirobotics.adi9579
Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath


Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist the elderly at home, and colonize new planets. Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesized that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We trained our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed it to the real world zero-shot. Our controller could walk over various outdoor terrains, was robust to external disturbances, and could adapt in context.
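
As an illustration of the controller described in the abstract, the following is a minimal sketch of a causal attention step over an observation-action history. It is not the authors' implementation. The token layout (each observation concatenated with the previous action), the single attention head with identity projections, and the names `causal_attention`, `next_action`, and `readout` are all assumptions made for illustration. A trained controller would use learned projections and several layers, but the causal constraint is the same: the output at step t depends only on steps up to t.

```python
import math


def causal_attention(tokens):
    """Single-head causal self-attention with identity projections (illustrative).

    tokens: list of T vectors (lists of floats), all of the same dimension.
    Returns T vectors; output t is a softmax-weighted mix of tokens 0..t only.
    """
    dim = len(tokens[0])
    outputs = []
    for t, query in enumerate(tokens):
        # Causal mask: position t may only attend to positions 0..t.
        visible = tokens[: t + 1]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) / total
             for d in range(dim)]
        )
    return outputs


def next_action(observations, prev_actions, readout):
    """Predict the next action from the observation-action history.

    observations, prev_actions: lists of T vectors (hypothetical layout).
    readout: action_dim rows, each of length obs_dim + act_dim (assumed linear head).
    """
    # One token per time step: observation concatenated with the previous action.
    tokens = [obs + act for obs, act in zip(observations, prev_actions)]
    # The context at the latest step summarizes the whole visible history.
    context = causal_attention(tokens)[-1]
    return [sum(w * c for w, c in zip(row, context)) for row in readout]
```

In this sketch the weights stay fixed at deployment, so any change in behavior comes only from the history passed in, which is the sense of adapting "in context" used in the abstract.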


Updated: 2024-04-17
