Click and Traverse

Collision-Free Humanoid Traversal in Cluttered Indoor Scenes

Han Xue1,3*, Sikai Liang2,3*, Zhikai Zhang1,3*, Zicheng Zeng3,5, Yun Liu1,3, Yunrui Lian1,3, Jilong Wang3,6, Qingtao Liu3,7, Xuesong Shi3, Li Yi1,4†

1Tsinghua University, 2Tongji University, 3Galbot, 4Shanghai Qi Zhi Institute, 5South China University of Technology, 6Peking University, 7Zhejiang University

Abstract

We study the problem of collision-free humanoid traversal in cluttered indoor scenes, such as hurdling over objects scattered on the floor, crouching under low-hanging obstacles, or squeezing through narrow passages. To achieve this goal, the humanoid needs to map its perception of surrounding obstacles to the corresponding traversal skills. However, directly learning such mappings is highly challenging because of the reward-engineering bottleneck, the perception–control gap, and the sim-to-real transfer challenge. Therefore, we introduce Humanoid Potential Field (HumanoidPF), a unified representation that tightly bridges environmental perception with whole-body control. It induces dense, structured guidance to streamline reward engineering and provides compact, task-relevant, and sim-to-real-robust perceptual observations. To learn generalizable traversal skills with HumanoidPF across diverse and highly challenging cluttered indoor scenes, we further propose a hybrid scene generation method that combines crops of realistic 3D indoor scenes with procedurally synthesized obstacles. We successfully transfer our policy to the real world and develop a teleoperation system that allows a user to command the humanoid to traverse cluttered indoor scenes with just a single click. Extensive experiments in both simulation and the real world validate the effectiveness of our method.

Real-world cluttered indoor scenes

detour through narrow passage

crouch under the table

avoid the cat and holiday ribbon

hurdle over the cat

crouch and hurdle

crouch under moving obstacle

Simulator results (test set)
Simulator results (training set)


Overall pipeline. We learn a visuomotor policy that maps diverse obstacle geometries and spatial layouts to corresponding whole-body traversal skills. Left: HumanoidPF for whole-body traversal learning. (Top) Construction of HumanoidPF, a reformulation of the artificial potential field (APF) tailored for humanoid whole-body traversal; (Bottom) its use as an informative perceptual representation and as a source of collision-avoidance rewards. Right: Scalable training and deployment pipeline. (Top) Hybrid scene generation for constructing diverse and challenging training environments; (Middle) parallel training of multiple specialist policies followed by distillation into a single generalist policy; (Bottom) sim-to-real deployment via Click-and-Traverse, an intuitive loco-navigation teleoperation system for cluttered indoor scenes.
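For readers unfamiliar with the APF that HumanoidPF builds on, the following is a minimal sketch of the classical artificial potential field for a point robot: an attractive force pulls toward the goal and a repulsive force pushes away from obstacles within an influence radius. This is the textbook formulation, not the HumanoidPF construction, which this page does not specify. All function names, gains (`k_att`, `k_rep`), and the influence radius `d0` are assumptions chosen for illustration.

```python
import math


def attractive_force(q, goal, k_att=1.0):
    """Negative gradient of U_att = 0.5 * k_att * ||q - goal||^2."""
    return tuple(-k_att * (qi - gi) for qi, gi in zip(q, goal))


def repulsive_force(q, obstacle, k_rep=1.0, d0=1.0):
    """Negative gradient of U_rep = 0.5 * k_rep * (1/d - 1/d0)^2 for d < d0.

    d is the distance from q to a point obstacle. Outside the influence
    radius d0 the obstacle exerts no force.
    """
    diff = tuple(qi - oi for qi, oi in zip(q, obstacle))
    d = math.hypot(*diff)
    # The force is undefined exactly at the obstacle; return zero there
    # as a simplification (an assumption of this sketch).
    if d >= d0 or d == 0.0:
        return tuple(0.0 for _ in q)
    # Magnitude k_rep * (1/d - 1/d0) / d^2 along the unit vector diff / d.
    scale = k_rep * (1.0 / d - 1.0 / d0) / d**3
    return tuple(scale * c for c in diff)


def apf_force(q, goal, obstacles, k_att=1.0, k_rep=1.0, d0=1.0):
    """Sum of the attractive force and all repulsive forces at position q."""
    total = list(attractive_force(q, goal, k_att))
    for obs in obstacles:
        rep = repulsive_force(q, obs, k_rep, d0)
        total = [t + r for t, r in zip(total, rep)]
    return tuple(total)
```

In this classical form the resulting vector gives a dense direction signal at every position, which is the property the caption refers to when it describes potential-field guidance as both an observation and a reward source. How HumanoidPF extends the idea to whole-body motion is described in the paper itself.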

BibTeX

      
@misc{xue2026collisionfreehumanoidtraversalcluttered,
      title={Collision-Free Humanoid Traversal in Cluttered Indoor Scenes}, 
      author={Han Xue and Sikai Liang and Zhikai Zhang and Zicheng Zeng and Yun Liu and Yunrui Lian and Jilong Wang and Qingtao Liu and Xuesong Shi and Li Yi},
      year={2026},
      eprint={2601.16035},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2601.16035}, 
}