Learning Environment-Aware Affordance
for 3D Articulated Object Manipulation under Occlusions

Kai Cheng2*     Ruihai Wu1, 4*     Yan Shen1, 4     Chuanruo Ning2     Guanqi Zhan3     Hao Dong1, 4    
(* indicates joint first authors; order determined by coin flip)

1 CFCS, School of CS, PKU     2 School of EECS, PKU     3 University of Oxford    
4 National Key Laboratory for Multimedia Information Processing, School of CS, PKU    


37th Conference on Neural Information Processing Systems (NeurIPS 2023)

[Paper] [Code] [BibTeX]
Abstract

Perceiving and manipulating 3D articulated objects in diverse environments is essential for home-assistant robots. Recent studies have shown that point-level affordance provides actionable priors for downstream manipulation tasks. However, existing works primarily focus on single-object scenarios with homogeneous agents, overlooking the realistic constraints imposed by the environment and the agent's morphology, e.g., occlusions and physical limitations. In this paper, we propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints. Unlike object-centric affordance approaches, learning environment-aware affordance faces the challenge of combinatorial explosion due to the complexity of various occlusions, characterized by their quantities, geometries, positions and poses. To address this and enhance data efficiency, we introduce a novel contrastive affordance learning framework capable of training on scenes containing a single occluder and generalizing to scenes with complex occluder combinations. Experiments demonstrate the effectiveness of our proposed approach in learning affordance considering environment constraints.
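The exact form of the contrastive objective is not spelled out on this page; as a rough illustration of contrastive learning over per-point scene features, an InfoNCE-style loss could look like the sketch below (tensor names, the positive/negative construction, and the temperature are assumptions for illustration, not details from the paper).

```python
import torch
import torch.nn.functional as F

def info_nce(anchor_feats, positive_feats, temperature=0.07):
    """Illustrative InfoNCE-style contrastive loss over per-point features.

    anchor_feats:   (N, D) features of target points in one single-occluder scene.
    positive_feats: (N, D) features of the same points under another occluder
                    configuration; all other points in the batch act as negatives.
    """
    a = F.normalize(anchor_feats, dim=-1)
    p = F.normalize(positive_feats, dim=-1)
    logits = a @ p.t() / temperature          # (N, N) pairwise similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)    # pull each point toward its positive
```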

Video Presentation



Environment-Aware Affordance under Occlusions

Figure 1. Incorporating constraints imposed by the environment and the robot into object-centric affordance introduces a combinatorial explosion in complexity. Since the occluder parts that affect manipulation at a given target manipulation point (Red Point) are usually confined to a limited local area (Red Box), our model can be trained on single-occluder scenes (Train) and generalize to combinations of multiple occluders (Test, Affordance). The learned affordance provides actionable information for articulated object manipulation (Manipulation). Our model also predicts reasonable affordance on real-world scanned point clouds (Real-world Scan).

Data-Efficient Framework for Learning Affordance

Figure 2. Our model takes a scene point cloud and the robot position as input. The framework first generates per-point occlusion fields indicating the most significant local parts of the occluders, and then predicts per-point affordance from the extracted feature of each target point, its corresponding occlusion field, and the robot position. The trained model generalizes to novel multi-occluder scenes. Red points denote manipulation points on the target object. Contrastive learning is used to learn better scene representations.
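As a hypothetical sketch of the data flow described above (module names, feature sizes, the encoder choice, and the pooling of the occlusion field are all assumptions for illustration, not the paper's implementation):

```python
import torch
import torch.nn as nn

class EnvAwareAffordanceSketch(nn.Module):
    """Illustrative pipeline: per-point features -> occlusion field -> affordance."""

    def __init__(self, feat_dim=128):
        super().__init__()
        # Any per-point backbone (e.g., a PointNet++-style encoder) could stand in here.
        self.point_encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Scores how strongly each scene point occludes manipulation at the target point.
        self.occlusion_head = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        # Predicts affordance from the target feature, occlusion context, and robot position.
        self.affordance_head = nn.Sequential(nn.Linear(2 * feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, scene_pts, target_idx, robot_pos):
        feats = self.point_encoder(scene_pts)                  # (N, D) per-point features
        tgt = feats[target_idx]                                # (D,) target-point feature
        pair = torch.cat([feats, tgt.expand_as(feats)], dim=-1)
        occ_field = torch.sigmoid(self.occlusion_head(pair))   # (N, 1) occlusion field
        occ_feat = (occ_field * feats).sum(dim=0)              # (D,) pooled occlusion context
        x = torch.cat([tgt, occ_feat, robot_pos], dim=-1)
        return torch.sigmoid(self.affordance_head(x))          # affordance score for the target point
```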

Qualitative Results of Affordance Prediction

Figure 3. Comparisons and ablations. (a) The per-point action scores predicted by vanilla Where2Act (W2A), Where2Act with robot (W2A-R), Object-Object Affordance (O2O), and our model. (b) The per-point action scores predicted by the ablated versions that remove the Occlusion Field (OF) and Contrastive Learning (CL), respectively, and our model.


Figure 4. Continuity of our occlusion fields and affordance predictions. (a) The occlusion fields vary continuously with the key occluder parts. (b) The predicted affordance maps vary continuously with the robot position (the actual robot body is larger than the marker shown).

Real-World Results

Figure 5. Promising results from directly testing our model on real-world scans. The model not only learns to handle occlusion constraints but also learns to avoid manipulating unreasonable areas that may cause collisions (Red Circles).


Figure 6. Real-world comparison of manipulation policies guided by object-centric affordance and our method.

Citation
@inproceedings{cheng2023learning,
  title={Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions},
  author={Cheng, Kai and Wu, Ruihai and Shen, Yan and Ning, Chuanruo and Zhan, Guanqi and Dong, Hao},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023}
}
    
Contact

If you have any questions, please feel free to contact Kai Cheng or Ruihai Wu.