Abstract
6D object pose estimation is an essential component of robotic grasping. Most existing deep learning-based approaches focus on instance-level pose estimation, which requires prior object models and consequently limits their applicability to unseen objects in real-world scenarios. In contrast, category-level 6D pose estimation adopts Normalized Object Coordinate Space (NOCS) maps to represent intra-class object geometry, enabling pose prediction without relying on predefined object models and thus improving generalization to unseen instances. However, the original NOCS-based category-level framework typically trains NOCS prediction and object classification jointly, which introduces NOCS regression errors among inter-class instances with similar appearances and thereby degrades pose estimation accuracy. To address this issue, we integrate YOLOv8 object detection with SegFormer and propose a novel Category-Level SegFormer for 6D Object Pose Estimation (CLSF-6DPE). By decoupling object classification from NOCS regression through independent learning branches, the proposed framework significantly improves pose estimation performance. Furthermore, we validate the practical feasibility of CLSF-6DPE by integrating it with a robotic gripper via the Robot Operating System (ROS) in a real-world grasping setup. Experimental results on the CAMERA and Real-World datasets demonstrate that the proposed method achieves mAP scores of 93.8% and 81.1%, respectively. Overall, the proposed method provides a modular and effective solution for category-level pose estimation in real-world robotic grasping applications.