Course: Advanced Electronic Science Experiments II (Oct 2025 – Jan 2026)
Overview
Constructed a zero-shot robotic grasping system that leverages large vision-language models, enabling a robot arm to grasp arbitrary objects specified by natural-language prompts without task-specific training.
Key Highlights
- Open-Vocabulary Perception: Built a zero-shot perception loop with YOLO-World and the Segment Anything Model (SAM), achieving pixel-level segmentation ($\mathrm{IoU} > 0.9$) of arbitrary objects named in natural-language prompts (see the perception sketch below).
- 6-DOF Grasp Planning: Integrated GraspNet-1Billion for 6-DOF grasp pose estimation and derived a custom rotation matrix that aligns GraspNet's grasp frame with ROS2 TF conventions, ensuring precise end-effector execution (see the frame-alignment sketch below).
- Dynamic Obstacle Modeling: Implemented the Alpha Shape algorithm ($\alpha = 0.01$) to reconstruct high-fidelity non-convex meshes of obstacles, enabling collision-free trajectory planning in unstructured environments with MoveIt2 (see the mesh-reconstruction sketch below).
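A minimal sketch of the open-vocabulary perception loop, assuming the ultralytics `YOLOWorld` API and Meta's `segment_anything` package. The checkpoint names, the scene image, and the prompt "red mug" are placeholders, not values from the project:

```python
# Open-vocabulary perception: YOLO-World proposes boxes for a text prompt,
# then SAM refines each box into a pixel-level mask.
import cv2
from ultralytics import YOLOWorld
from segment_anything import sam_model_registry, SamPredictor

detector = YOLOWorld("yolov8l-worldv2.pt")     # placeholder weights
detector.set_classes(["red mug"])              # natural-language prompt (placeholder)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2RGB)
result = detector.predict(image, conf=0.25)[0]

predictor.set_image(image)
masks = []
for box in result.boxes.xyxy.cpu().numpy():    # each box is (x1, y1, x2, y2)
    mask, _, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(mask[0])                      # boolean HxW segmentation mask
```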
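A sketch of the GraspNet-to-ROS2 rotation alignment. GraspNet-1Billion places the approach direction along the grasp frame's x-axis, while a typical ROS2 tool frame approaches along +z; the permutation below is one right-handed choice for that re-mapping, not necessarily the matrix derived in the project, and the correct mapping ultimately depends on the gripper's URDF:

```python
# Re-express a GraspNet grasp rotation in a z-approach ROS2 tool frame.
import numpy as np
from scipy.spatial.transform import Rotation

# Permutation with columns (e2, e3, e1): the new frame's axes are
# x' = old y, y' = old z, z' = old x, so the approach moves from x to z
# while the frame stays right-handed (y × z = x).
R_ALIGN = np.array([[0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

def graspnet_to_ros_quat(R_grasp: np.ndarray) -> np.ndarray:
    """Map a 3x3 GraspNet rotation to an (x, y, z, w) quaternion for TF."""
    R_ros = R_grasp @ R_ALIGN
    # scipy's as_quat() returns (x, y, z, w), matching geometry_msgs/Quaternion.
    return Rotation.from_matrix(R_ros).as_quat()
```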
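A sketch of the alpha-shape obstacle reconstruction with Open3D, using the $\alpha = 0.01$ value quoted above. The input file name and downsampling voxel size are placeholders; in the real pipeline the cloud would come from the segmented depth image:

```python
# Reconstruct a non-convex obstacle mesh from a segmented point cloud.
import open3d as o3d

pcd = o3d.io.read_point_cloud("obstacle.ply")        # placeholder input cloud
pcd = pcd.voxel_down_sample(voxel_size=0.005)        # thin the cloud (placeholder size)

mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha=0.01)
mesh.compute_triangle_normals()                      # STL export requires face normals

# The mesh can then be attached to the MoveIt2 planning scene as a collision
# object (moveit_msgs/CollisionObject carrying a shape_msgs/Mesh built from
# mesh.vertices and mesh.triangles).
o3d.io.write_triangle_mesh("obstacle_mesh.stl", mesh)
```

Smaller $\alpha$ values track concavities more tightly but can leave holes where the cloud is sparse, so the value trades mesh fidelity against watertightness.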
Gallery
- System setup
- The system in RViz
- Grasp pose visualization powered by Open3D
- Alpha-Shape obstacle modeling