
Vision in Action: Learning Active Perception from Human Demonstrations

[Project page] [Paper]

Haoyu Xiong, Xiaomeng Xu, Jimmy Wu, Yifan Hou, Jeannette Bohg, Shuran Song

Installation

Tested on Ubuntu 22.04. We recommend Mambaforge for faster installation:

cd vision-in-action
mamba env create -f environment.yaml
mamba activate via
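As a quick sanity check, confirm that the activated environment's Python is the one on your PATH:

python -c "import sys; print(sys.executable)"

The printed path should point inside the via environment.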

Install ROS Noetic via RoboStack:

conda config --env --add channels conda-forge
conda config --env --add channels robostack-staging
conda config --env --remove channels defaults
mamba install ros-noetic-desktop-full

After the installation completes, open a new terminal, activate the environment (mamba activate via), and run roscore. If the ROS master starts without errors, ROS is installed correctly.
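A minimal check using stock ROS tools:

# terminal 1: start the ROS master
roscore

# terminal 2: a fresh master should list /rosout and /rosout_agg
rostopic list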

Build the ARX robot SDK:

cd arx5-sdk
mkdir build && cd build
cmake ..
make -j
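A minimal smoke test for the build, assuming the SDK ships Python bindings named arx5_interface (the module name is an assumption; check the SDK's python/ directory for the actual name):

# hypothetical module name; adjust to match your SDK build
python -c "import arx5_interface; print('ARX SDK bindings OK')"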

Table of Contents

⭐ Quick Start

👀 Async Point Cloud Rendering. Use an iPhone as the robot camera and a Vision Pro for VR rendering.

⚙️ ARX Arm Setup. Make sure you can run the single-arm test scripts after the USB-CAN setup; a minimal CAN bring-up sketch follows this list.

⭐ The Vision in Action System

🛠️ Hardware Guide.

📍 Data Collection & Processing.

🖥 Model Training.

🤖 Robot Deployment.
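For the USB-CAN step above, a minimal bring-up sketch, assuming a SocketCAN-compatible adapter that enumerates as can0 (the interface name and 1 Mbps bitrate are assumptions; follow the ARX Arm Setup guide for your adapter):

# assumed interface name and bitrate; check `ip link` for your adapter
sudo ip link set can0 up type can bitrate 1000000

# verify the link reports state UP
ip -details link show can0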

Acknowledgments

This project would not have been possible without the open-source contributions and support from the community.

Our robot controller code is powered by the ARX-SDK. The VR code builds on Vuer and OpenTeleVision, and arm teleoperation is enabled by Gello. The data processing code is adapted from BiDex, the mobile base is provided by TidyBot++, and iPhone camera streaming uses the Record3D app. Model training is based on Diffusion Policy.

This work was supported in part by the Toyota Research Institute, NSF awards #2143601, #2037101, and #2132519, the Sloan Foundation, Stanford Human-Centered AI Institute, and Intrinsic. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.

Citing

If you find this codebase useful, consider citing:

@article{xiong2025via,
  title = {Vision in Action: Learning Active Perception from Human Demonstrations},
  author = {Haoyu Xiong and Xiaomeng Xu and Jimmy Wu and Yifan Hou and Jeannette Bohg and Shuran Song},
  journal = {arXiv preprint arXiv:2506.15666},
  year = {2025}
}

About

An open-source, full-stack robotic neck for active perception
