Haoyu Xiong, Xiaomeng Xu, Jimmy Wu, Yifan Hou, Jeannette Bohg, Shuran Song
Tested on Ubuntu 22.04. We recommend Mambaforge for faster installation:
cd vision-in-action
mamba env create -f environment.yaml
mamba activate via

Install ROS
conda config --env --add channels conda-forge
conda config --env --add channels robostack-staging
conda config --env --remove channels defaults
mamba install ros-noetic-desktop-full
After completing the ROS installation, open a new terminal and run roscore. If the ROS master starts without errors, the installation was successful.
Build ARX robot SDK
cd arx5-sdk
mkdir build && cd build
cmake ..
make -j

👀 Async Point Cloud Rendering. Stream RGB-D frames from an iPhone mounted as the robot camera, and render the resulting point cloud in VR on a Vision Pro.
⚙️ ARX Arm Setup. Make sure you can run the single-arm test scripts after the USB-CAN setup.
📍 Data Collection & Processing.
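At its core, the point cloud rendering step above back-projects each depth pixel through the camera intrinsics into a 3D point. A minimal NumPy sketch of that operation (the function name and the toy intrinsics are illustrative placeholders, not part of the ViA codebase or the Record3D API):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map in meters into an (H*W, 3) point cloud."""
    h, w = depth.shape
    # Pixel coordinate grids: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Standard pinhole back-projection through the intrinsics.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example: a 2x2 depth map at a constant 1 m, with made-up intrinsics.
pts = depth_to_point_cloud(np.ones((2, 2)), fx=1.0, fy=1.0, cx=1.0, cy=1.0)
print(pts.shape)  # (4, 3)
```

In practice the depth map and intrinsic matrix would come from the live iPhone stream, and the resulting points would be shipped asynchronously to the VR renderer.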
This project would not have been possible without the open-source contributions and support from the community.
Our robot controller code is powered by the ARX-SDK. The VR code is supported by Vuer and OpenTeleVision. Arm teleoperation is enabled by Gello. The data processing code is adapted from BiDex. The mobile base is provided by Tidybot++, and iPhone camera streaming uses the Record3D app. Model training is based on Diffusion Policy.
This work was supported in part by the Toyota Research Institute, NSF awards #2143601, #2037101, and #2132519, the Sloan Foundation, Stanford Human-Centered AI Institute, and Intrinsic. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.
If you find this codebase useful, consider citing:
@article{xiong2025via,
title = {Vision in Action: Learning Active Perception from Human Demonstrations},
author = {Haoyu Xiong and Xiaomeng Xu and Jimmy Wu and Yifan Hou and Jeannette Bohg and Shuran Song},
journal = {arXiv preprint arXiv:2506.15666},
year = {2025}
}