Official PyTorch implementation of Geometry-grounded Point Transformer (GGPT) (CVPR 2026). GGPT is a method for high-quality 3D reconstruction from multiview images. For more details, please visit our project webpage.
git clone --recursive https://github.com/chenyutongthu/GGPT.git
cd GGPT
b. Install torch and torchvision for your CUDA version. (The environment does not require a specific CUDA or PyTorch version; it has been tested with CUDA 12.1/12.3 and torch 2.2.0/2.5.1.)
pip install -r requirements_sfm.txt
# Choose which matcher to use.
# For RoMaV2
cd RoMaV2/ && pip install -e .
# For RoMaV1
pip install "fused-local-corr>=0.2.2"
cd RoMa/ && pip install -e .
d. [Optional] If you need to run GGPT, the 3D point transformer, please follow the script to install the following packages in the same virtual environment. You don't need to build another env for this.
e. Download our pretrained GGPT checkpoint directly here and place it in ckpts/.
python run_demo.py image_dir=/path/to/your/images
The outputs (including the feedforward points, SfM points, and the final GGPT points ggpt_points.ply) will be saved in the outputs/demo/ directory by default.
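Since the saved point clouds are standard PLY files, they can be sanity-checked without extra dependencies. A minimal sketch of reading the vertex count from a PLY header (the tiny sample file written here is a stand-in for outputs/demo/ggpt_points.ply, not GGPT output):

```python
def ply_vertex_count(path):
    """Read the vertex count from a PLY header (works for ASCII or binary payloads)."""
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="ignore").strip()
            if line.startswith("element vertex"):
                return int(line.split()[-1])
            if line == "end_header":
                break
    raise ValueError("no vertex element found in PLY header")

# Write a tiny ASCII PLY just to demonstrate the helper.
sample = "\n".join([
    "ply", "format ascii 1.0",
    "element vertex 2",
    "property float x", "property float y", "property float z",
    "end_header",
    "0 0 0", "1 1 1",
])
with open("sample_points.ply", "w") as f:
    f.write(sample + "\n")

print(ply_vertex_count("sample_points.ply"))  # -> 2
```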
Alternatively, you can add common_config.ggpt_refine=False to disable GGPT refinement and run SfM alone, obtaining only the sparse reconstruction.
You can adjust Structure-from-Motion (SfM) configuration blocks in the .yaml files (found under configs/) to better suit your data.
ba_config:
shared_camera: True # Set it to False if images are captured with different camera intrinsics
camera_type: SIMPLE_PINHOLE # Set it to PINHOLE if the images can have very different fx and fy.
dlt_config:
# Adjust the filtering parameters if you need more accurate yet sparser SfM points. E.g.:
score_thresh: 0.1 # Increase the matching confidence threshold to filter out noisy matchings
cycle_err_thresh: 4 # Reduce the cycle error threshold to filter out noisy tracks
max_epipolar_error: 4 # Reduce the epipolar error threshold to filter out noisy tracks
min_tri_angle: 3 # Increase the triangulation angle threshold to filter out points with low parallax
max_reproj_error: 4 # Reduce the reprojection error threshold to filter out noisy points

When working with a large number of input views (e.g., > 50), use the following configuration to prevent out-of-memory (OOM) errors. Additionally, for denser view sets, we recommend applying stricter filtering thresholds during the SfM process.
python run_demo.py \
common_config.reduce_memory=True match_config.models=romav2-fast \
ba_config.score_thresh=0.8 ba_config.cycle_err_thresh=1 ba_config.mintrack_per_view=512 \
dlt_config.cycle_err_thresh=1 dlt_config.max_reproj_error=1 dlt_config.min_tri_angle=7 \
hydra.run.dir=outputs/demo/ image_dir=/path
Note on Scalability: The current pipeline performs exhaustive matching, which leads to longer runtimes for large-scale or dense view reconstructions. For better efficiency in these scenarios, we recommend integrating sparse graph building.
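For intuition on that cost: exhaustive matching runs the matcher on every unordered image pair, so the work grows quadratically with the view count (the helper below is plain arithmetic, not part of the GGPT codebase):

```python
def exhaustive_pairs(n_views: int) -> int:
    """Number of unordered image pairs an exhaustive matcher must process."""
    return n_views * (n_views - 1) // 2

for n in (10, 50, 200):
    print(n, exhaustive_pairs(n))
# 50 views already require 1225 pairwise matches; 200 views require 19900.
```

A sparse view graph keeps this closer to linear by matching each image only against a few likely neighbors.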
- If you run into this error when running RoMaV2:
  raise RuntimeError("Float32 matmul precision must be set to highest")
  manually add torch.set_float32_matmul_precision("highest") before the line that raises the error.
- If you run into this error when running DLT triangulation:
  torch._C._LinAlgError: cusolver error: CUSOLVER_STATUS_INVALID_VALUE
  reduce dlt_config.batch_size to a smaller value, e.g. 50000. This is a torch-version-specific issue.
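Lowering dlt_config.batch_size simply makes the pipeline solve the per-track linear systems in smaller chunks, bounding the solver's workspace. The idea can be sketched in NumPy (the function and shapes here are illustrative, not the project's actual DLT code):

```python
import numpy as np

def solve_in_chunks(A, b, batch_size=50000):
    """Solve a large stack of small least-squares problems chunk by chunk,
    keeping peak memory (and, in the torch analogue, the cuSOLVER workspace) bounded."""
    out = []
    for start in range(0, A.shape[0], batch_size):
        A_chunk = A[start:start + batch_size]
        b_chunk = b[start:start + batch_size]
        # One small system per track; a batched torch.linalg.lstsq would
        # process each chunk in a single call.
        out.extend(np.linalg.lstsq(a, y, rcond=None)[0] for a, y in zip(A_chunk, b_chunk))
    return np.stack(out)

rng = np.random.default_rng(0)
A = rng.normal(size=(7, 4, 3))   # 7 tracks, each a 4x3 system
b = rng.normal(size=(7, 4))
x = solve_in_chunks(A, b, batch_size=3)
print(x.shape)  # (7, 3)
```

The chunk size only affects memory, not the result, which is why shrinking it is a safe workaround.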
Download the preprocessed evaluation set and place it at the root of this project as GGPT_eval. Then run sh benchmark_eval.sh.
You can evaluate GGPT on your own datasets by organizing your data into a custom_eval_set or by modifying the dataloader.
a. Organize your data directory
Prepare your dataset following this directory structure:
custom_eval_set/
└── seq0/
├── depths/ # Ground truth depth maps
│ ├── 000000.npy # (H, W) float32 array
│ └── ...
├── images/ # Input images
│ ├── 000000.jpg # (H, W, 3) RGB image
│ └── ...
├── extrinsics.npy # (N, 4, 4) world-to-camera matrices (OpenCV convention)
└── intrinsics.npy # (N, 3, 3) linear camera intrinsics (top-left pixel center is (0,0))
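The conventions above can be checked on a single point: with a world-to-camera matrix (OpenCV convention, +z forward) and linear intrinsics, a world point projects via a homogeneous transform followed by division by depth. A minimal NumPy sketch with made-up camera values (not from any GGPT dataset):

```python
import numpy as np

def project(X_world, w2c, K):
    """Project a 3D world point to pixel coordinates.

    w2c: (4, 4) world-to-camera matrix, OpenCV convention (+z looks forward).
    K:   (3, 3) linear intrinsics; (0, 0) is the center of the top-left pixel.
    """
    X_cam = w2c @ np.append(X_world, 1.0)  # homogeneous transform into camera frame
    assert X_cam[2] > 0, "point must be in front of the camera"
    uvw = K @ X_cam[:3]
    return uvw[:2] / uvw[2]               # perspective division by depth

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
w2c = np.eye(4)  # camera at the world origin, looking down +z
print(project(np.array([0.0, 0.0, 2.0]), w2c, K))  # point on the optical axis -> [320. 240.]
```

If your ground-truth depths don't line up with reprojected SfM points under this check, the extrinsics are likely camera-to-world or use a different axis convention.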
b. Update dataset configurations
Add your custom dataset configuration to the evaluation YAML file (e.g., configs/benchmark_sfm.yaml):
valdataset_configs:
- _target_: sfm.dataloader.extracted.ExtractedDataset
name: custom_eval_set
root: PATH/TO/custom_eval_set

c. Run the evaluation
First, run the Structure-from-Motion (SfM) pipeline to extract the sparse point clouds:
python -m torch.distributed.run --nnodes=1 --nproc_per_node=1 --rdzv-endpoint=localhost:2026 \
sfm/run_benchmark_sfm.py \
match_config.models='romav2-base' \
feedforward_config.model='vggt-point' \
hydra.run.dir=outputs/custom_eval/sfm_vggt-point_romav2

Then, run the GGPT pipeline using the output from the SfM step:
python -m torch.distributed.run --nnodes=1 --nproc_per_node=1 \
--rdzv-endpoint=localhost:2026 \
ggpt/run_benchmark_ggpt.py \
valdataset_configs.data_dict.custom_eval_set=outputs/custom_eval/sfm_vggt-point_romav2/save/custom_eval_set \
hydra.run.dir=outputs/custom_eval/ggpt_vggt-point_romav2

While our work demonstrates that geometry optimization provides meaningful constraints that significantly improve feed-forward 3D reconstruction, our current method can still fail under specific conditions. For an in-depth discussion, please refer to the limitations section in our paper's appendix.
- Ill-posed Geometry: extremely low overlap or parallax (e.g., distant trees in outdoor scenes), in-the-wild photos captured under significantly different lighting conditions, ambiguous or misleading symmetric structures, and dynamic scenes.
- GGPT Refinement: GGPT currently struggles with long-range extrapolation in monocular regions and can occasionally produce patch-like artifacts in large-scale scenes.
We invite you to share failure cases via GitHub Issues to help guide future research!
If you find our work useful, please cite:
@inproceedings{chen2026ggpt,
title={GGPT: Geometry-Grounded Point Transformer},
author={Chen, Yutong and Wang, Yiming and Zhang, Xucong and Prokudin, Sergey and Tang, Siyu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}