
GGPT: Geometry-grounded Point Transformer

CVPR 2026 | Arxiv | Project Page | Hugging Face Models | Hugging Face Datasets

Official PyTorch implementation of Geometry-grounded Point Transformer (GGPT) (CVPR 2026). GGPT is a method for high-quality 3D reconstruction from multiview images. For more details, please visit our project webpage.


🛠️ 1. Installation

1.1 Clone the repository

git clone --recursive https://github.com/chenyutongthu/GGPT.git
cd GGPT

1.2 Install dependencies

a. Create a virtual environment.

b. Install torch and torchvision for your CUDA version. (No specific CUDA or PyTorch version is required; the code has been tested with CUDA 12.1/12.3 and torch 2.2.0/2.5.1.)

c. Install requirements for VGGT and SfM.

pip install -r requirements_sfm.txt
# Choose which matcher to use.
# For RoMaV2
cd RoMaV2/ && pip install -e .
# For RoMaV1
pip install "fused-local-corr>=0.2.2"
cd RoMa/ && pip install -e .

d. [Optional] If you want to run GGPT, the 3D point transformer, follow the installation script to install the additional packages in the same virtual environment. You don't need to build a separate environment for this.

e. Download our pretrained GGPT checkpoint here and place it in ckpts/.


📖 2. Usage & Examples

python run_demo.py image_dir=/path/to/your/images

The outputs (including the feedforward points, SfM points, and the final GGPT points ggpt_points.ply) will be saved in the outputs/demo/ directory by default.

Alternatively, add common_config.ggpt_refine=False to disable GGPT refinement and run only the SfM stage, yielding just the sparse reconstruction.
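The key=value arguments above are Hydra-style overrides: a dotted path into the nested YAML config plus a value. As a rough, hypothetical sketch of how such an override maps onto a nested config dict (Hydra's own parsing is far richer), consider:

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Apply one Hydra-style dotted override like 'a.b=False' to a nested dict."""
    dotted, _, raw = override.partition("=")
    # Interpret the booleans used in this README; Hydra parses many more types.
    value = {"True": True, "False": False}.get(raw, raw)
    node = cfg
    keys = dotted.split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg

cfg = apply_override({"common_config": {"ggpt_refine": True}},
                     "common_config.ggpt_refine=False")
```

This is only meant to show why common_config.ggpt_refine=False on the command line ends up toggling the ggpt_refine field under common_config in the YAML.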

⚙️ Configuration Settings (SfM)

You can adjust Structure-from-Motion (SfM) configuration blocks in the .yaml files (found under configs/) to better suit your data.

ba_config:
  shared_camera: True # Set it to False if images are captured with different camera intrinsics
  camera_type: SIMPLE_PINHOLE # Set it to PINHOLE if the images can have very different fx and fy.

dlt_config:
  # Adjust the filtering parameters if you need more accurate yet sparser SfM points. E.g.:
  score_thresh: 0.1       # Increase the matching confidence threshold to filter out noisy matchings
  cycle_err_thresh: 4     # Reduce the cycle error threshold to filter out noisy tracks
  max_epipolar_error: 4   # Reduce the epipolar error threshold to filter out noisy tracks
  min_tri_angle: 3        # Increase the triangulation angle threshold to filter out points with low parallax
  max_reproj_error: 4     # Reduce the reprojection error threshold to filter out noisy points
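For intuition on min_tri_angle: the triangulation angle is the angle at a 3D point between the viewing rays from two camera centers, and points seen under a tiny angle (low parallax) triangulate unreliably. The following is an illustrative sketch, not the repository's implementation:

```python
import math

def tri_angle_deg(point, center_a, center_b):
    """Angle in degrees at `point` between the rays toward two camera centers."""
    ray_a = [c - p for c, p in zip(center_a, point)]
    ray_b = [c - p for c, p in zip(center_b, point)]
    dot = sum(a * b for a, b in zip(ray_a, ray_b))
    norm_a = math.sqrt(sum(a * a for a in ray_a))
    norm_b = math.sqrt(sum(b * b for b in ray_b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b)))))

# A point 10 units in front of two cameras 1 unit apart subtends ~5.7 degrees,
# so it would survive min_tri_angle: 3 but not a stricter threshold of 7.
angle = tri_angle_deg([0, 0, 10], [0, 0, 0], [1, 0, 0])
```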

🎬 Large number of input views

When working with a large number of input views (e.g., > 50), use the following configuration to prevent out-of-memory (OOM) errors. Additionally, for denser view sets, we recommend applying stricter filtering thresholds during the SfM process.

python run_demo.py \
    common_config.reduce_memory=True match_config.models=romav2-fast \
    ba_config.score_thresh=0.8 ba_config.cycle_err_thresh=1 ba_config.mintrack_per_view=512 \
    dlt_config.cycle_err_thresh=1 dlt_config.max_reproj_error=1 dlt_config.min_tri_angle=7 \
    hydra.run.dir=outputs/demo/ image_dir=/path

Note on scalability: the current pipeline performs exhaustive matching, which leads to long runtimes for large-scale or dense-view reconstructions. For better efficiency in these scenarios, we recommend integrating sparse graph building.
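The quadratic growth is easy to see: n views yield n(n-1)/2 image pairs under exhaustive matching, so going from 50 to 500 views multiplies the matching work by roughly 100x. A one-liner for reference:

```python
def exhaustive_pairs(n: int) -> int:
    """Number of image pairs matched when every view is matched against every other."""
    return n * (n - 1) // 2

pairs_50 = exhaustive_pairs(50)    # 1225 pairs
pairs_500 = exhaustive_pairs(500)  # 124750 pairs
```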

Troubleshooting

  • If you run into RuntimeError("Float32 matmul precision must be set to highest") when running RoMaV2, manually add torch.set_float32_matmul_precision("highest") before the line that raises the error.

  • If you run into torch._C._LinAlgError: cusolver error: CUSOLVER_STATUS_INVALID_VALUE when running DLT triangulation, reduce dlt_config.batch_size to a smaller value, e.g. 50000. This is a torch-version-specific issue.
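Reducing dlt_config.batch_size simply makes each linear-algebra call operate on fewer tracks at a time. The general pattern, as a generic hedged sketch (the actual batching lives inside the repository's DLT code):

```python
def process_in_chunks(items, batch_size, fn):
    """Apply fn to items in fixed-size chunks and concatenate the results."""
    out = []
    for start in range(0, len(items), batch_size):
        out.extend(fn(items[start:start + batch_size]))
    return out

# With a smaller batch_size, each call to the (hypothetical) solver `fn`
# sees fewer items, trading a few extra calls for lower peak memory.
doubled = process_in_chunks(list(range(10)), 4, lambda chunk: [x * 2 for x in chunk])
```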


📊 3. Evaluation

3.1 Evaluation on our benchmark

Download the preprocessed evaluation set and place it at the root of this project as GGPT_eval, then run sh benchmark_eval.sh.

3.2 Evaluation on your custom data

You can evaluate GGPT on your own datasets by organizing your data into a custom_eval_set or by modifying the dataloader.

a. Organize your data directory

Prepare your dataset following this directory structure:

custom_eval_set/
└── seq0/
    ├── depths/                  # Ground truth depth maps
    │   ├── 000000.npy           # (H, W) float32 array
    │   └── ...
    ├── images/                  # Input images
    │   ├── 000000.jpg           # (H, W, 3) RGB image
    │   └── ... 
    ├── extrinsics.npy           # (N, 4, 4) world-to-camera matrices (OpenCV convention)
    └── intrinsics.npy           # (N, 3, 3) linear camera intrinsics (top-left pixel center is (0,0))
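To clarify the conventions above: with OpenCV world-to-camera extrinsics, a world point X maps to camera coordinates as x_c = R X + t (the top-left 3x4 of the 4x4 matrix), and the intrinsics then yield pixels with the top-left pixel center at (0, 0). A minimal, purely illustrative projection sketch (the values of E and K below are made up, and the z-depth interpretation of depths/*.npy is our assumption):

```python
def project(extrinsic, intrinsic, point_w):
    """Project a world point with a 4x4 world-to-camera matrix and 3x3 intrinsics."""
    # Camera coordinates: x_c = R @ X + t (first three rows of the 4x4 extrinsic).
    xc = [sum(extrinsic[i][j] * p for j, p in enumerate(point_w)) + extrinsic[i][3]
          for i in range(3)]
    # Pixel coordinates: u = fx * x/z + cx, v = fy * y/z + cy (OpenCV convention).
    u = intrinsic[0][0] * xc[0] / xc[2] + intrinsic[0][2]
    v = intrinsic[1][1] * xc[1] / xc[2] + intrinsic[1][2]
    return u, v, xc[2]  # xc[2] is z-depth (our assumption for depths/*.npy)

E = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 2], [0, 0, 0, 1]]  # camera 2 units back
K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]]                 # made-up intrinsics
u, v, depth = project(E, K, [0.5, 0.0, 0.0])
```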

b. Update dataset configurations

Add your custom dataset configuration to the evaluation YAML file (e.g., configs/benchmark_sfm.yaml):

valdataset_configs:
  - _target_: sfm.dataloader.extracted.ExtractedDataset
    name: custom_eval_set
    root: PATH/TO/custom_eval_set

c. Run the evaluation

First, run the Structure-from-Motion (SfM) pipeline to extract the sparse point clouds:

python -m torch.distributed.run --nnodes=1 --nproc_per_node=1 --rdzv-endpoint=localhost:2026 \
    sfm/run_benchmark_sfm.py \
    match_config.models='romav2-base' \
    feedforward_config.model='vggt-point' \
    hydra.run.dir=outputs/custom_eval/sfm_vggt-point_romav2

Then, run the GGPT pipeline using the output from the SfM step:

python -m torch.distributed.run --nnodes=1 --nproc_per_node=1 \
    --rdzv-endpoint=localhost:2026 \
    ggpt/run_benchmark_ggpt.py \
    valdataset_configs.data_dict.custom_eval_set=outputs/custom_eval/sfm_vggt-point_romav2/save/custom_eval_set \
    hydra.run.dir=outputs/custom_eval/ggpt_vggt-point_romav2

💡 Limitations

While our work demonstrates that geometry optimization provides meaningful constraints that significantly improve feed-forward 3D reconstruction, our current method can still fail under specific conditions. For an in-depth discussion, please refer to the limitations section in our paper's appendix.

  • Ill-posed geometry: extremely low overlap or parallax (e.g., distant trees in outdoor scenes), in-the-wild photos captured under significantly different lighting conditions, ambiguous or misleading symmetric structures, and dynamic scenes.
  • GGPT Refinement: GGPT currently faces challenges with long-range extrapolation in monocular regions and can occasionally produce patchify artifacts in large-scale scenes.

We invite you to share failure cases via GitHub Issues to help guide future research!

📖 Citing

If you find our work useful, please cite:

@inproceedings{chen2026ggpt,
  title={GGPT: Geometry-Grounded Point Transformer},
  author={Chen, Yutong and Wang, Yiming and Zhang, Xucong and Prokudin, Sergey and Tang, Siyu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
