Hao Yu* • Haotong Lin* • Jiawei Wang* • Jiaxin Li • Yida Wang • Xueyang Zhang • Yue Wang • Xiaowei Zhou • Ruizhen Hu • Sida Peng
[2026-04] 🎉 Training and evaluation code of InfiniDepth (RGB Only & Depth Sensor Augmentation) is available now!
[2026-03] 🎉 Inference code of InfiniDepth (RGB Only & Depth Sensor Augmentation) is available now!
[2026-02] 🎉 InfiniDepth has been accepted to CVPR 2026! Code coming soon!
InfiniDepth supports three practical capabilities for single-image 3D perception and reconstruction:
| Capability | Input | Output |
|---|---|---|
| Monocular & Arbitrary-Resolution Depth Estimation | RGB Image | Arbitrary-Resolution Depth Map |
| Monocular View Synthesis | RGB Image | 3D Gaussian Splatting (3DGS) |
| Depth Sensor Augmentation (Monocular Metric Depth Estimation) | RGB Image + Depth Sensor | Metric Depth + 3D Gaussian Splatting (3DGS) |
Please see INSTALL.md for manual installation.
If you want to test InfiniDepth before running local CLI inference, start with the hosted demo:
- Hugging Face Space: https://huggingface.co/spaces/ritianyu/InfiniDepth
This repo also includes a Gradio Space entrypoint at app.py:
- Input: RGB image (required), depth map (optional)
- Task switch: Depth / 3DGS
- Model switch: InfiniDepth / InfiniDepth_DepthSensor

```bash
python app.py
```

- In this demo, `InfiniDepth_DepthSensor` requires a depth map input; RGB-only inference should use `InfiniDepth`.
- Supported depth formats in the demo upload: `.png`, `.npy`, `.npz`, `.h5`, `.hdf5`, `.exr`.
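If your sensor depth lives in a raw `.npy` array, you can wrap it into a `.npz` file before uploading. This is an illustrative sketch only: the key name `depth` and the assumption that the array is already in meters are guesses, not part of the repo, so check the demo's loader for the exact convention it expects.

```python
import numpy as np

def npy_to_npz(npy_path: str, npz_path: str) -> np.ndarray:
    """Wrap a raw .npy depth array into a .npz file for the demo upload.

    Assumptions (not from the repo): the array already holds metric depth,
    and the loader reads it under the key "depth".
    """
    depth = np.load(npy_path).astype(np.float32)
    np.savez(npz_path, depth=depth)
    return depth
```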
| If you want ... | Recommended command |
|---|---|
| Relative Depth from Single RGB Image | bash example_scripts/infer_depth/courtyard_infinidepth.sh |
| 3D Gaussian from Single RGB Image | bash example_scripts/infer_gs/courtyard_infinidepth_gs.sh |
| Metric Depth from RGB + Depth Sensor | bash example_scripts/infer_depth/eth3d_infinidepth_depthsensor.sh |
| 3D Gaussian from RGB + Depth Sensor | bash example_scripts/infer_gs/eth3d_infinidepth_depthsensor_gs.sh |
| Multi-View / Video Depth + Global Point Cloud | bash example_scripts/infer_depth/waymo_multi_view_infinidepth.sh |
1. Relative Depth from Single RGB Image (inference_depth.py)
Use this when you want a relative depth map from a single RGB image and, optionally, a point cloud export.
Required input
RGB image
Required checkpoints
- `checkpoints/depth/infinidepth.ckpt`
- `checkpoints/moge-2-vitl-normal/model.pt` (used to recover metric scale for point cloud export)
Optional checkpoint
- `checkpoints/sky/skyseg.onnx` (additional sky filtering)
Recommended command

```bash
python inference_depth.py \
--input_image_path=example_data/image/courtyard.jpg \
--model_type=InfiniDepth \
--depth_model_path=checkpoints/depth/infinidepth.ckpt \
--output_resolution_mode=upsample \
--upsample_ratio=2
```

Replace `example_data/image/courtyard.jpg` with your own image path.
For the example above, outputs are written to:
- `example_data/pred_depth/` for the colorized depth map
- `example_data/pred_pcd/` for the exported point cloud when `--save_pcd=True`
Example scripts

```bash
bash example_scripts/infer_depth/courtyard_infinidepth.sh
bash example_scripts/infer_depth/camera_infinidepth.sh
bash example_scripts/infer_depth/eth3d_infinidepth.sh
bash example_scripts/infer_depth/waymo_infinidepth.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--output_resolution_mode` | Choose `upsample`, `original`, or `specific`. |
| `--upsample_ratio` | Used when `output_resolution_mode=upsample`. |
| `--output_size` | Explicit output size (H,W) when `output_resolution_mode=specific`. |
| `--save_pcd` | Export a point cloud alongside the depth map. |
| `--fx_org --fy_org --cx_org --cy_org` | Camera intrinsics at the original image resolution. |
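The `--fx_org --fy_org --cx_org --cy_org` intrinsics follow the standard pinhole camera model; to see how they connect a depth map to an exported point cloud, here is a minimal unprojection sketch. This is textbook pinhole math, not the repo's actual exporter:

```python
import numpy as np

def unproject_depth(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Lift an (H, W) depth map to an (H*W, 3) point cloud in camera space
    using the pinhole model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```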
2. 3D Gaussian + Novel-View Video from Single RGB Image (inference_gs.py)
Use this when you want a 3D Gaussian export from a single RGB image and an optional novel-view video.
Required input
RGB image
Required checkpoints
- `checkpoints/depth/infinidepth.ckpt`
- `checkpoints/gs/infinidepth_gs.ckpt`
- `checkpoints/moge-2-vitl-normal/model.pt` (used to recover metric scale for 3D Gaussian export)
Optional checkpoint
- `checkpoints/sky/skyseg.onnx` (additional sky filtering)
Recommended command

```bash
python inference_gs.py \
--input_image_path=example_data/image/courtyard.jpg \
--model_type=InfiniDepth \
--depth_model_path=checkpoints/depth/infinidepth.ckpt \
--gs_model_path=checkpoints/gs/infinidepth_gs.ckpt
```

Replace `example_data/image/courtyard.jpg` with your own image path.
For the example above, outputs are written to:
- `example_data/pred_gs/InfiniDepth_courtyard_gaussians.ply`
- `example_data/pred_gs/InfiniDepth_courtyard_novel_orbit.mp4`
If --render_size is omitted, the novel-view video is rendered at the original input image resolution.
Example scripts

```bash
bash example_scripts/infer_gs/courtyard_infinidepth_gs.sh
bash example_scripts/infer_gs/camera_infinidepth_gs.sh
bash example_scripts/infer_gs/fruit_infinidepth_gs.sh
bash example_scripts/infer_gs/eth3d_infinidepth_gs.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--render_novel_video` | Turn novel-view rendering on or off. |
| `--render_size` | Output video resolution (H,W). |
| `--novel_trajectory` | Camera motion type: `orbit` or `swing`. |
| `--sample_point_num` | Number of sampled points used for Gaussian construction. |
| `--enable_skyseg_model` | Enable sky masking before Gaussian sampling. |
| `--sample_sky_mask_dilate_px` | Dilate the sky mask before filtering. |
The exported `.ply` files can be visualized in 3D viewers such as SuperSplat.
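`--novel_trajectory=orbit` moves the virtual camera around the scene. As a rough illustration of what an orbit trajectory means geometrically (not the repo's implementation, which also produces per-frame rotations that keep the camera looking at the scene), camera centers on a circle can be generated like this:

```python
import numpy as np

def orbit_camera_centers(radius: float, num_frames: int,
                         height: float = 0.0) -> np.ndarray:
    """Illustrative only: camera centers evenly spaced on a horizontal
    circle of the given radius, all at the same height."""
    angles = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    x = radius * np.cos(angles)
    z = radius * np.sin(angles)
    y = np.full_like(angles, height)
    return np.stack([x, y, z], axis=-1)  # (num_frames, 3)
```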
3. Depth Sensor Augmentation (Metric Depth and 3D Gaussian from RGB + Depth Sensor)
Use this mode when you have an RGB image plus metric depth from a depth sensor.
Required inputs
- RGB image
- Sparse depth in `.png`, `.npy`, `.npz`, `.h5`, `.hdf5`, or `.exr`
Required checkpoints
- `checkpoints/depth/infinidepth_depthsensor.ckpt`
- `checkpoints/moge-2-vitl-normal/model.pt`
- `checkpoints/gs/infinidepth_depthsensor_gs.ckpt`
Required flags
- `--model_type=InfiniDepth_DepthSensor`
- `--input_depth_path=...`
Metric Depth Inference Command

```bash
python inference_depth.py \
--input_image_path=example_data/image/eth3d_office.png \
--input_depth_path=example_data/depth/eth3d_office.npz \
--model_type=InfiniDepth_DepthSensor \
--depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
--fx_org=866.39 \
--fy_org=866.04 \
--cx_org=791.5 \
--cy_org=523.81 \
--output_resolution_mode=upsample \
--upsample_ratio=1
```

3D Gaussian Inference Command
```bash
python inference_gs.py \
--input_image_path=example_data/image/eth3d_office.png \
--input_depth_path=example_data/depth/eth3d_office.npz \
--model_type=InfiniDepth_DepthSensor \
--depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
--gs_model_path=checkpoints/gs/infinidepth_depthsensor_gs.ckpt \
--fx_org=866.39 \
--fy_org=866.04 \
--cx_org=791.5 \
--cy_org=523.81
```

Example scripts
```bash
bash example_scripts/infer_depth/eth3d_infinidepth_depthsensor.sh
bash example_scripts/infer_depth/waymo_infinidepth_depthsensor.sh
bash example_scripts/infer_gs/eth3d_infinidepth_depthsensor_gs.sh
bash example_scripts/infer_gs/waymo_infinidepth_depthsensor_gs.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--fx_org --fy_org --cx_org --cy_org` | Strongly recommended when you know the sensor intrinsics. |
| `--output_resolution_mode` | Output behavior for `inference_depth.py`. |
| `--render_size` | Video resolution for `inference_gs.py`. |
| `--output_ply_dir` | Custom output directory for Gaussian export. |
4. Multi-View / Video Depth + Global Point Cloud (inference_multi_view_depth.py)
Use this when you want sequence-level depth inference from an RGB image folder or video, plus per-frame aligned point clouds and one merged global point cloud. By default the script runs DA3 once on the whole sequence, then aligns each InfiniDepth depth map to the corresponding DA3 depth map before export. When you already know the camera intrinsics and extrinsics, you can instead provide them directly and skip DA3 entirely.
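To make the alignment step concrete, here is one common way to rescale a predicted depth map to agree with a reference: a median-ratio scale over confident, valid pixels. This is a hypothetical sketch, not the script's actual estimator, which may instead use RANSAC conditioning and different confidence weighting:

```python
import numpy as np

def median_scale_align(pred: np.ndarray, ref: np.ndarray,
                       conf: np.ndarray,
                       conf_threshold: float = 0.5) -> np.ndarray:
    """Hypothetical sketch: scale `pred` so it agrees with `ref` using
    the median depth ratio over pixels that are valid in both maps and
    whose reference confidence clears the threshold."""
    valid = (pred > 0) & (ref > 0) & (conf >= conf_threshold)
    scale = np.median(ref[valid] / pred[valid])
    return pred * scale
```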
Required inputs
- RGB image directory, single RGB image, or video
- Sparse depth directory, single file, or depth video when `--model_type=InfiniDepth_DepthSensor`
Required checkpoints / dependencies
- `checkpoints/depth/infinidepth.ckpt` for RGB-only inference
- `checkpoints/depth/infinidepth_depthsensor.ckpt` for RGB + depth sensor inference
- `checkpoints/moge-2-vitl-normal/model.pt` (used to recover metric scale for RGB-only frame inference)
- `depth-anything-3` installed in the current environment when using the default DA3-based sequence mode; the default DA3 model is `depth-anything/DA3-LARGE-1.1`
Optional checkpoint
- `checkpoints/sky/skyseg.onnx` (additional sky filtering)
RGB-Only Multi-View / Video Command

```bash
python inference_multi_view_depth.py \
--input_path=example_data/multi-view/waymo/image \
--model_type=InfiniDepth \
--depth_model_path=checkpoints/depth/infinidepth.ckpt
```

RGB + Depth Sensor Multi-View / Video Command
```bash
python inference_multi_view_depth.py \
--input_path=example_data/multi-view/waymo/image \
--input_depth_path=example_data/multi-view/waymo/depth \
--model_type=InfiniDepth_DepthSensor \
--depth_model_path=checkpoints/depth/infinidepth_depthsensor.ckpt
```

For video input, replace `--input_path` with a video file. When `--model_type=InfiniDepth_DepthSensor`, `--input_depth_path` can also be a depth video and must contain the same number of frames as the RGB input.
Explicit Camera-Parameter Multi-View Command

```bash
python inference_multi_view_depth.py \
--input_path=example_data/multi-view/waymo/image \
--camera_intrinsics_dir=/path/to/intrinsics \
--camera_extrinsics_dir=/path/to/extrinsics \
--model_type=InfiniDepth \
--depth_model_path=checkpoints/depth/infinidepth.ckpt
```

The explicit camera mode expects Waymo-style text files under `intrinsics/` and `extrinsics/`. Files are sorted lexicographically and matched one-to-one against the sorted RGB image list, so the number of camera files must exactly match the number of images. In this mode the script skips DA3 loading, DA3 cache export, DA3 RANSAC conditioning, and DA3 post scale alignment. This mode currently supports image inputs only, not video.
For the RGB-only example above, outputs are written to:
- `example_data/multi-view/waymo/pred_sequence/image/frames/depth/` for aligned raw depth maps
- `example_data/multi-view/waymo/pred_sequence/image/frames/depth_vis/` for colorized depth maps
- `example_data/multi-view/waymo/pred_sequence/image/frames/pcd/` for per-frame aligned point clouds
- `example_data/multi-view/waymo/pred_sequence/image/frames/meta/` for per-frame camera and alignment metadata
- `example_data/multi-view/waymo/pred_sequence/image/da3/sequence_pose.npz` for cached DA3 predictions
- `example_data/multi-view/waymo/pred_sequence/image/merged/sequence_merged.ply` for the merged global point cloud
Example scripts

```bash
bash example_scripts/infer_depth/waymo_multi_view_infinidepth.sh
bash example_scripts/infer_depth/waymo_multi_view_infinidepth_depthsensor.sh
bash example_scripts/infer_depth/waymo_multi_view_infinidepth_explicit_camera.sh
```

Most useful options
| Argument | What it controls |
|---|---|
| `--input_path` | RGB image directory, single image, or video path. |
| `--input_depth_path` | Depth directory, single depth file, or depth video; required for `InfiniDepth_DepthSensor`. |
| `--camera_intrinsics_dir --camera_extrinsics_dir` | Enable explicit camera mode from sorted Waymo-style txt directories. Image inputs only; file counts must match the RGB frame count. |
| `--input_mode` | Force `images` or `video` instead of auto detection. |
| `--align_to_da3_depth` | Align each InfiniDepth depth map to the corresponding DA3 depth map before export. Ignored in explicit camera mode. |
| `--save_frame_pcd` | Save one aligned point cloud per frame. |
| `--save_merged_pcd` | Save the merged global point cloud across the whole sequence. |
| `--da3_scale_align_conf_threshold` | Minimum DA3 confidence used during per-frame scale estimation. |
| `--output_root` | Override the default `pred_sequence/<sequence_name>/` output directory. |
5. Common Argument Conventions
| Argument | Used in | Description |
|---|---|---|
| `--input_image_path` | depth + gs | Path to the input RGB image. |
| `--input_path` | multi-view | Path to an RGB image directory, single image, or video. |
| `--input_depth_path` | depth + gs + multi-view | Optional metric depth prompt; required for `InfiniDepth_DepthSensor`. In multi-view mode this can be a depth directory, single depth file, or depth video. |
| `--camera_intrinsics_dir --camera_extrinsics_dir` | multi-view | Optional sequence camera parameter directories. When both are set, multi-view inference skips DA3 and uses the provided sorted txt files directly. |
| `--model_type` | depth + gs + multi-view | `InfiniDepth` for RGB-only, `InfiniDepth_DepthSensor` for RGB + sparse depth. |
| `--depth_model_path` | depth + gs | Path to the depth checkpoint. |
| `--gs_model_path` | gs only | Path to the Gaussian predictor checkpoint. |
| `--moge2_pretrained` | depth + gs | MoGe-2 checkpoint used when `--input_depth_path` is missing. |
| `--fx_org --fy_org --cx_org --cy_org` | depth + gs | Camera intrinsics at the original image resolution. Missing values fall back to MoGe-2 estimates or image-size defaults. |
| `--input_size` | depth + gs | Network input size (H,W) used during inference. |
| `--enable_skyseg_model` | depth + gs + multi-view | Enable sky masking before depth or Gaussian sampling. |
| `--sky_model_ckpt_path` | depth + gs | Path to the sky segmentation ONNX checkpoint. |
Depth output modes
- `--output_resolution_mode=upsample`: output size = `input_size * upsample_ratio`
- `--output_resolution_mode=original`: output size = original input image size
- `--output_resolution_mode=specific`: output size = `output_size`
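The three modes reduce to simple size arithmetic. This toy helper is illustrative only (it is not code from the repo) and just spells out what each mode produces:

```python
def resolved_output_size(mode: str, input_size: tuple,
                         original_size: tuple = None,
                         upsample_ratio: int = 1,
                         output_size: tuple = None) -> tuple:
    """Illustrative size arithmetic for --output_resolution_mode."""
    if mode == "upsample":
        # Network input size scaled up by the integer ratio.
        return (input_size[0] * upsample_ratio,
                input_size[1] * upsample_ratio)
    if mode == "original":
        # Match the original input image resolution.
        return original_size
    if mode == "specific":
        # Use the explicitly requested (H, W).
        return output_size
    raise ValueError(f"unknown mode: {mode}")
```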
Default output directories

| Script | Default directory |
|---|---|
| `inference_depth.py` depth images | `pred_depth/` next to your input data folder |
| `inference_depth.py` point clouds | `pred_pcd/` next to your input data folder |
| `inference_gs.py` gaussians and videos | `pred_gs/` next to your input data folder |
| `inference_multi_view_depth.py` sequence outputs | `pred_sequence/<sequence_name>/` next to your input data folder |
The repo also provides training and validation entrypoints in `main.py` for InfiniDepth and InfiniDepth_DepthSensor.
First, prepare the training/validation data and the pretrained weights as described in DATA.md.
Before running any command, export the environment variables required by main.py:
```bash
export workspace=/path/to/your/experiments
export commonspace=/path/to/your/common_space
```

- `workspace` stores experiment outputs under `outputs/<task>/<exp_name>/`
- `commonspace` stores datasets and pretrained weights shared across experiments
| If you want ... | Recommended command |
|---|---|
| Fine-tune from an existing checkpoint | Add ckpt_path=... to the training command |
| Train from scratch | Omit ckpt_path and use a fresh exp_name |
| Validate on the mixed real-data benchmark | Run main.py with --entry val |
1. Fine-Tuning from an Existing Checkpoint (main.py, default train_net entry)
Use this when you want to initialize training from an existing InfiniDepth checkpoint. The training config referenced below uses Hypersim as the training set and runs validation on a mixed real-data benchmark at the end of each epoch.
Required environment
- `workspace=/path/to/your/experiments`
- `commonspace=/path/to/your/common_space`
RGB-Only Fine-Tuning Command

```bash
python3 main.py \
--cfg_file training/exp_configs/exps/infinidepth.yaml \
--include training/exp_configs/components/data/train/infinidepth_train_hypersim.yaml \
ckpt_path=checkpoints/depth/infinidepth.ckpt \
exp_name=finetune_infinidepth_on_hypersim \
model.compute_abs_metric=True \
model.save_orig_pred=True \
model.save_metrics=True \
pl_trainer.devices=8
```

RGB + Depth Sensor Fine-Tuning Command
```bash
python3 main.py \
--cfg_file training/exp_configs/exps/infinidepth_depthsensor.yaml \
--include training/exp_configs/components/data/train/infinidepth_train_hypersim.yaml \
ckpt_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
exp_name=finetune_infinidepth_depthsensor_on_hypersim \
model.compute_abs_metric=True \
model.save_orig_pred=True \
model.save_metrics=True \
pl_trainer.devices=8
```

Equivalent launch scripts
```bash
bash launch_scripts/train/infinidepth.sh
bash launch_scripts/train/infinidepth_depthsensor.sh
```

Outputs
- `${workspace}/outputs/${task}/${exp_name}/checkpoints/` for saved checkpoints
- `${workspace}/outputs/${task}/${exp_name}/tb/` for TensorBoard logs
Notes
- `ckpt_path` must point to an existing checkpoint.
- The training data config is `training/exp_configs/components/data/train/infinidepth_train_hypersim.yaml`.
- If the same `exp_name` already has checkpoints in `${workspace}/outputs/${task}/${exp_name}/checkpoints/`, training resumes from the latest saved checkpoint in that directory.
2. Train from Scratch (main.py, no ckpt_path)
Use this when you want to start training without loading an InfiniDepth .ckpt. In this mode, do not pass ckpt_path. The model will still load the DINOv3 backbone from ${commonspace}/pretrained_models/dinov3/. You need to download the DINOv3 weights yourself and place them there before running.
RGB-Only Training-from-Scratch Command

```bash
python3 main.py \
--cfg_file training/exp_configs/exps/infinidepth.yaml \
--include training/exp_configs/components/data/train/infinidepth_train_hypersim.yaml \
exp_name=train_infinidepth_from_scratch \
model.compute_abs_metric=True \
model.save_orig_pred=True \
model.save_metrics=True \
pl_trainer.devices=2
```

RGB + Depth Sensor Training-from-Scratch Command
```bash
python3 main.py \
--cfg_file training/exp_configs/exps/infinidepth_depthsensor.yaml \
--include training/exp_configs/components/data/train/infinidepth_train_hypersim.yaml \
exp_name=train_infinidepth_depthsensor_from_scratch \
model.compute_abs_metric=True \
model.save_orig_pred=True \
model.save_metrics=True \
pl_trainer.devices=2
```

Notes
- Use a new `exp_name` for a clean scratch run.
- If you intentionally want to reuse an old `exp_name`, set `resume_training=False` to prevent automatic resume. Be careful: when `resume_training=False`, the code deletes the old output directory before training.
3. Validation from a Checkpoint (main.py with --entry val)
Use this when you want to run validation metrics on the mixed real-data benchmark defined in training/exp_configs/components/data/test/infinidepth_mix_data.yaml.
RGB-Only Validation Command

```bash
python3 main.py \
--cfg_file training/exp_configs/exps/infinidepth.yaml \
--include training/exp_configs/components/data/test/infinidepth_mix_data.yaml \
--entry val \
ckpt_path=checkpoints/depth/infinidepth.ckpt \
exp_name=eval_infinidepth \
model.compute_abs_metric=True \
model.save_orig_pred=True \
model.save_metrics=True \
pl_trainer.devices=2
```

RGB + Depth Sensor Validation Command
```bash
python3 main.py \
--cfg_file training/exp_configs/exps/infinidepth_depthsensor.yaml \
--include training/exp_configs/components/data/test/infinidepth_mix_data.yaml \
--entry val \
ckpt_path=checkpoints/depth/infinidepth_depthsensor.ckpt \
exp_name=eval_infinidepth_depthsensor \
model.compute_abs_metric=True \
model.save_orig_pred=True \
model.save_metrics=True \
pl_trainer.devices=2
```

Reference launch scripts
```bash
bash launch_scripts/eval/infinidepth.sh
bash launch_scripts/eval/infinidepth_depthsensor.sh
```

Validation datasets
- KITTI val split
- ETH3D val split
- NYU val split
- ScanNet val split
- DIODE indoor val split
- Synth4K (CyberPunk, DeadIsland, SpiderMan2, SpiderManMM, WatchDogLegion)
These datasets are read from ${commonspace}/datasets/ using the meta files referenced in training/exp_configs/components/data/test/infinidepth_mix_data.yaml.
Outputs
- `${workspace}/outputs/${task}/${exp_name}/val_metrics/` for validation logs
- `${workspace}/outputs/${task}/${exp_name}/default/metrics/metrics.json` for aggregated validation metrics
- `${workspace}/outputs/${task}/${exp_name}/default/metrics/all_scenes.csv` when `model.save_metrics=True`
4. Common Overrides
| Argument | What it controls |
|---|---|
| `ckpt_path` | Initialization or evaluation checkpoint path. Omit it for training from scratch. |
| `exp_name` | Experiment name used to build `${workspace}/outputs/${task}/${exp_name}`. |
| `pl_trainer.devices` | Number of GPUs used by PyTorch Lightning. |
| `model.compute_abs_metric` | Enable absolute-metric evaluation during training or validation. |
| `model.save_orig_pred` | Save original prediction outputs alongside logs and metrics. |
| `model.save_metrics` | Save metric files for later inspection. |
| `--entry val` | Switch `main.py` from the default training entry to the validation entry. |
| `--include` | Merge an extra data config, such as `training/exp_configs/components/data/test/infinidepth_mix_data.yaml`. |
We thank Yuanhong Yu, Gangwei Xu, Haoyu Guo and Chongjie Ye for their insightful discussions and valuable suggestions, and Zhen Xu for his dedicated efforts in curating the synthetic data.
If you find InfiniDepth useful in your research, please consider citing:
```bibtex
@article{yu2026infinidepth,
  title={InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields},
  author={Yu, Hao and Lin, Haotong and Wang, Jiawei and Li, Jiaxin and Wang, Yida and Zhang, Xueyang and Wang, Yue and Zhou, Xiaowei and Hu, Ruizhen and Peng, Sida},
  journal={arXiv preprint},
  year={2026}
}
```