Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
University of California, Merced - Adobe Research
FaceCam generates portrait videos with precise camera control from a single input video and a target camera trajectory.
```bash
conda create -n facecam python=3.11 -y
conda activate facecam

# Install the package (includes core dependencies)
pip install -e .

# Additional required packages
pip install xformers  # choose a version compatible with your PyTorch build
pip install git+https://github.com/graphdeco-inria/diff-gaussian-rasterization --no-build-isolation
pip install mediapipe==0.10.21
```

We support the Wan 2.2 14B model. Create the directories and download all required assets:
```bash
mkdir -p models ckpts
```

1. Base model weights (via ModelScope):

```bash
pip install modelscope
modelscope download --model Wan-AI/Wan2.2-I2V-A14B --local_dir ./models/Wan-AI/Wan2.2-I2V-A14B
```

2. FaceCam assets (checkpoints, proxy 3D head) from Hugging Face:

```bash
pip install huggingface_hub
huggingface-cli download wlyu/FaceCam --local-dir ./ckpts
```

Alternatively, download from Google Drive: checkpoints and proxy 3D head.

3. Face landmarker (MediaPipe model):

```bash
wget -O ckpts/face_landmarker_v2_with_blendshapes.task -q \
    https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task
```

The expected layout:
```
models/
└── Wan-AI/
    └── Wan2.2-I2V-A14B/
        ├── high_noise_model/
        ├── low_noise_model/
        ├── models_t5_umt5-xxl-enc-bf16.pth
        └── Wan2.1_VAE.pth
ckpts/
├── face_landmarker_v2_with_blendshapes.task
├── gaussians.ply
└── wan2.2_14b/
    ├── high/released_version/
    └── low/released_version/
```
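To confirm that the downloads landed in the right places, a small sanity-check script (not part of the repo; file and directory names are taken from the layout above) could look like this:

```python
from pathlib import Path

# Required files and directories, mirroring the expected layout above.
REQUIRED_FILES = [
    "models/Wan-AI/Wan2.2-I2V-A14B/models_t5_umt5-xxl-enc-bf16.pth",
    "models/Wan-AI/Wan2.2-I2V-A14B/Wan2.1_VAE.pth",
    "ckpts/face_landmarker_v2_with_blendshapes.task",
    "ckpts/gaussians.ply",
]
REQUIRED_DIRS = [
    "models/Wan-AI/Wan2.2-I2V-A14B/high_noise_model",
    "models/Wan-AI/Wan2.2-I2V-A14B/low_noise_model",
    "ckpts/wan2.2_14b/high/released_version",
    "ckpts/wan2.2_14b/low/released_version",
]

def missing_assets(root="."):
    """Return required files/directories not found under `root`."""
    root = Path(root)
    missing = [p for p in REQUIRED_FILES if not (root / p).is_file()]
    missing += [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    return missing

if __name__ == "__main__":
    for path in missing_assets():
        print("missing:", path)
```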
```bash
# Wan 2.2 14B (default 704×480, 81 frames)
python inference.py \
    --model_dir ./models \
    --ckpt_dir ./ckpts \
    --input_path ./inputs \
    --output_dir ./outputs
```

`--input_path` accepts either a single .mp4/.mov file or a directory of videos.

For each input video `<name>.mp4`, the script saves:
- `<name>.mp4` — the generated video
- `<name>_input.mp4` — the cropped input video
- `<name>_camera.mp4` — the camera condition visualization
- By default, the code generates a random camera trajectory. To use a specific trajectory instead, customize the `random_camera_params` function in `inference.py`.
- We crop the input video with the `crop_video` function in `diffsynth/utils/mediapipe_utils.py`, which may not give the best result. You can customize this function and inspect the cropped input video and the camera condition video, which are saved before diffusion generation.
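The note above mentions replacing `random_camera_params` with a specific trajectory. As a hypothetical illustration only (the function name comes from `inference.py`, but the signature and return format sketched here are assumptions, not the repo's actual API), a deterministic left-to-right orbit could look like:

```python
import math

def orbit_camera_params(num_frames=81, radius=1.0, max_yaw_deg=20.0):
    """Hypothetical stand-in for `random_camera_params` in inference.py:
    a smooth left-to-right orbit instead of a random trajectory.
    Returns one 4x4 world-to-camera extrinsic per frame as nested lists
    (the real return format in FaceCam may differ)."""
    extrinsics = []
    for i in range(num_frames):
        # Sweep yaw from -max_yaw_deg to +max_yaw_deg over the clip.
        t = i / max(num_frames - 1, 1)
        yaw = math.radians((2.0 * t - 1.0) * max_yaw_deg)
        c, s = math.cos(yaw), math.sin(yaw)
        # Rotate about the vertical (y) axis; keep the camera at `radius`
        # along z, looking toward the origin.
        extrinsics.append([
            [  c, 0.0,   s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [ -s, 0.0,   c, -radius],
            [0.0, 0.0, 0.0, 1.0],
        ])
    return extrinsics
```

Wiring such a function into `inference.py` requires matching whatever parameter format the real `random_camera_params` returns, so treat this purely as a starting point.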
Use accelerate to distribute samples across GPUs:

```bash
accelerate launch --num_processes 4 inference.py \
    --model_dir ./models \
    --ckpt_dir ./ckpts \
    --input_path ./inputs \
    --output_dir ./outputs
```

For GPUs with limited memory (e.g., 48GB of VRAM), enable CPU offloading so that only the active model component stays on the GPU:
```bash
python inference.py \
    --model_dir ./models \
    --ckpt_dir ./ckpts \
    --input_path ./inputs \
    --output_dir ./outputs \
    --low_vram
```

This trades speed for memory — the text encoder, DiTs, and VAE are moved between CPU and GPU as needed instead of keeping everything resident.
If you find our work useful for your research, please consider citing our paper:
```bibtex
@misc{facecam,
    title = {FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning},
    author = {Weijie Lyu and Ming-Hsuan Yang and Zhixin Shu},
    year = {2026},
    eprint = {2603.05506},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV},
    url = {https://arxiv.org/abs/2603.05506},
}
```

This work is built upon Wan and DiffSynth. We thank the authors for their excellent work.
This is a self-reimplementation of FaceCam. The code has been reimplemented and the weights retrained. Results may differ slightly from those reported in the paper.
