Perception Models: An Easy to Use Repository for Perception Tasks

Perception Models is a user-friendly repository that supports training, inference, and evaluation of the Perception Language Model (PLM) and the Perception Encoder (PE). It is modular and designed to be easy to extend and experiment with.

[Apr-17-25]: Perception Encoder (PE) and Perception Language Model (PLM) are released. [Blog] 🔥🔥

Perception Encoder (PE)

We release PE, a family of state-of-the-art vision encoders for vision-centric and vision-language tasks. See apps/pe/README.md for details on inference, evaluation, and downstream tasks.

Perception Language Model (PLM)

We release PLM, a family of open and fully reproducible models to facilitate research in vision-language models (VLMs). See apps/plm/README.md for details on training, evaluation, and inference with PLM.

Installation 🔧

git clone https://github.com/facebookresearch/perception_models.git
cd perception_models

conda create --name perception_models python=3.12
conda activate perception_models

# Install PyTorch
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu124

# We use torchcodec for decoding videos into PyTorch tensors
conda install ffmpeg -c conda-forge
pip install torchcodec==0.1 --index-url=https://download.pytorch.org/whl/cu124

pip install -e .

This installs the repo in editable mode, so changes you make to the code take effect without reinstalling the package.
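
To confirm that the pinned packages from the steps above actually landed in the active environment, a small check like the following can help. This is a minimal sketch and not part of the repository: the `check_environment` helper and the `EXPECTED` table are hypothetical names introduced here, and the script only reads installed package metadata from the standard library, so it does not verify that CUDA or ffmpeg work at runtime.

```python
from importlib import metadata

# Versions pinned by the install commands above; xformers is installed unpinned.
# (EXPECTED and check_environment are illustrative names, not repo APIs.)
EXPECTED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "torchcodec": "0.1",
    "xformers": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, ok)} for each package."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # Wheels from the PyTorch index may carry a local tag such as "+cu124";
        # strip it before comparing against the pin.
        base = installed.split("+")[0]
        ok = pinned is None or base == pinned or base.startswith(pinned + ".")
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "missing or unexpected version"
        print(f"{name:12s} {installed or '-':20s} {status}")
```

Run it inside the `perception_models` conda environment; any line not marked `ok` points at a package to reinstall with the commands above.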

🙏 Acknowledgement

We are thankful to Meta Lingua for releasing their code as open source. The code structure and the LLM implementation are forked directly from Meta Lingua. We also thank Open_CLIP for its open-source contributions to CLIP training, and CLIP_benchmark for CLIP model evaluation.

📜 Citation

@article{bolya2025PerceptionEncoder,
  title={Perception Encoder: The best visual embeddings are not at the output of the network},
  author={Daniel Bolya and Po-Yao Huang and Peize Sun and Jang Hyun Cho and Andrea Madotto and Chen Wei and Tengyu Ma and Jiale Zhi and Jathushan Rajasegaran and Hanoona Rasheed and Junke Wang and Marco Monteiro and Hu Xu and Shiyu Dong and Nikhila Ravi and Daniel Li and Piotr Doll{\'a}r and Christoph Feichtenhofer},
  journal={arXiv:2504.13181},
  year={2025}
}

@article{cho2025PerceptionLM,
  title={PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding},
  author={Jang Hyun Cho and Andrea Madotto and Effrosyni Mavroudi and Triantafyllos Afouras and Tushar Nagarajan and Muhammad Maaz and Yale Song and Tengyu Ma and Shuming Hu and Hanoona Rasheed and Peize Sun and Po-Yao Huang and Daniel Bolya and Suyog Jain and Miguel Martin and Huiyu Wang and Nikhila Ravi and Shashank Jain and Temmy Stark and Shane Moon and Babak Damavandi and Vivian Lee and Andrew Westbury and Salman Khan and Philipp Kr\"{a}henb\"{u}hl and Piotr Doll{\'a}r and Lorenzo Torresani and Kristen Grauman and Christoph Feichtenhofer},
  journal={arXiv:2504.13180},
  year={2025}
}
