Stars
Minimal implementations of several flow-map / average-velocity–type methods.
Code for the paper "How far can we go with ImageNet for Text-to-Image generation?"
Code for Post-hoc Probabilistic Vision-Language Models
[ACM Computing Surveys] The collection of awesome papers on alignment of diffusion models.
Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.
Evaluating text-to-image/video/3D models with VQAScore
[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Official Code for Stable Cascade
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
[ECCV 2024] Code for V-IRL: Grounding Virtual Intelligence in Real Life
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
Code for Continuously Changing Corruptions (CCC) benchmark + evaluation
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
GenEval: An object-focused framework for evaluating text-to-image alignment
Code for the ICLR'24 paper: "Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models"
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)
Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"


