sgk98


Minimal implementations of several flow-map / average-velocity–type methods.

Python 2 Updated Nov 23, 2025
Python 70 2 Updated Dec 5, 2025

Code for "How far can we go with ImageNet for Text-to-Image generation?" paper

Python 96 1 Updated Nov 13, 2025

Code for Post-hoc Probabilistic Vision-Language Models

Python 13 11 Updated Feb 10, 2026

[ACM Computing Surveys] The collection of awesome papers on alignment of diffusion models.

417 17 Updated Feb 6, 2026

Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

Python 35 1 Updated Mar 12, 2024

[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval

Python 40 Updated Apr 11, 2025

CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.

Python 48 5 Updated Nov 3, 2025

Evaluating text-to-image/video/3D models with VQAScore

Jupyter Notebook 382 35 Updated Sep 22, 2025

[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Python 266 11 Updated Apr 7, 2025

[NeurIPS 2024] ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

Python 166 15 Updated Sep 15, 2025
HTML 14 Updated Jul 5, 2024

DUSt3R: Geometric 3D Vision Made Easy

Python 7,056 745 Updated Sep 24, 2025

Official Code for Stable Cascade

Jupyter Notebook 6,575 520 Updated Jul 25, 2024

[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"

Python 84 7 Updated Jul 4, 2024
Jupyter Notebook 5 Updated Feb 5, 2024

(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life

Python 367 19 Updated Dec 2, 2024

[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering

HTML 1,284 86 Updated Jul 2, 2025

Code for Continuously Changing Corruptions (CCC) benchmark + evaluation

Python 42 4 Updated Aug 21, 2024

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,685 132 Updated Jan 14, 2025

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 950 54 Updated Aug 5, 2025

GenEval: An object-focused framework for evaluating text-to-image alignment

HTML 439 31 Updated Mar 3, 2025

Code for the ICLR'24 paper: "Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models"

13 Updated Jan 17, 2024

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,401 894 Updated Dec 17, 2024

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

Python 91 11 Updated Feb 13, 2024

Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"

Python 61 4 Updated Jun 12, 2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Jupyter Notebook 659 28 Updated May 24, 2024

Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)

Python 88 3 Updated Feb 2, 2025

Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"

Python 27 1 Updated Jul 10, 2023