| | Improving Target Presence and Plurality Recognition for Generalized Referring Image Segmentation | AAAI 2026 | [webpage] |
| PixelRefer | PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity | arXiv 25.10 | [code] |
| CoPatch | CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP | arXiv 25.09 | [code] |
| SaFiRe | SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation | NeurIPS 2025 | |
| UniPixel | UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning | NeurIPS 2025 | [code] [webpage] |
| RaAM | Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding | ICCV 2025 | |
| Latent-VG | Latent Expression Generation for Referring Image Segmentation and Grounding | ICCV 2025 | |
| DeRIS | DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | ICCV 2025 | [code] |
| WeakMCN | WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation | CVPR 2025 | [code] |
| HybridGL | Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation | CVPR 2025 | [code] |
| IteRPrimE | IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | AAAI 2025 | [code] |
| DETRIS | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | AAAI 2025 | [code] |
| VATEX | Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | WACV 2025 | [code] [webpage] |
| Shared-RIS | A Simple Baseline with Single-encoder for Referring Image Segmentation | arXiv 24.08 | [code] |
| ASDA | Adaptive Selection based Referring Image Segmentation | ACM MM 2024 | code |
| NeMo | Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | ECCV 2024 | [webpage] [code] |
| ReMamber | ReMamber: Referring Image Segmentation with Mamba Twister | ECCV 2024 | [code] |
| GTMS | GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method | ECCV 2024 | [code] |
| SAM4MLLM | SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation | ECCV 2024 | [code] |
| Pseudo-RIS | Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation | ECCV 2024 | [code] |
| SafaRi | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | ECCV 2024 | [webpage] |
| CM-MaskSD | CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation | TMM 2024 | |
| Prompt-RIS | Prompt-Driven Referring Image Segmentation with Instance Contrasting | CVPR 2024 | |
| LQMFormer | LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation | CVPR 2024 | |
| PPT | Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation | CVPR 2024 | |
| GSVA | GSVA: Generalized Segmentation via Multimodal Large Language Models | CVPR 2024 | [code] |
| RMSIN | Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation | CVPR 2024 | [code] |
| MRES | Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | CVPR 2024 | [code] [webpage] |
| MagNet | Mask Grounding for Referring Image Segmentation | CVPR 2024 | [webpage] |
| LISA | LISA: Reasoning Segmentation via Large Language Model | CVPR 2024 | [code] |
| RefSegformer | Towards Robust Referring Image Segmentation | TIP 2024 | [code] |
| JMCELN | Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network | EMNLP 2023 | [code] |
| CVMN | Unsupervised Domain Adaptation for Referring Semantic Segmentation | ACM MM 2023 | [code] |
| CARIS | CARIS: Context-Aware Referring Image Segmentation | ACM MM 2023 | [code] |
| TAS | Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | EMNLP 2023 | |
| BKINet | Bilateral Knowledge Interaction Network for Referring Image Segmentation | TMM 2023 | [code] |
| Group-RES | Advancing Referring Expression Segmentation Beyond Single Image | ICCV 2023 | [code] |
| | Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | ICCV 2023 | |
| | Shatter and Gather: Learning Referring Image Segmentation with Text Supervision | ICCV 2023 | |
| TRIS | Referring Image Segmentation Using Text Supervision | ICCV 2023 | [code] |
| RIS-DMMI | Beyond One-to-One: Rethinking the Referring Image Segmentation | ICCV 2023 | [code] |
| ETRIS | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | ICCV 2023 | [code] |
| SEEM | Segment Everything Everywhere All at Once | arXiv 23.04 | [code] |
| SLViT | SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation | IJCAI 2023 | [code] |
| WiCo | WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation | IJCAI 2023 | |
| M3Att | Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation | TIP 2023 | |
| X-Decoder | X-Decoder: Generalized Decoding for Pixel, Image and Language | CVPR 2023 | [code] [project] |
| Partial-RES | Learning to Segment Every Referring Object Point by Point | CVPR 2023 | [code] |
| MCRES | Meta Compositional Referring Expression Segmentation | CVPR 2023 | |
| Global-Local CLIP | Zero-shot Referring Image Segmentation with Global-Local Context Features | CVPR 2023 | [code] |
| PolyFormer | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | CVPR 2023 | [code] [project] |
| GRES | GRES: Generalized Referring Expression Segmentation | CVPR 2023 | [code] [dataset] [project] |
| CGFormer | Contrastive Grouping with Transformer for Referring Image Segmentation | CVPR 2023 | [code] |
| SADLR | Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation | AAAI 2023 | |
| R-RIS | Towards Robust Referring Image Segmentation | arXiv 22.09 | [code] [project] |
| - | Learning From Box Annotations for Referring Image Segmentation | TNNLS 2022 | [code] |
| - | Instance-Specific Feature Propagation for Referring Segmentation | TMM 2022 | |
| LAVT | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | CVPR 2022 | [code] |
| CRIS | CRIS: CLIP-Driven Referring Image Segmentation | CVPR 2022 | [code] |
| ReSTR | ReSTR: Convolution-free Referring Image Segmentation Using Transformers | CVPR 2022 | [project] |
| TV-Net | Two-stage Visual Cues Enhancement Network for Referring Image Segmentation | ACM MM 2021 | [code] |
| VLT | Vision-Language Transformer and Query Generation for Referring Segmentation | ICCV 2021 | [code] |
| MDETR | MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | ICCV 2021 | [code] [project] |
| CEFNet | Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation | CVPR 2021 | [code] |
| BUSNet | Bottom-Up Shift and Reasoning for Referring Image Segmentation | CVPR 2021 | [code] |
| LTS | Locate then Segment: A Strong Pipeline for Referring Image Segmentation | CVPR 2021 | |
| CGAN | Cascade Grouped Attention Network for Referring Expression Segmentation | ACM MM 2020 | |
| LSCM | Linguistic Structure Guided Context Modeling for Referring Image Segmentation | ECCV 2020 | [code] |
| CMPC-Refseg | Referring Image Segmentation via Cross-Modal Progressive Comprehension | CVPR 2020 | [code] |
| BRINet | Bi-directional Relationship Inferring Network for Referring Image Segmentation | CVPR 2020 | [code] |
| PhraseCut | PhraseCut: Language-based Image Segmentation in the Wild | CVPR 2020 | [code] [project] |
| MCN | Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | CVPR 2020 | [code] |
| - | Dual Convolutional LSTM Network for Referring Image Segmentation | TMM 2020 | |
| STEP | See-Through-Text Grouping for Referring Image Segmentation | ICCV 2019 | |
| lang2seg | Referring Expression Object Segmentation with Caption-Aware Consistency | BMVC 2019 | [code] |
| CMSA | Cross-Modal Self-Attention Network for Referring Image Segmentation | CVPR 2019 | [code] |
| KWA | Key-Word-Aware Network for Referring Expression Image Segmentation | ECCV 2018 | [code] |
| DMN | Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries | ECCV 2018 | [code] |
| RRN | Referring Image Segmentation via Recurrent Refinement Networks | CVPR 2018 | [code] |
| MAttNet | MAttNet: Modular Attention Network for Referring Expression Comprehension | CVPR 2018 | [code] [Demo] |
| RMI | Recurrent Multimodal Interaction for Referring Image Segmentation | ICCV 2017 | [code] |
| LSTM-CNN | Segmentation from Natural Language Expressions | ECCV 2016 | [code] [project] |