This directory contains the code used in our paper “Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning”.
This repository itself does not impose any strict package requirements. Instead, you should set up your environment according to the dependencies of the specific model you intend to run:
- Qwen2.5-Omni → follow the Qwen2.5-Omni Hugging Face page
- Baichuan-Omni-1.5 → follow the Baichuan-Omni-1.5 Hugging Face page
- MiniCPM-o-2.6 → follow the MiniCPM Hugging Face page
- Phi-4 Multimodal → follow the Phi-4 Hugging Face page
⚠️ Make sure the environment matches the model runner you select (see `src/evaluation/models/`). The code framework itself requires no additional dependencies beyond the model-specific setup.
We publish per-subset configs on the Hub (example): `ycwang11/OmniReason` with configs:
`alternative`, `independent`, `complementary`, `contradictory`, `equivalent`, `entailment`, `recognition`
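Assuming the standard `datasets` library and the config names above, loading one subset might look like this (the `split` name is an assumption, and the actual call needs network access):

```python
# Hypothetical helper for loading one published config from the Hub.
# Subset names are taken from the README; the default split is an assumption.
SUBSETS = [
    "alternative", "independent", "complementary",
    "contradictory", "equivalent", "entailment", "recognition",
]

def load_subset(name: str, split: str = "test"):
    """Load one OmniReason config from the Hub (requires network access)."""
    if name not in SUBSETS:
        raise ValueError(f"unknown subset: {name!r}; known: {SUBSETS}")
    from datasets import load_dataset  # imported lazily; only needed on real loads
    return load_dataset("ycwang11/OmniReason", name, split=split)
```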
Media handling:
- The dataset uses `datasets.Image`/`datasets.Audio` features. When pushed to the Hub, media are embedded into parquet shards; HF stores only a basename in the `path` field and the actual bytes in `bytes`.
- To make this seamless for downstream scripts, we provide `src/utils/hf_loader.py`, which:
  - casts media columns with `decode=False`
  - materializes embedded `bytes` into deterministic local files and returns absolute paths
  - cache location: `~/.cache/omnireason_media_cache` (override with `OMNIREASON_MEDIA_CACHE`)
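The materialization step can be sketched as follows. This is illustrative only: `materialize` and the cache-file naming scheme are assumptions, not the actual `hf_loader.py` API; only the cache directory and the `{path, bytes}` record shape come from the description above.

```python
import hashlib
import os

# Cache directory as documented; overridable via OMNIREASON_MEDIA_CACHE.
CACHE_DIR = os.environ.get(
    "OMNIREASON_MEDIA_CACHE",
    os.path.expanduser("~/.cache/omnireason_media_cache"),
)

def materialize(item: dict) -> str:
    """Write an embedded media record ({'path': basename, 'bytes': raw bytes})
    to a deterministic local file and return its absolute path."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    # Deterministic name: content hash + original basename, so repeated
    # runs resolve to the same file without rewriting it.
    digest = hashlib.sha1(item["bytes"]).hexdigest()[:16]
    out = os.path.join(CACHE_DIR, f"{digest}_{item['path']}")
    if not os.path.exists(out):
        with open(out, "wb") as f:
            f.write(item["bytes"])
    return os.path.abspath(out)
```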
Build + publish (optional): see `dataset/hf_publish/build_hg_dataset.py` for staging local CSVs + media and pushing to the Hub.
Minimal CLI is in `src/evaluation/eval_pipeline.py`. It wires a registered model runner and a task:

```
# Example: Qwen2.5-Omni on the `equivalent` subset
python src/evaluation/eval_pipeline.py \
  --model Qwen2.5-Omni \
  --task equivalent
```

- Model runners: see `src/evaluation/models/*.py` (Qwen2.5-Omni, MiniCPM, Baichuan, Phi-4 Omni, etc.). Each runner contains paths and options you may need to adjust (e.g., local checkpoint directory).
- Tasks are registered via decorators in `src/evaluation/tasks/`.
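The decorator-based registration pattern can be sketched as below; the names (`register_task`, `get_task`, `EquivalentTask`) are illustrative, not the actual `src/evaluation/tasks/` API:

```python
# Minimal sketch of decorator-based task registration (names are hypothetical).
TASK_REGISTRY = {}

def register_task(name):
    """Decorator that makes a task class discoverable via --task <name>."""
    def wrap(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return wrap

@register_task("equivalent")
class EquivalentTask:
    def build_prompt(self, example):
        # Assemble the (multimodal) prompt for one dataset row.
        return example

def get_task(name):
    """Look up and instantiate a registered task by name."""
    try:
        return TASK_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown task: {name!r}; known: {sorted(TASK_REGISTRY)}")
```

Registering a new task is then just a matter of adding a decorated class; the CLI never needs to change.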
If you prefer `-m` invocation, set `PYTHONPATH` as above and run:
```
python -m src.evaluation.eval_pipeline \
  --model Qwen2.5-Omni --task independent
```

Two standalone scripts consume the HF dataset via the shared loader, then run Qwen2.5-Omni for analysis.
- Extract Attention (`src/interpretation/extract_attention.py`)
  - Runs the model and exports layerwise attention vectors associated with facts, rules, and questions.
  - Key args:
    - `--type {subset}` (HF config)
    - `--hf-repo-id`, `--split`
    - `--pooling {mean,max,none}`: head pooling mode for exported vectors
    - `--mod_order` (optional): permutation of `IAT` as above
  - Example:
    ```
    python extract_attention.py \
      --type independent --pooling mean
    ```
- Attention Manipulation (`src/interpretation/attention_manipulation.py`)
  - Adjusts per-head temperatures on selected layers and runs the model.
  - Key args:
    - `--type {subset}`: HF config/subset name (e.g., `equivalent`)
    - `--setting {vanilla,layer}` and `--layer {bottom,middle,top}`
    - `--temp_mode {decrease,increase}` and `--scale` (amount)
    - `--mod_order` (optional): a permutation of `IAT` to select which slot is used for each modality
  - Example:
    ```
    python attention_manipulation.py \
      --type independent --setting layer --layer top --temp_mode decrease --scale 0.2
    ```
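Mechanically, per-head temperature manipulation amounts to dividing the pre-softmax attention scores by a temperature before normalizing. The NumPy sketch below illustrates the effect only; the actual hook, and how `--scale` maps to a temperature, live in `attention_manipulation.py` and may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temper_attention(scores, temp_mode="decrease", scale=0.2):
    """Rescale one head's pre-softmax attention scores by a temperature.

    'decrease' uses temperature < 1 and sharpens the distribution;
    'increase' uses temperature > 1 and flattens it. The 1 +/- scale
    mapping here is an illustrative assumption.
    """
    temp = 1.0 - scale if temp_mode == "decrease" else 1.0 + scale
    return softmax(scores / temp)
```

Lowering the temperature concentrates attention mass on the already-dominant positions, which is why sharpening or flattening selected layers changes how facts, rules, and questions are weighted.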