This directory contains the code used in our paper “Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning”.
This repository itself does not impose any strict package requirements. Instead, you should set up your environment according to the dependencies of the specific model you intend to run:
- Qwen2.5-Omni → follow the Qwen2.5-Omni Hugging Face page
- Baichuan-Omni-1.5 → follow the Baichuan-Omni-1.5 Hugging Face page
- MiniCPM-o-2.6 → follow the MiniCPM Hugging Face page
- Phi-4 Multimodal → follow the Phi-4 Hugging Face page
⚠️ Make sure the environment matches the model runner you select (see `src/evaluation/models/`). The code framework itself requires no additional dependencies beyond the model-specific setup.
We publish per-subset configs on the Hub (example): `ycwang11/OmniReason` with configs:
`alternative`, `independent`, `complementary`, `contradictory`, `equivalent`, `entailment`, `recognition`
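Assuming the standard `datasets` library and the config names above, loading one subset might look like this (the `split` name is an assumption, and the actual call needs network access):

```python
# Hypothetical helper for loading one published config from the Hub.
# Subset names are taken from the README; the default split is an assumption.
SUBSETS = [
    "alternative", "independent", "complementary",
    "contradictory", "equivalent", "entailment", "recognition",
]

def load_subset(name: str, split: str = "test"):
    """Load one OmniReason config from the Hub (requires network access)."""
    if name not in SUBSETS:
        raise ValueError(f"unknown subset: {name!r}; known: {SUBSETS}")
    from datasets import load_dataset  # imported lazily; only needed on real loads
    return load_dataset("ycwang11/OmniReason", name, split=split)
```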
Media handling:
- The dataset uses `datasets.Image`/`datasets.Audio` features. When pushed to the Hub, media are embedded into parquet shards; HF stores only a basename in the `path` field and the actual bytes in `bytes`.
- To make this seamless for downstream scripts, we provide `src/utils/hf_loader.py`, which:
  - casts media columns with `decode=False`
  - materializes embedded `bytes` into deterministic local files and returns absolute paths
  - cache location: `~/.cache/omnireason_media_cache` (override with `OMNIREASON_MEDIA_CACHE`)
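The materialization step can be sketched as follows. This is illustrative only: `materialize` and the cache-file naming scheme are assumptions, not the actual `hf_loader.py` API; only the cache directory and the `{path, bytes}` record shape come from the description above.

```python
import hashlib
import os

# Cache directory as documented; overridable via OMNIREASON_MEDIA_CACHE.
CACHE_DIR = os.environ.get(
    "OMNIREASON_MEDIA_CACHE",
    os.path.expanduser("~/.cache/omnireason_media_cache"),
)

def materialize(item: dict) -> str:
    """Write an embedded media record ({'path': basename, 'bytes': raw bytes})
    to a deterministic local file and return its absolute path."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    # Deterministic name: content hash + original basename, so repeated
    # runs resolve to the same file without rewriting it.
    digest = hashlib.sha1(item["bytes"]).hexdigest()[:16]
    out = os.path.join(CACHE_DIR, f"{digest}_{item['path']}")
    if not os.path.exists(out):
        with open(out, "wb") as f:
            f.write(item["bytes"])
    return os.path.abspath(out)
```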
Build + publish (optional): see `dataset/hf_publish/build_hg_dataset.py` for staging local CSVs + media and pushing to the Hub.
Minimal CLI is in `src/evaluation/eval_pipeline.py`. It wires a registered model runner and a task:

```
# Example: Qwen2.5-Omni on the `equivalent` subset
python src/evaluation/eval_pipeline.py \
  --model Qwen2.5-Omni \
  --task equivalent
```

- Model runners: see `src/evaluation/models/*.py` (Qwen2.5-Omni, MiniCPM, Baichuan, Phi-4 Omni, etc.). Each runner contains paths and options you may need to adjust (e.g., local checkpoint directory).
- Tasks are registered via decorators in `src/evaluation/tasks/`.
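The decorator-based registration pattern can be sketched as below; the names (`register_task`, `get_task`, `EquivalentTask`) are illustrative, not the actual `src/evaluation/tasks/` API:

```python
# Minimal sketch of decorator-based task registration (names are hypothetical).
TASK_REGISTRY = {}

def register_task(name):
    """Decorator that makes a task class discoverable via --task <name>."""
    def wrap(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return wrap

@register_task("equivalent")
class EquivalentTask:
    def build_prompt(self, example):
        # Assemble the (multimodal) prompt for one dataset row.
        return example

def get_task(name):
    """Look up and instantiate a registered task by name."""
    try:
        return TASK_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown task: {name!r}; known: {sorted(TASK_REGISTRY)}")
```

Registering a new task is then just a matter of adding a decorated class; the CLI never needs to change.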
If you prefer `-m` invocation, set `PYTHONPATH` as above and run:
```
python -m src.evaluation.eval_pipeline \
  --model Qwen2.5-Omni --task independent
```

Two standalone scripts consume the HF dataset via the shared loader, then run Qwen2.5-Omni for analysis.
- Extract Attention (`src/interpretation/extract_attention.py`)
  - Runs the model and exports layerwise attention vectors associated with facts, rules, and questions.
  - Key args:
    - `--type {subset}` (HF config)
    - `--hf-repo-id`, `--split`
    - `--pooling {mean,max,none}`: head pooling mode for exported vectors
    - `--mod_order` (optional): permutation of `IAT` as above
  - Example:
    ```
    python extract_attention.py \
      --type independent --pooling mean
    ```
- Attention Manipulation (`src/interpretation/attention_manipulation.py`)
  - Adjusts per-head temperatures on selected layers and runs the model.
  - Key args:
    - `--type {subset}`: HF config/subset name (e.g., `equivalent`)
    - `--setting {vanilla,layer}` and `--layer {bottom,middle,top}`
    - `--temp_mode {decrease,increase}` and `--scale` (amount)
    - `--mod_order` (optional): a permutation of `IAT` to select which slot is used for each modality
  - Example:
    ```
    python attention_manipulation.py \
      --type independent --setting layer --layer top --temp_mode decrease --scale 0.2
    ```
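Mechanically, per-head temperature manipulation amounts to dividing the pre-softmax attention scores by a temperature before normalizing. The NumPy sketch below illustrates the effect only; the actual hook, and how `--scale` maps to a temperature, live in `attention_manipulation.py` and may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temper_attention(scores, temp_mode="decrease", scale=0.2):
    """Rescale one head's pre-softmax attention scores by a temperature.

    'decrease' uses temperature < 1 and sharpens the distribution;
    'increase' uses temperature > 1 and flattens it. The 1 +/- scale
    mapping here is an illustrative assumption.
    """
    temp = 1.0 - scale if temp_mode == "decrease" else 1.0 + scale
    return softmax(scores / temp)
```

Lowering the temperature concentrates attention mass on the already-dominant positions, which is why sharpening or flattening selected layers changes how facts, rules, and questions are weighted.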