| Title | Year | Venue | Models |
| --- | --- | --- | --- |
| On Fairness of Task Arithmetic: The Role of Task Vectors | 2025 | arXiv | LLaMA2-7B |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | 2025 | arXiv | FALCON 3 7B, QWEN2.5 7B Instruct, LLAMA 3.1 8B Instruct, AYA Expanse 8B |
| Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning | 2025 | arXiv | MetaMath-Mistral-7B, Dolphin-2.1-Mistral-7B, Speechless-Code-Mistral-7B-v1.0 |
| Training-free LLM Merging for Multi-task Learning | 2025 | ACL | Echelon-AI/Med-Qwen2-7B, shtdbb/qwen2-7b-med, Qwen2-Instruct |
| Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | 2025 | arXiv | Qwen2.5-7B, Qwen2.5-32B |
| Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing | 2025 | arXiv |  |
| Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging -- An Open Recipe | 2025 | arXiv | Typhoon2 R1 70B, DeepSeek R1 70B |
| Efficient Model Development through Fine-tuning Transfer | 2025 | arXiv | Llama 3.1 8B |
| Command A: An Enterprise-Ready Large Language Model | 2025 | arXiv | Command R7B |
| Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | 2025 | arXiv | Qwen2.5-32B, DeepSeek-R1-32B |
| Extrapolation Merging: Keep Improving With Extrapolation and Merging | 2025 | arXiv | Qwen2-7B, Meta-Llama-3-8B, Mistral-Nemo-Base-2407-12B, Qwen1.5-14B |
| FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion | 2025 | arXiv | Gemma-2-27B-it, Mistral-Large-Instruct-2407, Qwen2.5-72B-Instruct, Llama-3.1-70B-Instruct |
| Superficial Self-Improved Reasoners Benefit from Model Merging | 2025 | arXiv | Llama2-7B |
| Nature-Inspired Population-Based Evolution of Large Language Models | 2025 | arXiv |  |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | 2025 | arXiv | Gemma-2-9B, Llama-3-8B |
| Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation | 2025 | arXiv | WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging | 2025 | arXiv | NuminaMath-7B, DeepSeek-Math-7B-Base, LLaMA-series models, WizardMath-13B |
| Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition | 2025 | arXiv | ContactDoctor-8B |
| Transferring Textual Preferences to Vision-Language Understanding through Model Merging | 2025 | arXiv | Llama-3.2-11B-Vision-Instruct, Llama-3.1-Tulu-2-8B-uf-mean-rm, Llama-3.1-Tulu-3-8B-RM |
| Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging | 2025 | arXiv | Llama-2-13b, WizardMath-13B-V1.0, WizardLM-13B-V1.2, llama-2-13b-code-alpaca |
| An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging | 2025 | arXiv | Typhoon2 70B Instruct, DeepSeek R1 70B Distill, Llama 3.1 70B, Llama 3.3 70B |
| Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging | 2025 | arXiv | WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| Skill Expansion and Composition in Parameter Space | 2025 | arXiv |  |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | 2025 | arXiv | Qwen2.5-Coder-14B-Instruct, Qwen2.5-14B-Instruct, Mistral-Small-24B-Instruct-2501 |
| Channel Merging: Preserving Specialization for Merged Experts | 2025 | AAAI | Dolphin-2.2.1-Mistral-7B, Speechless-Code-Mistral-7B, MetaMath-Mistral-7B, Chinese-Mistral-7B-Instruct-v0.1 |
| Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion | 2024 | arXiv | MiniGemini-8B, SLIME-8B |
| AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents | 2024 | arXiv | Llama3.1-8B |
| JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging | 2024 | arXiv | Bunny-v1_1-Llama-3-8B-V, MMed-Llama-3-8B-EnIns, OpenBioLLM-Llama3-8B, Llama-3-Swallow-8B-Instruct-v0.1 |
| If You Can’t Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs | 2024 | arXiv | Command R+ 104B |
| Agent Skill Acquisition for Large Language Models via CycleQD | 2024 | arXiv | Llama3-8B-Instruct |
| Collaboratively Adding New Knowledge to an LLM | 2024 | arXiv | Meta-Llama-3-8B |
| Unconstrained Model Merging for Enhanced LLM Reasoning | 2024 | arXiv | CodeLlama-7B-Ins, CodeLlama-70B-Ins, Deepseek-Coder-Ins-v1.5, Qwen2.5-Math-7B-Ins, WizardMath-7B-V1.1, OpenMath-Mistral 7B, MetaMath-7B, MetaMath-70B |
| LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | 2024 | arXiv | Llama-7b, Llama2-7b-chat |
| Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging | 2024 | arXiv | Llama 2 7B |
| Exploring Model Kinship for Merging Large Language Models | 2024 | arXiv | Mistral-7B, Mistral-7b-instruct-v0.2, MetaMath-mistral-7b, Open-chat-3.5-1210 |
| Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation | 2024 | arXiv | shisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B |
| Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | 2024 | arXiv | LLAMA 3.1 8B |
| What Matters for Model Merging at Scale? | 2024 | arXiv | PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B) |
| HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | 2024 | arXiv | Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B |
| SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | 2024 | arXiv | CodeLlama 7B |
| It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | 2024 | arXiv | Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B |
| Knowledge Fusion By Evolving Weights of Language Models | 2024 | ACL |  |
| LLM Merging: Building LLMs Efficiently through Merging | 2024 | NeurIPS 2024 Competition Track | LLaMA-7B, Mistral-7B, Gemma-7B |
| Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | 2024 | arXiv | Qwen1.5-7B, Qwen1.5-Chat-7B, Sailor-7B, Qwen1.5-14B, Qwen1.5-Chat-14B, Sailor-14B, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | 2024 | arXiv | LLaMA-2-7B, Mistral-7B, LLaMA-2-13B |
| PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models | 2024 | arXiv | Mistral-Instruct-7B, Mixtral-Instruct-8x7B |
| Knowledge Fusion of Large Language Models | 2024 | ICLR | Llama-2 7B, OpenLLaMA 7B, MPT 7B |
| Language Models Are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | 2024 | ICML | WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca, Mistral-7B |
| Controlled Text Generation via Language Model Arithmetic | 2024 | ICML | MPT-7B, Pythia-12B, Llama-2-Chat-13B |
| MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | 2024 | arXiv | LLaMA2-13B, LLaMA3-8B (LoRA) |
| Evolutionary Optimization of Model Merging Recipes | 2024 | arXiv | shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | 2024 | arXiv | Llama-2 7B |
| Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | arXiv | NH2-Mixtral-8x7B, NH2-Solar-10.7B, OpenChat-3.5-7B |