XiaofengLin7/debunking-sft-generalization

Debunk the Myth of SFT Generalization

arXiv Paper

SFT can generalize as well as—or better than—RL when trained with the right data.

Installation

Prerequisites

CUDA 12.2 & cuDNN 9.1.0 work, but the official docs recommend CUDA >= 12.4 & cuDNN >= 9.8.0.

Setup

conda create -n debunk_sft python=3.10
conda activate debunk_sft
USE_MEGATRON=0 bash setup.sh
git submodule init
git submodule update
pip install -e thirdparty/verl --no-deps
pip install -e thirdparty/ragen --no-deps
pip install -e thirdparty/alfworld --no-deps
pip install -e thirdparty/trl --no-deps
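After installation, a quick sanity check can confirm the editable packages are importable. This is a minimal sketch; the import names are assumed to match the submodule directory names, so adjust them if a package exposes a different module name.

```shell
# Sanity check: try to import each editable-installed package.
# Package names below are assumptions based on the submodule directories.
python - <<'EOF'
import importlib
for pkg in ("verl", "ragen", "alfworld", "trl"):
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: OK")
    except Exception as e:
        print(f"{pkg}: MISSING ({e})")
EOF
```

Any package reported as MISSING usually means its `pip install -e` step failed or the submodules were not checked out.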

Getting Started

Dataset

Dataset collection

Task            Method  Diversity    Format       Link
Sokoban         RL      non-diverse  –            🤗
Sokoban         RL      diverse      –            🤗
Sokoban         SFT     non-diverse  answer-only  🤗
Sokoban         SFT     diverse      answer-only  🤗
Sokoban         SFT     non-diverse  cot          🤗
Sokoban         SFT     diverse      cot          🤗
General Points  RL      non-diverse  –            🤗
General Points  RL      diverse      –            🤗
General Points  SFT     non-diverse  answer-only  🤗
General Points  SFT     diverse      answer-only  🤗
General Points  SFT     non-diverse  cot          🤗
General Points  SFT     diverse      cot          🤗

Train your model with SFT

Specify your model and data beforehand. For Sokoban:

bash debunk_sft/scripts/sokoban/sokoban_train_and_eval.sh

For General Points:

bash debunk_sft/scripts/gp_l/gp_l_train_and_eval.sh

Train your model with GRPO

Specify your model and data beforehand. For Sokoban:

bash debunk_sft/scripts/sokoban/sokoban_grpo.sh

For General Points:

bash debunk_sft/scripts/gp_l/gp_l_grpo.sh

Citation

If you use this work in academic research, please cite:

@article{lin2025debunk,
  title={Debunk the Myth of SFT Generalization},
  author={Lin, Xiaofeng and Sang, Hejian and Wang, Zhipeng and Zhang, Xuezhou},
  journal={arXiv preprint arXiv:2510.00237},
  year={2025}
}
