Debunk the Myth of SFT Generalization

SFT can generalize as well as—or better than—RL when trained with the right data.

Installation

Prerequisites

CUDA 12.2 with cuDNN 9.1.0 works, but the official docs recommend CUDA >= 12.4 and cuDNN >= 9.8.0.
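
To see where a machine stands relative to the versions above, a small version check can help. The following is a minimal sketch, not part of this repository: `version_ge` and `check_version` are hypothetical helper names, the comparison assumes a `sort` that supports `-V` (GNU coreutils), and the CUDA toolkit version is read from `nvcc --version` only if `nvcc` is on the PATH.

```shell
# Succeeds (exit 0) if version $1 >= version $2, using version-aware sort.
# Assumption: `sort -V` is available (GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compare a found version against the version known
# to work (tested) and the version the official docs recommend.
check_version() {
  name="$1"; found="$2"; tested="$3"; recommended="$4"
  if version_ge "$found" "$recommended"; then
    echo "$name $found: meets recommended (>= $recommended)"
  elif version_ge "$found" "$tested"; then
    echo "$name $found: below recommended $recommended, but >= tested $tested"
  else
    echo "$name $found: older than tested $tested"
  fi
}

# Read the CUDA toolkit version from nvcc, if it is installed.
if command -v nvcc >/dev/null 2>&1; then
  cuda_version="$(nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p')"
  check_version CUDA "$cuda_version" 12.2 12.4
else
  echo "nvcc not found on PATH; cannot detect the CUDA toolkit version"
fi
```

The cuDNN version is not detected here because its location varies by install method; once you know it, `check_version cuDNN 9.1.0 9.1.0 9.8.0` reports it in the same way.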

Setup

conda create -n debunk_sft python=3.10
conda activate debunk_sft
USE_MEGATRON=0 bash setup.sh
git submodule init
git submodule update
pip install -e thirdparty/verl --no-dependencies
pip install -e thirdparty/ragen --no-dependencies
pip install -e thirdparty/alfworld --no-dependencies
pip install -e thirdparty/trl --no-dependencies
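
If one of the editable installs fails, a common cause is a `thirdparty` directory that was never checked out. The sketch below is a hypothetical pre-flight check, not something shipped with this repository: `missing_packages` is an illustrative name, and it assumes each package directory contains a `setup.py` or `pyproject.toml` once its checkout is complete.

```shell
# Hypothetical helper: print the thirdparty packages under the given repo
# root that have no packaging metadata (setup.py or pyproject.toml).
missing_packages() {
  root="${1:-.}"
  for pkg in verl ragen alfworld trl; do
    dir="$root/thirdparty/$pkg"
    if [ ! -f "$dir/setup.py" ] && [ ! -f "$dir/pyproject.toml" ]; then
      echo "$pkg"
    fi
  done
}

# Run from the repository root.
missing="$(missing_packages .)"
if [ -n "$missing" ]; then
  echo "Not ready for 'pip install -e':" $missing
else
  echo "All thirdparty packages are checked out."
fi
```

For any package reported as not ready, `git submodule status` shows whether the corresponding submodule is uninitialized (its line starts with `-`).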

Getting Started

Dataset

Dataset collection

Task	Method	Diversity	Format	Link
Sokoban	RL	non-diverse	—	🤗
Sokoban	RL	diverse	—	🤗
Sokoban	SFT	non-diverse	answer-only	🤗
Sokoban	SFT	diverse	answer-only	🤗
Sokoban	SFT	non-diverse	cot	🤗
Sokoban	SFT	diverse	cot	🤗
General Points	RL	non-diverse	—	🤗
General Points	RL	diverse	—	🤗
General Points	SFT	non-diverse	answer-only	🤗
General Points	SFT	diverse	answer-only	🤗
General Points	SFT	non-diverse	cot	🤗
General Points	SFT	diverse	cot	🤗

Train your model with SFT

Specify your model and data beforehand. For Sokoban:

bash debunk_sft/scripts/sokoban/sokoban_train_and_eval.sh

For General Points:

bash debunk_sft/scripts/gp_l/gp_l_train_and_eval.sh

Train your model with GRPO

Specify your model and data beforehand. For Sokoban:

bash debunk_sft/scripts/sokoban/sokoban_grpo.sh

For General Points:

bash debunk_sft/scripts/gp_l/gp_l_grpo.sh

Citation

If you use this work in academic research, please cite:

@article{lin2025debunk,
  title={Debunk the Myth of SFT Generalization},
  author={Lin, Xiaofeng and Sang, Hejian and Wang, Zhipeng and Zhang, Xuezhou},
  journal={arXiv preprint arXiv:2510.00237},
  year={2025}
}
