WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Technique adopted in AutoGLM, a series of Phone Use and Web Browser Use Foundation Agents

📃 Paper | 🤗 WebRL-GLM-4-9B | WebRL-LLaMA-3.1-8B

WebRL is a self-evolving online curriculum learning framework designed for training web agents, targeting the WebArena environment.

🚀 Quick Start

Dependencies

First, create a conda environment and install all pip package requirements.

conda create -n webrl python==3.10
conda activate webrl

cd WebRL
pip install -e .
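
After installation, a quick sanity check can confirm that the active interpreter is Python 3.10 and that the editable install is visible. The sketch below is only illustrative: the package name `webrl` is an assumption (the README does not state what `pip install -e .` exposes), and `python_matches` and `check_env` are hypothetical helpers, not part of this repository.

```python
import importlib.util
import sys


def python_matches(version_info, required=(3, 10)):
    """Return True when the major.minor pair of version_info equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def check_env(package="webrl"):
    """Report whether the interpreter version and the installed package look right."""
    return {
        "python_ok": python_matches(sys.version_info),
        # Assumption: the editable install exposes a top-level package named `webrl`.
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_env())
```

If `package_found` is false, the most likely causes are that the `webrl` environment is not active or that `pip install -e .` was run from a different directory.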

Model checkpoint

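The WebRL-GLM-4-9B checkpoint has been released (see the 🤗 WebRL-GLM-4-9B link at the top of this README), and it is the checkpoint we use.

We will also release the checkpoint of the ORM (outcome-supervised reward model) soon.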

Train SFT model

We use LLaMA-Factory to train the SFT baseline, which is the starting model for WebRL. We release the code and data used for training. You can train the SFT baseline with the following commands:

cd LLaMA-Factory
bash run.sh examples/train_full/llama3_full_policy_web.yaml

Train WebRL

After training the SFT baseline, use it to initialize both the actor and the critic. You can then train WebRL with the following command:

bash run_multinode.sh

This command is used to train the actor and critic in each phase.

Generating New Instructions

You can generate new instructions with the following command:

python scripts/gen_task.py
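
The training and instruction-generation commands above are run once per phase. As a rough illustration, the sketch below prints a dry-run plan without executing anything. The step order (generate, interact, train), the choice to invoke each script without arguments, and the helper names `plan_phases` and `format_plan` are all assumptions made for this example; the interaction step is left as a placeholder because its script has not been released.

```python
import shlex

# Commands copied from this README. The ordering of steps is an assumption.
STEPS = [
    ("generate", ["python", "scripts/gen_task.py"]),
    ("interact", None),  # placeholder: interaction script not yet released
    ("train", ["bash", "run_multinode.sh"]),
]


def plan_phases(num_phases):
    """Return (phase, step_name, command) tuples for a dry run of num_phases phases."""
    if num_phases < 1:
        raise ValueError("num_phases must be at least 1")
    return [
        (phase, name, cmd)
        for phase in range(1, num_phases + 1)
        for name, cmd in STEPS
    ]


def format_plan(plan):
    """Render each planned step as a printable line; nothing is executed."""
    lines = []
    for phase, name, cmd in plan:
        shown = shlex.join(cmd) if cmd is not None else "<script not yet released>"
        lines.append(f"[phase {phase}] {name}: {shown}")
    return lines


if __name__ == "__main__":
    for line in format_plan(plan_phases(2)):
        print(line)
```

Keeping the plan as data rather than calling `subprocess` directly makes it easy to review the sequence before launching a multi-node job.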

Interaction and Evaluation

TODO: The script for interacting with WebArena is based on VAB-WebArena-Lite, with specific modifications; it is set to be published this week.

Citation

@article{qi2024webrl,
      title={WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning}, 
      author={Zehan Qi and Xiao Liu and Iat Long Iong and Hanyu Lai and Xueqiao Sun and Xinyue Yang and Jiadai Sun and Yu Yang and Shuntian Yao and Tianjie Zhang and Wei Xu and Jie Tang and Yuxiao Dong},
      journal={arXiv preprint arXiv:2411.02337},
      year={2024},
}
