Hello everyone!
Thank you for your interest in ROLL.
We continue to iterate and improve the ROLL project. Below is a summary of recent updates, categorized for your reference.
Highlights:
New features in PR #172
Agent / GEM / Tool Use
Models & backends
Pipelines & training algorithms
Other features, bug fixes, and refinements
- Improved remove_padding support to reduce padding overhead.
- Added a roll debug flag to improve metric recording.
- Switched the default strategy of the LLM judge reward worker from HF to vLLM to improve efficiency.
- Adjusted entropy loss computation to avoid unnecessary calculations.
- Changed the default loss aggregation mode (loss_agg_mode) to seq-mean-token-mean.
- Added support for passing is_lora when broadcasting parameters.
- Added include_stop_str_in_output, stop_strings, and other stop-handling configuration options.
- Exposed environment metrics with aggregate_metrics control.
- Restructured agentic directories: merged roll/agentic into roll/pipeline/agentic to avoid split logic.
- Fixed webshop env state handling bug.
- Fixed vLLM version comparison logic, isolated cache roots for multiple vLLM actor_workers, and resolved vLLM compile conflicts.
- Fixed dataset load lock errors, math_env exceptions, and various other stability issues.
- Fixed ROLL hang in colocate mode on XPU.
- Fixed potential loss of environment variables when forwarding vLLM env vars to RayWorkerWrapper.
- Fixed convert script and qwen3next checkpoint saving.
- Fixed potential gradient loss caused by mask_mean/mask_sum handling dim=None.
- Deprecated: torch251 / vllm0.7.3 / sglang0.4.3 have been removed from the repository.
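To illustrate the new default loss aggregation mode, here is a minimal pure-Python sketch (not ROLL's actual tensor implementation) contrasting a global token mean with seq-mean-token-mean: the former weighs every non-padded token equally, so long sequences dominate the batch loss, while the latter averages within each sequence first, so every sequence contributes equally.

```python
# Hypothetical sketch of two loss aggregation modes over padded batches.
# losses[i][t] is the per-token loss; mask[i][t] == 1 marks a real token.

def token_mean(losses, mask):
    """Global token mean: longer sequences dominate the result."""
    total = sum(l * m for row_l, row_m in zip(losses, mask)
                for l, m in zip(row_l, row_m))
    count = sum(m for row in mask for m in row)
    return total / count

def seq_mean_token_mean(losses, mask):
    """Per-sequence token mean, then mean over sequences:
    each sequence contributes equally regardless of length."""
    per_seq = []
    for row_l, row_m in zip(losses, mask):
        n = sum(row_m)
        per_seq.append(sum(l * m for l, m in zip(row_l, row_m)) / n)
    return sum(per_seq) / len(per_seq)

# One 4-token sequence with loss 1.0 per token, one 1-token sequence
# with loss 4.0 (trailing zeros are padding).
losses = [[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]
mask   = [[1,   1,   1,   1  ], [1,   0,   0,   0  ]]
# token_mean -> 1.6 (token-weighted); seq_mean_token_mean -> 2.5 (sequence-weighted)
```

The two modes agree only when all sequences have the same length; the gap in the example above shows why the default matters for variable-length rollouts.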
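The mask_mean/mask_sum fix concerns how a masked reduction treats dim=None versus a specific dimension. As a hedged illustration only (ROLL's real helpers operate on tensors; this list-based version just shows the intended semantics), dim=None should reduce over the whole batch while an explicit dim reduces per row:

```python
def mask_mean(values, mask, dim=None):
    """Mean of `values` where mask == 1, over a 2-D list-of-lists batch.
    dim=None reduces over all elements at once; dim=1 reduces each row
    separately. Conflating the two silently changes how tokens are
    weighted, which is the kind of subtle bug the release notes mention.
    """
    if dim is None:
        num = sum(v * m for rv, rm in zip(values, mask)
                  for v, m in zip(rv, rm))
        den = sum(m for row in mask for m in row)
        return num / den
    if dim == 1:
        return [sum(v * m for v, m in zip(rv, rm)) / max(sum(rm), 1)
                for rv, rm in zip(values, mask)]
    raise ValueError("only dim=None or dim=1 supported in this sketch")

vals = [[1.0, 2.0], [3.0, 4.0]]
msk  = [[1,   1  ], [1,   0  ]]
# mask_mean(vals, msk)         -> 2.0   (global mean of 1, 2, 3)
# mask_mean(vals, msk, dim=1)  -> [1.5, 3.0]  (per-row means)
```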