Hello everyone!
Thank you for your interest in ROLL.
We continue to iterate and improve the ROLL project. Below is a summary of recent updates, categorized for your reference.
Highlights:
New features in PR #172
Agent / GEM / Tool Use
Models & backends
Pipelines & training algorithms
Other features, bug fixes, and refinements
- Improved remove_padding support to reduce padding overhead.
- Added a roll debug flag to improve metric recording.
- Switched the default strategy of the LLM judge reward worker from HF to vLLM to improve efficiency.
- Adjusted entropy loss computation to avoid unnecessary calculations.
- Changed the default loss aggregation mode (loss_agg_mode) to seq-mean-token-mean.
- Added support for passing is_lora when broadcasting parameters.
- Added include_stop_str_in_output, stop_strings, and other stop-handling configuration options.
- Exposed environment metrics with aggregate_metrics control.
- Restructured agentic directories: merged roll/agentic into roll/pipeline/agentic to avoid split logic.
- Fixed webshop env state handling bug.
- Fixed vLLM version comparison logic, isolated cache roots for multiple vLLM actor_workers, and resolved vLLM compile conflicts.
- Fixed dataset load lock errors, math_env exceptions, and various other stability issues.
- Fixed ROLL hang in colocate mode on XPU.
- Fixed potential loss of environment variables when forwarding vLLM env vars to RayWorkerWrapper.
- Fixed convert script and qwen3next checkpoint saving.
- Fixed potential gradient loss caused by mask_mean/mask_sum handling dim=None.
- Deprecated: torch251 / vllm0.7.3 / sglang0.4.3 have been removed from the repository.
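To illustrate the new default loss aggregation mode, here is a minimal pure-Python sketch (not ROLL's actual tensor implementation) contrasting a global token mean with seq-mean-token-mean: the former weighs every non-padded token equally, so long sequences dominate the batch loss, while the latter averages within each sequence first, so every sequence contributes equally.

```python
# Hypothetical sketch of two loss aggregation modes over padded batches.
# losses[i][t] is the per-token loss; mask[i][t] == 1 marks a real token.

def token_mean(losses, mask):
    """Global token mean: longer sequences dominate the result."""
    total = sum(l * m for row_l, row_m in zip(losses, mask)
                for l, m in zip(row_l, row_m))
    count = sum(m for row in mask for m in row)
    return total / count

def seq_mean_token_mean(losses, mask):
    """Per-sequence token mean, then mean over sequences:
    each sequence contributes equally regardless of length."""
    per_seq = []
    for row_l, row_m in zip(losses, mask):
        n = sum(row_m)
        per_seq.append(sum(l * m for l, m in zip(row_l, row_m)) / n)
    return sum(per_seq) / len(per_seq)

# One 4-token sequence with loss 1.0 per token, one 1-token sequence
# with loss 4.0 (trailing zeros are padding).
losses = [[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]
mask   = [[1,   1,   1,   1  ], [1,   0,   0,   0  ]]
# token_mean -> 1.6 (token-weighted); seq_mean_token_mean -> 2.5 (sequence-weighted)
```

The two modes agree only when all sequences have the same length; the gap in the example above shows why the default matters for variable-length rollouts.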
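The mask_mean/mask_sum fix concerns how a masked reduction treats dim=None versus a specific dimension. As a hedged illustration only (ROLL's real helpers operate on tensors; this list-based version just shows the intended semantics), dim=None should reduce over the whole batch while an explicit dim reduces per row:

```python
def mask_mean(values, mask, dim=None):
    """Mean of `values` where mask == 1, over a 2-D list-of-lists batch.
    dim=None reduces over all elements at once; dim=1 reduces each row
    separately. Conflating the two silently changes how tokens are
    weighted, which is the kind of subtle bug the release notes mention.
    """
    if dim is None:
        num = sum(v * m for rv, rm in zip(values, mask)
                  for v, m in zip(rv, rm))
        den = sum(m for row in mask for m in row)
        return num / den
    if dim == 1:
        return [sum(v * m for v, m in zip(rv, rm)) / max(sum(rm), 1)
                for rv, rm in zip(values, mask)]
    raise ValueError("only dim=None or dim=1 supported in this sketch")

vals = [[1.0, 2.0], [3.0, 4.0]]
msk  = [[1,   1  ], [1,   0  ]]
# mask_mean(vals, msk)         -> 2.0   (global mean of 1, 2, 3)
# mask_mean(vals, msk, dim=1)  -> [1.5, 3.0]  (per-row means)
```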