Stars
An interface library for RL post training with environments.
Our library for RL environments + evals
Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
hamishivi / EasyLM
Forked from young-geng/EasyLM
Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"
Learn the building blocks of how to build DeepSeek from scratch.
Fast C++ PyTorch extension library for differentiable synthetic aperture radar image formation and autofocus on CPU and GPU
Preempt-RT Kernel Build Guide for NVIDIA Development Board
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
NeXT hardware emulator for a NeXT Cube and NeXT Station. Mirrored from SourceForge
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Align Anything: Training All-modality Model with Feedback
Train transformer language models with reinforcement learning.
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
🌎💪 BrowserGym, a Gym environment for web task automation
A project that provides help for using DeepMind's mctx on gym-style environments.
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Resources to build the MFOS - Noise Toaster Synth by Ray Wilson
(ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training
A library for advanced large language model reasoning
An extensible benchmark for evaluating large language models on planning
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

