thinkwee/AgentsMeetRL

Base Framework General Search & RAG Web & GUI
Tool Code & SWE Reasoning Multi-Agent
Memory Embodied Domain-Specific Reward & Training
Safety VLM Agent Self-Evolution Environment

Interactive Dashboard

When LLM Agents Meet Reinforcement Learning

AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning:

  • 🤖 A project counts as an agent project if it has at least one of the following: multi-turn interactions or tool use (so Tool-Integrated Reasoning, or TIR, projects are included in this repo).
  • ⚠️ The entries are based on code analysis of open-source repositories performed by LLM coding agents, so some details may be inaccurate. Although the results are manually reviewed, omissions may remain. If you find any errors, please let us know through issues or PRs; we warmly welcome them!
  • 🚀 We particularly focus on the reinforcement learning frameworks, RL algorithms, rewards, and environments that projects depend on, for everyone's reference on how these excellent open-source projects make their technical choices. See [Click to view technical details] under each table.
  • 📅 Last updated: 2026-03-24
  • 🤗 Feel free to submit your own projects anytime - we welcome contributions!
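
The inclusion criterion above (multi-turn interaction or tool use) can be sketched as a simple predicate over the "Single/Multi Turn" and "Tool usage" columns used in the technical-details tables. This is only an illustrative sketch: the `RepoEntry` class, the `is_agent_project` function, and the example entries are hypothetical and are not part of this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoEntry:
    """Hypothetical record mirroring two columns of the technical-details tables."""
    name: str
    turns: str        # "Single", "Multi", or "Both"
    uses_tools: bool  # True when the "Tool usage" column is not "No"


def is_agent_project(entry: RepoEntry) -> bool:
    """Return True if the entry has multi-turn interaction or tool use."""
    multi_turn = entry.turns in {"Multi", "Both"}
    return multi_turn or entry.uses_tools


# Made-up entries, for illustration only.
examples = [
    RepoEntry("hypothetical-search-agent", "Multi", True),
    RepoEntry("hypothetical-tir-math", "Single", True),     # TIR: single turn, but uses tools
    RepoEntry("hypothetical-single-shot", "Single", False),
]

included = [e.name for e in examples if is_agent_project(e)]
print(included)
```

Under this reading, a single-turn project still qualifies as long as it calls tools, which is why TIR projects are listed.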

Taxonomy:

  • Base Framework: General-purpose RL training frameworks for LLM agents (e.g., veRL, OpenRLHF, trl)
  • General/MultiTask: Agent systems trained/evaluated across multiple tasks or environments
  • Search & RAG: Search-augmented reasoning agents that use retrieval tools to enhance LLM reasoning
  • Web & GUI: Agents that interact with web browsers, mobile/desktop GUIs, or operating systems
  • Tool-Use: Agents trained to invoke external tools (APIs, code executors, MCP, etc.)
  • Code & SWE: Software engineering and code generation agents
  • Reasoning: Reasoning agents with tool-integrated or multi-turn reasoning (math, QA, visual)
  • Multi-Agent RL: Multi-agent collaboration, negotiation, or credit assignment via RL
  • Memory: Agents that learn to manage, retrieve, or evolve memory
  • Embodied: Agents operating in embodied/physical simulation environments
  • Domain-Specific: RL agents for specialized domains (medical, OS tuning, etc.)
  • Reward & Training: Process/outcome reward models and training methodologies for agents
  • Safety: RL for agent safety alignment, adversarial red-teaming, and jailbreak defense/attack
  • VLM Agent: Vision-language model agents trained with RL for multimodal interaction
  • Self-Evolution: Agents that self-evolve via RL feedback loops (⚠️ definition still evolving in the community)
  • Environment: Benchmarks, gyms, and sandbox environments for agent training/evaluation

Enumerations:

  • Reward Type:
    • External Verifier: e.g., a compiler or math solver
    • Rule-Based: e.g., a LaTeX parser with exact match scoring
    • Model-Based: e.g., a trained verifier LLM or reward LLM
    • Custom
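
To make the reward types above concrete, here is a minimal Python sketch. The `RewardType` enum simply mirrors the four labels in the list. The function `exact_match_reward` illustrates a rule-based reward with exact-match scoring, and `syntax_check_reward` uses Python's built-in `compile` as a lightweight stand-in for an external verifier such as a compiler. All names here are hypothetical and do not come from any listed project; real projects typically use far more elaborate parsers and verifiers.

```python
import re
from enum import Enum


class RewardType(Enum):
    """Labels used in the 'Reward Type' column (names are illustrative)."""
    EXTERNAL_VERIFIER = "External Verifier"
    RULE_BASED = "Rule-Based"
    MODEL_BASED = "Model-Based"
    CUSTOM = "Custom"


def normalize(answer: str) -> str:
    """Remove all whitespace and lowercase, so trivial formatting differences are ignored."""
    return re.sub(r"\s+", "", answer).lower()


def exact_match_reward(prediction: str, reference: str) -> float:
    """Rule-based reward: 1.0 on an exact match after normalization, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def syntax_check_reward(code: str) -> float:
    """Verifier-style reward: 1.0 if the candidate parses as Python, else 0.0.

    Assumption: the built-in compile() stands in for a real external compiler.
    It only checks syntax and does not execute the code.
    """
    try:
        compile(code, "<candidate>", "exec")
    except SyntaxError:
        return 0.0
    return 1.0


print(exact_match_reward(r" \frac{1}{2} ", r"\frac{1}{2}"))
print(syntax_check_reward("x = = 1"))
```

A model-based reward would replace these functions with a call to a trained verifier or reward LLM, which is omitted here because it depends on the specific model and serving setup.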

Updates

  • 📢 2026-03 Update: Restructured the taxonomy from 12 to 16 categories. Added ~70 new repositories covering Sep 2025 – Mar 2026. New categories include Multi-Agent RL, Reward & Training, Safety, VLM Agent, Self-Evolution, and Domain-Specific. Merged the old GUI and Web categories into Web & GUI, and retired TextGame and Biomedical as standalone categories. Total repos grew from ~134 to 205.

🔧 Base Framework

Github Repo 🌟 Stars Date Org Paper Link
Open-AgentRL Stars 2026.2 Gen-Verse Paper
OpenClaw-RL Stars 2026.3 Gen-Verse Paper
Claw-R1 Stars 2026.3 USTC --
prime-rl Stars 2025.2 Prime Intellect --
NeMo-RL Stars 2026.1 NVIDIA --
RLinf Stars 2025.8 Tsinghua/Infinigence AI/PKU Paper
siiRL Stars 2025.7 Shanghai Innovation Institute Paper
slime 2025.6 Tsinghua University (THUDM) blog
agent-lightning Stars 2025.6 Microsoft Research Paper
AReaL Stars 2025.6 AntGroup/Tsinghua Paper
ROLL Stars 2025.6 Alibaba Paper
MARTI Stars 2025.5 Tsinghua --
RL2 Stars 2025.4 Accio
verifiers Stars 2025.3 Individual --
oat Stars 2024.11 NUS/Sea AI Paper
veRL Stars 2024.10 ByteDance Paper
OpenRLHF Stars 2023.7 OpenRLHF Paper
trl Stars 2019.11 HuggingFace --
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
Open-AgentRL GRPO-TCR Single Both Multi Reasoning/GUI/Coding Model (PRM) Yes (SandboxFusion)
OpenClaw-RL GRPO/OPD Both Both Multi Terminal/GUI/SWE/Tool-call Model/External Yes
Claw-R1 Generic RL Framework Multi Both Multi General Agent All Yes (Framework-agnostic)
prime-rl GRPO/PPO Multi Outcome Multi Math/Code/Search Model/External Yes
NeMo-RL GRPO/DAPO/GDPO/DPO Single Outcome Multi Math/Reasoning/Code Rule/External No
RLinf PPO/GRPO/DAPO/SAC/REINFORCE++/CrossQ/RLPD Both Both Multi Robotics/Math/Code/QA/VQA All (Rule/Model/External) Yes
siiRL PPO/GRPO/CPGD/MARFT Multi Both Multi LLM/VLM/LLM-MAS PostTraining Model/Rule Planned
slime GRPO/GSPO/REINFORCE++ Single Both Both Math/Code External Verifier Yes
agent-lightning PPO/Custom/Automatic Prompt Optimization Multi Outcome Multi Calculator/SQL Model/External/Rule Yes
AReaL PPO Both Outcome Both Math/Code External Yes
ROLL PPO/GRPO/Reinforce++/TOPR/RAFT++ Multi Both Multi Math/QA/Code/Alignment All Yes
MARTI PPO/GRPO/REINFORCE++/TTRL Multi Both Multi Math All Yes
RL2 Dr. GRPO/PPO/DPO Single Both Both QA/Dialogue Rule/Model/External Yes
verifiers GRPO Multi Outcome Both Reasoning/Math/Code All Code
oat PPO/GRPO Single Outcome Multi Math/Alignment External No
veRL PPO/GRPO Single Outcome Both Math/QA/Reasoning/Search All Yes
OpenRLHF PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO Multi Both Both Dialogue/Chat/Completion Rule/Model/External Yes
trl PPO/GRPO/DPO Single Both Single QA Custom No

💪 General/MultiTask

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MetaClaw Stars 2026.3 UNC-Chapel Hill (AIMING Lab) Paper Custom
SkillRL Stars 2026.2 UNC-Chapel Hill (AIMING Lab) Paper Custom
LLM-in-Sandbox Stars 2026.1 RUC/MSRA/THU Paper rllm (w/ veRL)
youtu-agent Stars 2025.12 Tencent Youtu Lab Paper Custom
DEPO Stars 2025.11 HKUST/SJTU Paper LLaMA-Factory
SPEAR Stars 2025.10 Tencent Youtu Lab Paper veRL/verl-agent
DeepAgent Stars 2025.10 RUC/Xiaohongshu Paper Custom
AgentRL Stars 2025.9 Tsinghua Paper veRL
AgentGym-RL Stars 2025.9 Fudan University Paper veRL
Agent_Foundation_Models Stars 2025.8 OPPO Personal AI Lab Paper veRL
Trinity-RFT Stars 2025.5 Alibaba Paper veRL
SPA-RL-Agent Stars 2025.5 PolyU Paper TRL
verl-agent Stars 2025.5 NTU/Skywork Paper veRL
VAGEN Stars 2025.3 RAGEN-AI Paper veRL
ART Stars 2025.3 OpenPipe Paper TRL
OpenManus-RL Stars 2025.3 UIUC/MetaGPT -- Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MetaClaw GRPO (LoRA) Single Process Multi General Agentic Model (PRM) Yes (Skill-augmented)
SkillRL GRPO Single Outcome Multi ALFWorld/WebShop/Search Rule Yes (Web search, actions)
LLM-in-Sandbox GRPO++ Single Outcome Multi Math/Physics/Chemistry/Biomedicine/Long-context/IF/SWE Rule Yes (Code Sandbox w/ Terminal, File, Internet)
youtu-agent Training-Free GRPO Single Outcome Multi Deep Research/Data Analysis/Tool-use Model/External Yes (Web search, code, file)
DEPO KTO + Efficiency Loss Single Both Multi Agent (BabyAI/WebShop) Rule Yes
SPEAR GRPO/GiGPO + SIL Single Both Multi Math/Agent Rule/External Yes (Search, Sandbox, Browser)
DeepAgent ToolPO Single Outcome Multi ToolBench/ALFWorld/WebShop/GAIA/HLE Model Yes (16,000+ RapidAPIs)
AgentRL GRPO/REINFORCE++/RLOO/ReMax/GAE Single Outcome Multi Agent Tasks External Yes
AgentGym-RL PPO/GRPO/RLOO/REINFORCE++ Single Outcome Multi Web/Search/Game/Embodied/Science Rule/Model/External Yes (Web, Search, Env APIs)
Agent_Foundation_Models DAPO/PPO Single Outcome Single QA/Code/Math Rule/External Yes
Trinity-RFT PPO/GRPO Single Outcome Both Math/TextGame/Web All Yes
SPA-RL-Agent PPO Single Process Multi Navigation/Web/TextGame Model No
verl-agent PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ Multi Both Multi Phone Use/Math/Code/Web/TextGame All Yes
VAGEN PPO/GRPO Single Both Multi TextGame/Navigation All Yes
ART GRPO Multi Both Multi TextGame All Yes
OpenManus-RL PPO/DPO/GRPO Multi Outcome Multi TextGame All Yes

🔍 Search & RAG Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
ProRAG Stars 2026.1 RUC Paper Custom
MemSearcher Stars 2025.11 CAS Paper Custom
ReSeek Stars 2025.10 Tencent PCG BAC/Tsinghua University Paper veRL
AutoGraph-R1 Stars 2025.10 HKUST KnowComp Paper Custom
Tree-GRPO Stars 2025.9 AMAP Paper veRL
ASearcher Stars 2025.8 Ant Research RL Lab / Tsinghua University & UW Paper RealHF/AReaL
Graph-R1 Stars 2025.7 BUPT/NTU/NUS Paper veRL
Kimi-Researcher Stars 2025.6 Moonshot AI blog Custom
R-Search Stars 2025.6 Individual -- veRL
R1-Searcher-plus Stars 2025.5 RUC Paper Custom
StepSearch Stars 2025.5 SenseTime Paper veRL
AutoRefine Stars 2025.5 USTC Paper veRL
ZeroSearch Stars 2025.5 Alibaba Paper veRL
ReasonRAG Stars 2025.5 CityU HK / Huawei Paper Custom
Agentic-RAG-R1 Stars 2025.12 PKU -- Custom
WebThinker Stars 2025.4 RUC Paper Custom
DeepResearcher Stars 2025.4 SJTU Paper veRL
Search-R1 Stars 2025.3 UIUC/Google paper1, paper2 veRL
R1-Searcher Stars 2025.3 RUC Paper OpenRLHF
C-3PO Stars 2025.2 Alibaba Paper OpenRLHF
DeepRetrieval Stars 2025.2 UIUC Paper veRL
SSRL Stars 2025.8 Tsinghua Paper Custom
Research-Venus Stars 2025.8 Ant Group Paper Custom
DeepResearch Stars 2025.9 Alibaba/Tongyi Lab Paper Custom
DeepDive Stars 2025.9 Tsinghua/THUDM Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
ProRAG GRPO + DGA (dual-granularity advantage) Single Both Multi Multi-hop RAG Model (PRM via MCTS) Yes (Retrieval)
MemSearcher Multi-context GRPO Single Outcome Multi Search/QA + Memory Rule/Model Yes (Web search + Memory)
ReSeek GRPO/PPO Single Both Multi QA/Search Rule Search/JUDGE
AutoGraph-R1 GRPO (via VeRL) Single Outcome Multi KG Construction for QA Rule Yes (Graph retrieval)
Tree-GRPO GRPO/Tree-GRPO Single Outcome Multi Search Rule Search
ASearcher PPO/GRPO + Decoupled PPO Single Outcome Multi Math/Code/SearchQA External/Rule Yes
Graph-R1 GRPO/REINFORCE++/PPO Single Outcome Multi KGQA Rule (EM/F1) Yes (Graph retrieval)
Kimi-Researcher REINFORCE Single Outcome Multi Research Outcome Search, Browse, Coding
R-Search PPO/GRPO Single Both Multi QA/Search All Yes
R1-Searcher-plus Custom Single Outcome Multi Search Model Search
StepSearch PPO Single Process Multi QA Model Search
AutoRefine PPO/GRPO Multi Both Multi RAG QA Rule Search
ZeroSearch PPO/GRPO/REINFORCE Single Outcome Multi QA/Search Rule Yes
ReasonRAG DPO + MCTS-based PRM Single Process Multi Multi-hop QA Model (PRM) Yes (Wikipedia search)
Agentic-RAG-R1 GRPO Single Outcome Multi Knowledge-intensive QA Rule/Model Yes (Wiki/Doc search)
WebThinker DPO Single Outcome Multi Reasoning/QA/Research Model/External Web Browsing
DeepResearcher PPO/GRPO Multi Outcome Multi Research All Yes
Search-R1 PPO/GRPO Single Outcome Multi Search All Search
R1-Searcher PPO/DPO Single Both Multi Search All Yes
C-3PO PPO Multi Outcome Multi Search Model Yes
DeepRetrieval GRPO Single Outcome Multi Query Generation/IR Rule Yes (Search)
SSRL GRPO Single Outcome Multi Self-Search Rule Yes (Self-search)
Research-Venus GRPO Single Both Multi Deep Research Model (atomic thought) Yes (Search)
DeepResearch RL-based Single Outcome Multi Deep Research Model Yes (Search, Browse)
DeepDive GRPO Single Outcome Multi KG-augmented Search Rule Yes (KG + Search)

🌐 Web & GUI Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MobileAgent Stars 2025.9 X-PLUG (TongyiQwen) paper veRL
InfiGUI-G1 Stars 2025.8 InfiX AI Paper veRL
UI-AGILE Stars 2025.7 Xiamen University Paper Custom
gui-rcpo Stars 2025.8 Zhejiang University Paper Custom
Grounding-R1 Stars 2025.6 Salesforce blog trl
AgentCPM-GUI Stars 2025.6 OpenBMB/Tsinghua/RUC Paper Huggingface
TTI Stars 2025.6 CMU Paper Custom
SE-GUI Stars 2025.5 Nankai University/vivo Paper trl
ARPO Stars 2025.5 CUHK/HKUST Paper veRL
GUI-G1 Stars 2025.5 RUC Paper TRL
WebAgent-R1 Stars 2025.5 Amazon/UVA Paper Custom
GUI-R1 Stars 2025.4 CAS/NUS Paper veRL
UI-R1 Stars 2025.3 vivo/CUHK Paper TRL
CollabUIAgents Stars 2025.2 Tsinghua/Alibaba/HKUST Paper Custom
WebAgent Stars 2025.1 Alibaba paper1, paper2 LLaMA-Factory
UI-TARS Stars 2025.9 ByteDance Seed Paper Custom
DigiQ Stars 2025.2 UC Berkeley/CMU/Amazon Paper Custom
ZeroGUI Stars 2025.5 Shanghai AI Lab Paper Custom
InfiGUI-R1 Stars 2025.4 Zhejiang University Paper Custom
GUI-Agent-RL Stars 2025.2 Microsoft Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MobileAgent semi-online RL Single Both Multi MobileGUI/Automation Rule Yes
InfiGUI-G1 AEPO Single Outcome Single GUI/Grounding Rule No
UI-AGILE GRPO Single Outcome Single GUI Grounding Rule (continuous) No
gui-rcpo RCPO Single Outcome Single GUI Grounding Rule (self-supervised) No
Grounding-R1 GRPO Single Outcome Multi GUI Grounding Model Yes
AgentCPM-GUI GRPO Single Outcome Multi Mobile GUI Model Yes
TTI REINFORCE/BC Single Outcome Multi Web External Web Browsing
SE-GUI GRPO Single Both Single GUI Grounding Rule Yes
ARPO GRPO Single Outcome Multi GUI External Computer Use
GUI-G1 GRPO Single Outcome Single GUI Rule/External No
WebAgent-R1 M-GRPO Single Outcome Multi Web Navigation (WebArena-Lite) Rule (task success) Yes (Web browsing)
GUI-R1 GRPO Single Outcome Multi GUI Rule No
UI-R1 GRPO Single Process Both GUI Rule Computer/Phone Use
CollabUIAgents DPO (credit re-assignment) Multi Process Multi GUI (Mobile + Web) Model (LLM) Yes (GUI interaction)
WebAgent DAPO Multi Process Multi Web Model Yes
UI-TARS Multi-turn RL Single Both Multi GUI (Cross-platform) Model Yes (GUI actions)
DigiQ Value-based offline RL Single Outcome Multi Android Device Control Model (Q-function) Yes
ZeroGUI Online RL Single Outcome Multi GUI Agent Rule Yes (GUI actions)
InfiGUI-R1 RL + sub-goal guidance Single Both Multi GUI Reasoning Rule Yes
GUI-Agent-RL Value-based RL (VEM) Single Outcome Multi GUI (Web Shopping) Model Yes

🔨 Tool-Use Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MATPO Stars 2025.10 MiroMind AI Paper Custom
MiroRL Stars 2025.8 MiroMindAI HF Repo veRL
verl-tool Stars 2025.6 TIGER-Lab X veRL
Multi-Turn-RL-Agent Stars 2025.5 University of Minnesota Paper Custom
Tool-N1 Stars 2025.5 NVIDIA Paper veRL
Tool-Star Stars 2025.5 RUC Paper LLaMA-Factory
RL-Factory Stars 2025.5 Simple-Efficient model veRL
ReTool Stars 2025.4 ByteDance Paper veRL
AWorld Stars 2025.3 Ant Group (inclusionAI) Paper veRL
Agent-R1 Stars 2025.3 USTC Paper veRL
ReCall Stars 2025.3 BaiChuan Paper veRL
ToolRL Stars 2025.4 UIUC Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MATPO GRPO (multi-agent) Multi Outcome Multi Tool-use/Search Rule Yes (MCP: Serper, Web scraping)
MiroRL GRPO Single Both Multi Reasoning/Planning/ToolUse Rule-based MCP
verl-tool PPO/GRPO Single Both Both Math/Code Rule/External Yes
Multi-Turn-RL-Agent GRPO Single Both Multi Tool-use/Math Rule/External Yes
Tool-N1 PPO Single Outcome Multi Math/Dialogue All Yes
Tool-Star PPO/DPO/ORPO/SimPO/KTO Single Outcome Multi Multi-modal/Tool Use/Dialogue Model/External Yes
RL-Factory GRPO Multi Both Multi Tool-use/NL2SQL All MCP
ReTool PPO Single Outcome Multi Math External Code
AWorld GRPO Both Outcome Multi Search/Web/Code External/Rule Yes
Agent-R1 PPO/GRPO Single Both Multi Tool-use/QA Model Yes
ReCall PPO/GRPO/RLOO/REINFORCE++/ReMax Single Outcome Multi Tool-use/Math/QA All Yes
ToolRL GRPO/PPO Single Outcome Multi Tool Learning Rule/External Yes

💻 Code & SWE Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
CUDA-Agent Stars 2026.2 ByteDance/Tsinghua Paper Custom
LLM-in-Sandbox Stars 2026.1 RUC/MSRA/THU Paper rllm (w/ veRL)
PPP-Agent Stars 2025.11 CMU/OpenHands Paper veRL
RepoDeepSearch Stars 2025.8 PKU, ByteDance, BIT Paper veRL
CUDA-L1 Stars 2025.7 DeepReinforce AI Paper Custom
MedAgentGym Stars 2025.6 Emory/Georgia Tech Paper Huggingface
CURE Stars 2025.6 University of Chicago / Princeton/ByteDance Paper Huggingface
Time-R1 Stars 2025.5 UIUC Paper veRL
ML-Agent Stars 2025.5 MASWorks Paper Custom
SkyRL Stars 2025.4 NovaSky Paper veRL
digitalhuman Stars 2025.4 Tencent Paper veRL
sweet_rl Stars 2025.3 Meta/UCB Paper OpenRLHF
swe-rl Stars 2025.2 Meta/UIUC/CMU Paper Custom
rllm Stars 2025.1 Berkeley Sky Computing Lab / BAIR / Together AI Notion Blog veRL
open-r1 Stars 2025.1 HuggingFace -- TRL
R1-Code-Interpreter Stars 2025.5 MIT Paper Custom
CTRL Stars 2025.2 HKU/ByteDance Paper Custom
DeepAnalyze Stars 2025.10 RUC/Tsinghua Paper Custom
AceCoder Stars 2025.2 Waterloo (TIGER-Lab) Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
CUDA-Agent Agentic RL (staged) Single Outcome Multi CUDA Kernel Generation Rule (correctness + performance) Yes (compile/verify/profile)
LLM-in-Sandbox GRPO++ Single Outcome Multi Code/SWE + General (Math/Sci/Bio) Rule Yes (Code Sandbox w/ Terminal, File, Internet)
PPP-Agent PPP-RL Single Both Multi SWE/Research Rule+Model Search, Ask, Browse
RepoDeepSearch GRPO Single Both Multi Search/Repair Rule/External Yes
CUDA-L1 Contrastive RL Single Outcome Single CUDA Optimization Rule (performance) No
MedAgentGym SFT/DPO/PPO/GRPO Single Outcome Multi Medical/Code External Yes
CURE PPO Single Outcome Single Code External No
Time-R1 PPO/GRPO/DPO Multi Outcome Multi Temporal All Code
ML-Agent Custom Single Process Multi Code All Yes
SkyRL PPO/GRPO Single Outcome Multi Math/Code All Code
digitalhuman PPO/GRPO/ReMax/RLOO Multi Outcome Multi Empathy/Math/Code/MultimodalQA Rule/Model/External Yes
sweet_rl DPO Multi Process Multi Design/Code Model Web Browsing
swe-rl RL-based Single Outcome Single SWE (SWE-bench) Rule (similarity) No
rllm PPO/GRPO Single Outcome Multi Code Edit External Yes
open-r1 GRPO Single Outcome Single Math/Code All Yes
R1-Code-Interpreter GRPO Single Outcome Multi Code Interpretation Rule/External Yes (Code exec)
CTRL RL (critique-revision) Single Process Multi Code Refinement Model Yes (Code exec)
DeepAnalyze Curriculum RL Single Outcome Multi Data Science Rule/External Yes (Code exec)
AceCoder GRPO Single Outcome Single Code Generation External (test cases) Yes

🤔 Reasoning Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
Agent0 Stars 2025.10 UNC‑Chapel Hill / Salesforce Research / Stanford University Paper veRL
KG-R1 Stars 2025.9 UIUC/Google Paper1, Paper2 veRL
AgentFlow Stars 2025.9 Stanford University arXiv veRL
ARPO Stars 2025.7 RUC, Kuaishou Paper veRL
terminal-bench-rl Stars 2025.7 Individual (Danau5tin) N/A rLLM
MOTIF Stars 2025.6 University of Maryland Paper trl
cmriat/l0 Stars 2025.6 CMRIAT Paper veRL
agent-distillation Stars 2025.5 KAIST Paper Custom
EasyR1 Stars 2025.4 Individual repo1/paper2 veRL
AutoCoA Stars 2025.3 BJTU Paper veRL
ToRL Stars 2025.3 SJTU Paper veRL
ReMA Stars 2025.3 SJTU, UCL Paper veRL
Agentic-Reasoning Stars 2025.2 Oxford Paper Custom
SimpleTIR Stars 2025.2 NTU, ByteDance Notion Blog veRL
openrlhf_async_pipline Stars 2024.5 OpenRLHF Paper OpenRLHF
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
Agent0 ADPO Multi Process Multi Math/Visual Model/Verifier Yes
KG-R1 GRPO/PPO Single Both Multi KGQA Rule/Model KG Retrieval
AgentFlow Flow-GRPO Single Outcome Multi Search/Math/QA Model/External Yes
ARPO GRPO Single Outcome Multi Math/Coding Model/Rule Yes
terminal-bench-rl GRPO Single Outcome Multi Coding/Terminal Model+External Verifier Yes
MOTIF GRPO Single Outcome Multi QA Rule No
cmriat/l0 PPO Multi Process Multi QA All Yes
agent-distillation PPO Single Process Multi QA/Math External Yes
EasyR1 GRPO Single Process Multi Vision-Language Model Yes
AutoCoA GRPO Multi Outcome Multi Reasoning/Math/QA All Yes
ToRL GRPO Single Outcome Single Math Rule/External Yes
ReMA PPO Multi Outcome Multi Math Rule No
Agentic-Reasoning Custom Single Process Multi QA/Math External Web Browsing
SimpleTIR PPO/GRPO (with extensions) Single Outcome Multi Math, Coding All Yes
openrlhf_async_pipline PPO/REINFORCE++/DPO/RLOO Single Outcome Multi Dialogue/Reasoning/QA All No

👥 Multi-Agent RL

Github Repo 🌟 Stars Date Org Paper Link RL Framework
PettingLLMs Stars 2025.10 Intel / UCSD Paper Custom
MASPRM Stars 2025.10 UBC / Huawei Paper Custom
ARIA Stars 2025.6 Fudan University Paper Custom
AMPO Stars 2025.5 Tongyi Lab, Alibaba Paper veRL
MAPoRL Stars 2025.8 Academic -- Custom
FlowReasoner Stars 2025.4 Sea AI Lab / NUS Paper Custom
DrMAS Stars 2026.2 NTU Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
PettingLLMs AT-GRPO Multi Both Multi Game/Code/Math/Planning Rule (verifiable) No
MASPRM PRM (trained from MCTS rollouts) Multi Process Multi Reasoning (GSM8K/MATH/MMLU) Learned PRM No
ARIA REINFORCE Both Process Multi Negotiation/Bargaining Other No
AMPO BC/AMPO(GRPO improvement) Multi Outcome Multi Social Interaction Model-based No
MAPoRL PPO Multi Outcome Multi Collaborative LLM Tasks Rule No
FlowReasoner GRPO Multi Outcome Multi Multi-agent Workflow Design Rule Yes
DrMAS GRPO (agent-wise) Multi Outcome Multi Multi-agent LLM Systems Rule No

🧠 Memory

Github Repo 🌟 Stars Date Org Paper Link RL Framework
MEM1 Stars 2025.7 MIT Paper veRL (based on Search-R1)
Memento Stars 2025.6 UCL, Huawei Paper Custom
MemAgent Stars 2025.6 ByteDance, Tsinghua-SIA Paper veRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MEM1 PPO/GRPO Single Outcome Multi WebShop/GSM8K/QA Rule/Model Yes
Memento soft Q-Learning Single Outcome Multi Research/QA/Code/Web External/Rule Yes
MemAgent PPO, GRPO, DPO Multi Outcome Multi Long-context QA Rule/Model/External Yes

🦾 Embodied

Github Repo 🌟 Stars Date Org Paper Link RL Framework
Embodied-R1 Stars 2025.6 Tianjin University Paper veRL
STeCa Stars 2025.2 The Hong Kong Polytechnic University Paper FastChat/TRL
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
Embodied-R1 GRPO Single Outcome Single Grounding/Waypoint Rule No
STeCa DPO (RFT) Single Both Multi Embodied/Household Rule/MC Environment Actions

🏷️ Domain-Specific

Github Repo 🌟 Stars Date Org Paper Link RL Framework Domain
MedSAM-Agent Stars 2026.2 CUHK/Tencent Paper Custom Medical
OS-R1 Stars 2025.8 ISCAS Paper Custom OS/Systems
MMedAgent-RL Stars 2025.8 Unknown paper Unknown Medical
DoctorAgent-RL Stars 2025.5 UCAS/CAS/USTC Paper RAGEN Medical
Biomni Stars 2025.3 Stanford University (SNAP) Paper Custom Biomedical
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
MedSAM-Agent GRPO (via veRL) Single Both Multi Medical Image Segmentation Model (clinical fidelity) Yes (SAM/MedSAM2)
OS-R1 GRPO (via veRL) Single Outcome Multi Linux Kernel Tuning Rule Yes (LightRAG, kernel config)
MMedAgent-RL Unknown Multi Unknown Unknown Unknown Unknown Unknown
DoctorAgent-RL GRPO Multi Both Multi Consultation/Diagnosis Model/Rule No
Biomni TBD Single TBD Single scRNAseq/CRISPR/ADMET/Knowledge TBD Yes

🎯 Reward & Training Methodology

Github Repo 🌟 Stars Date Org Paper Link Focus
ToolPRMBench Stars 2026.1 Arizona State University Paper PRM Benchmark for Tool-Use
RLVR-World Stars 2025.5 THU ML Group Paper RLVR for World Models
AgentPRM Stars 2025.2 Cornell Paper Process Reward for Agents
Agentic-Reward-Modeling Stars 2025.2 THU-KEG Paper Agentic Reward Agent
AgentRM Stars 2025.2 THUNLP/Tsinghua Paper Generalizable Agent RM
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
ToolPRMBench N/A (Benchmark) Single Process Multi Tool-Use Rule/Model Yes
RLVR-World RLVR Single Outcome Multi World Modeling (Language/Video) Model (verifiable) No
AgentPRM PPO/DPO + PRM Single Process Multi ALFWorld/General Model (PRM) Yes
Agentic-Reward-Modeling DPO/Best-of-N Single Outcome Single General Instruction Model (Reward Agent) Yes (Verification)
AgentRM MCTS/RM-guided Single Outcome Multi 9 Agent Tasks Model (regression PRM) Yes

🛡️ Safety

Github Repo 🌟 Stars Date Org Paper Link RL Framework
SafeSearch Stars 2025.11 Amazon Science Paper veRL
curiosity_redteam Stars 2024.2 MIT Paper Custom
RLbreaker Stars 2024.6 Purdue Paper Custom
xJailbreak Stars 2025.1 Academic Paper Custom
Auto-RT Stars 2025.1 ICIP-CAS Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
SafeSearch PPO (GAE/GRPO) Single Both Multi Safe QA/Search Rule + Model Search
curiosity_redteam RL + Curiosity Single Outcome Multi Red Teaming Model Yes (iterative query)
RLbreaker Custom PPO Single Outcome Multi Jailbreaking Model Yes (mutator selection)
xJailbreak RL Single Outcome Multi Jailbreaking Model (embedding) Yes (iterative)
Auto-RT PPO Single Outcome Multi Red Teaming Model Yes (strategy exploration)

👁️ VLM Agent

Github Repo 🌟 Stars Date Org Paper Link RL Framework
multimodal-search-r1 Stars 2025.6 ByteDance/NTU Paper Custom
DeepEyesV2 Stars 2025.11 Xiaohongshu Paper Custom
VDeepEyes Stars 2025.5 Xiaohongshu/XJTU Paper veRL
CoSo Stars 2025.5 NTU/Alibaba Paper Custom
RL4VLM Stars 2024.5 UC Berkeley Paper Custom
VSC-RL Stars 2025.2 Liverpool/Huawei/Tianjin/UCL Paper Custom
AlphaDrive Stars 2025.3 HUST/Horizon Robotics Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
multimodal-search-r1 GRPO Single Outcome Multi Multimodal Search Rule Yes (Search)
DeepEyesV2 Outcome RL Single Outcome Multi Multimodal Reasoning Rule Yes (Code exec, Web search)
VDeepEyes PPO/GRPO Multi Process Multi VQA All Yes
CoSo Soft RL (counterfactual) Single Outcome Multi Android/Card/Embodied Rule Yes
RL4VLM PPO Single Outcome Multi GymCards/ALFWorld Rule Yes
VSC-RL Variational RL Single Outcome Multi Mobile Device Control Rule Yes
AlphaDrive GRPO Single Outcome Multi Autonomous Driving Rule (4 planning rewards) No

🔄 Self-Evolution

⚠️ Note: The definition of "Self-Evolution" in the context of RL for LLM agents is still evolving and not yet well-established. This category currently collects works whose paper titles explicitly contain "self-evolving" or "self-evolution", where the agent improves itself through RL-driven feedback loops.

Github Repo 🌟 Stars Date Org Paper Link RL Framework
AgentEvolver Stars 2025.11 Alibaba/Tongyi Lab Paper Custom
SEAgent Stars 2025.8 Shanghai AI Lab / CUHK Paper Custom
MemSkill Stars 2026.2 NTU/UIUC/UIC/Tsinghua Paper Custom
MemRL Stars 2026.1 SJTU/Xidian/NUS/USTC/MemTensor Paper Custom
RAGEN Stars 2025.1 RAGEN-AI Paper veRL
WebRL Stars 2024.11 Tsinghua/Zhipu AI Paper Custom
📋 Click to view technical details
Github Repo RL Algorithm Single/Multi Agent Outcome/Process Reward Single/Multi Turn Task Reward Type Tool usage
AgentEvolver ADCA-GRPO Single Outcome Multi Social Game/Tool-use Rule Yes
SEAgent GRPO Single Outcome Multi Computer Use (OSWorld) Model Yes (Screenshot-based)
MemSkill PPO Single Process Multi QA/ALFWorld Model (learned skills) Yes
MemRL RL-based (Q-value) Single Process Multi HLE/BigCodeBench/ALFWorld Model (retrieval) Yes
RAGEN PPO/GRPO (StarPO) Single Both Multi TextGame All Yes
WebRL Actor-Critic RL + ORM Single Outcome Multi Web Navigation (WebArena) Model (ORM) Yes (Web browsing)

⛰️ Environment

Github Repo 🌟 Stars Date Org Task
OpenSandbox Stars 2026.3 Alibaba Code/GUI/Agent Eval
OpenEnv Stars 2026.3 Meta (PyTorch) Chess/Arcade/Finance
NeMo-Gym Stars 2026.1 NVIDIA Multi-step/Multi-turn
open-trajectory-gym Stars 2026.3 Individual CTF/Security
R2E-Gym Stars 2025.4 UC Berkeley/ANU SWE
LoCoBench-Agent 2025.11 Salesforce AI Research SWE
Simia-Agent-Training 2025.10 Microsoft ToolUse/API
PaperArena Stars 2025.9 University of Science and Technology of China ScientificLiteratureQA
enterprise-deep-research 2025.9 Salesforce AI Research DeepResearch
CompassVerifier Stars 2025.7 Shanghai AI Lab Reasoning
SWE-smith Stars 2025.4 Princeton/Stanford/SWE-bench SWE
SWE-Gym Stars 2024.12 UC Berkeley/UIUC/CMU/Apple SWE
Mind2Web-2 Stars 2025.6 Ohio State University Web
gem Stars 2025.5 Sea AI Lab Math/Code/Game/QA
MLE-Dojo Stars 2025.5 GIT, Stanford MLE
atropos Stars 2025.4 Nous Research Game/Code/Tool
InternBootcamp Stars 2025.4 InternBootcamp Coding/QA/Game
loong Stars 2025.3 CAMEL-AI.org RLVR
DataSciBench Stars 2025.2 Tsinghua Data Analysis
reasoning-gym Stars 2025.1 open-thought Math/Game
llmgym Stars 2025.1 tensorzero TextGame/Tool
debug-gym Stars 2024.11 Microsoft Research Debugging/Game/Code
gym-llm Stars 2024.8 Rodrigo Sánchez Molina Control/Game
AgentGym Stars 2024.6 Fudan Web/Game
tau-bench Stars 2024.6 Sierra Tool
appworld Stars 2024.6 Stony Brook University Phone Use
android_world Stars 2024.5 Google Research Phone Use
TheAgentCompany Stars 2024.3 CMU, Duke Coding
LlamaGym Stars 2024.3 Rohan Pandey Game
visualwebarena Stars 2024.1 CMU Web
LMRL-Gym Stars 2023.12 UC Berkeley Game
OSWorld Stars 2023.10 HKU, CMU, Salesforce, Waterloo Computer Use
webarena Stars 2023.7 CMU Web
AgentBench Stars 2023.7 Tsinghua University Game/Web/QA/Tool
WebShop Stars 2022.7 Princeton-NLP Web
ScienceWorld Stars 2022.3 AllenAI TextGame/ScienceQA
alfworld Stars 2020.10 Microsoft, CMU, UW Embodied
factorio-learning-environment Stars 2021.6 JackHopkins Game
jericho Stars 2018.10 Microsoft, GIT TextGame
TextWorld Stars 2018.6 Microsoft Research TextGame

Under Review/Waiting for Open Source

Citation

If you find this repository useful, please consider citing it:

@misc{agentsMeetRL,
  title={When LLM Agents Meet Reinforcement Learning: A Comprehensive Survey},
  author={AgentsMeetRL Contributors},
  year={2025},
  url={https://github.com/thinkwee/agentsMeetRL}
}

Made with ❤️ by the AgentsMeetRL community
