- Stealth Startup
- Palo Alto, CA
- 22:04 (UTC -07:00)
- http://drogozhang.github.io/
- @KaiZhang_CS
- in/kai-zhang-43774b196
Highlights
- Pro
Starred repositories
[ACL'26 Findings] The Model Agreed, But Didn’t Learn: Diagnosing Surface Compliance in Large Language Models
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Code and data for the paper "Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation"
Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
[NeurIPS 2025] CamSAM2: Segment Anything Accurately in Camouflaged Videos
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
AeroSpace is an i3-like tiling window manager for macOS
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge
Mainly records knowledge and interview questions for large language model (LLM) algorithm (application) engineers
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, also covering new ideas, new problems, new resources, and new projects from work and research
Machine Learning and Computer Vision Engineer - Technical Interview Questions
🔥 A list of tools, frameworks, and resources for building AI web agents
Code for paper "Is Extending Modality The Right Path Towards Omni-Modality?"
[ICLR'26 Oral] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
A high-throughput and memory-efficient inference and serving engine for LLMs
🌎💪 BrowserGym, a Gym environment for web task automation
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
[TMLR'26] UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…
Pioneering Automated GUI Interaction with Native Agents
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery