Cursor、Claude Code、Codex、Hermes agent 4 家 harness 对比

Cursor

IDE Agent Runtime

Codex

OpenAI CLI Agent

Hermes

Nous Research

Claude Code

Anthropic

我拆开看过 Cursor、Claude Code、Codex、Hermes agent 四家编程助手的底层。壳子不同，骨架却像：外面管界面和权限，里面一个 while 在模型和工具之间转，脏活丢给子任务，子任务结束只把摘要还给主会话。下面先记四家的差别，再写灵犀（Lumina）的 PRD 和实现怎么接上这些想法。

1. 主循环

Cursor 绑在 IDE。主会话反复「问模型、调工具」，派活用 Task 和 /multitask，进度在侧边栏。要多文件同时改仓库时，worktree 用得最多。

Codex 在 Rust 里用 SessionTask 组织一轮轮 turn：先采样，再执行工具，写回上下文，然后继续采样。Thread、Turn、Item 分得细，和客户端走 JSON-RPC；长任务可以 pause，等人批准再继续。

Hermes 主要靠一个 Python AIAgent 里的 run_conversation()，同步 while 循环。delegate_task 在工具层被拦下，用来起子 agent。CLI 和 Gateway 共用这一套。

Claude Code 表面是 runAgent()，真正转圈的是内部的 query()。子 agent 从 Agent 工具进；子工具集里故意不带 Agent，免得一层层套下去。

上下文长了都会顶不住。Cursor 让 Explore 这类子 agent 吃掉检索噪声；Codex 在 turn 里做 compaction；Hermes 压 session 并进 DB；Claude Code 靠长窗加子 agent 分流。办法不一，都是别让主窗口塞满 tool 日志。

文档：Cursor Subagents、Codex agent loop、Hermes delegation、Claude Code subagents

2. 子任务与权限

派子任务的入口各家名字不同，习惯差不多：父会话丢一段目标进去，子会话自己跑，父会话只看汇总，中间 tool 轨迹不进父 context。

子 agent 看不到父聊天全文，最多收到 prompt。父会话也看不到子 agent 的逐步日志。Claude Code 的 fork 能继承父 context，算少数例外。

权限上，Claude Code 最严，子 agent 不能再 spawn。Hermes 的 leaf 默认不带 delegate。Codex 常在危险步骤 pause，等客户端确认。Cursor 跟着 IDE 的 hooks 和确认框走。

并行改同一 repo 时，Cursor 的 worktree 仍最好用。要协议完整、长任务可暂停，Codex 值得对着学。Python 栈想一眼看懂 loop，Hermes 最直接。

3. 对照表

维度	Cursor	Codex	Hermes	Claude Code
主循环	IDE 内 model↔tools	Rust Session turn	Python while	`query()`
派子任务	Task	AgentControl	delegate_task	Agent tool
父只看摘要	是	是	是	是
子再 spawn	受限	有 depth 上限	leaf 默认否	硬禁
见长	并行改 repo	暂停与协议	实现简单	类型与边界

4. 灵犀（Lumina）PRD

这是我做灵犀时整理的产品结构，代码在 GitHub。对照上一节四家 harness，看 PRD 里每一格落在哪。

4.1 背景与目标

要解决的问题：飞书、微信读书、小红书、邮件、云盘等信息散在各处，云端助手碰不到这些数据，隐私也难自己说了算。

产品目标：在本机跑一个个人秘书，把个人数据源接进来；对话能调工具；记忆能跨会话留下；新能力用 MCP 插件扩展。

v1 不做：云端多租户、内置 IDE、无确认的自动写盘。

4.2 用户场景

场景	用户要什么	产品怎么应
日常问答	结合「我的」资料回答	引用已同步的笔记与消息摘要
办事	查文件、搜本地、必要时跑 shell	写操作前必须确认
记忆	记住偏好和长期事实	维护 MEMORY，下次带入 prompt
同步	定时拉各平台内容	后台 job，主会话只见简报
扩展	接新工具	配置 MCP Server

4.3 系统分层

┌─────────────────────────────────────────┐
│  Electron（聊天、设置、写操作确认）       │
├─────────────────────────────────────────┤
│  FastAPI（REST + SSE）                   │
├─────────────────────────────────────────┤
│  Agent 运行时（while loop + 工具路由）    │
├─────────────────────────────────────────┤
│  内置工具 │ MCP │ 各平台 sync 适配器       │
├─────────────────────────────────────────┤
│  SQLite（会话）│ 本地 md/json（记忆/配置）  │
└─────────────────────────────────────────┘

层	职责	技术选型
表现层	会话列表、流式输出、工具进度条	Electron + 原生页面
接口层	发消息、拉历史、推 SSE 事件	FastAPI
运行时	组 prompt、调模型、跑 tool、控长度	自研 Python loop，OpenAI 兼容 API
能力层	读写、检索、MCP、平台同步	MCP SDK + 各平台模块
数据层	会话与同步状态	SQLite + 用户目录文件

4.4 功能需求

FR-1 对话 Agent
支持多轮 tool loop（读文件、检索、shell、联网等）。改文件、写库、执行 shell 须先经用户点确认。

FR-2 流式输出
模型 token 与 tool 事件经 SSE 推到前端，避免长时间空白。

FR-3 持久记忆
MEMORY.md 放长期事实，USER.md 放用户画像；单轮结束后异步整理，避免每轮把全文历史塞进 prompt。

FR-4 会话存储
SQLite 保存 thread、message、tool 记录，支持续聊。

FR-5 数据源同步
按平台配置飞书、微信读书、小红书、邮箱、云盘等；同步明细不进主会话，主会话可引用摘要。

FR-6 MCP
可挂多个 MCP Server；启动时注册 schema，与内置工具同一路由。

FR-7 后台任务
定时同步、每日简报、会话记忆摘要；完成后只写 summary 字段供主会话读取。

4.5 运行时设计（与四家对照）

PRD 项	灵犀怎么做	参考了谁
主循环	单进程 while，model 与 tools 交替	Hermes
重活	后台 job，结果入库，主会话读 summary	四家「只回摘要」
超长上下文	压旧轮 + 记忆 md 精选	Codex compaction 思路
写操作	pending 状态，UI 批准后再执行	Codex pause、Claude 权限
工具范围	内置 + MCP，按场景禁高危 tool	Claude 白名单
造灵犀这款应用	在 Cursor 里写 Electron 与后端	Cursor

主 Agent 不递归 spawn 子 Agent。复杂同步、简报、记忆整理走后台 job，和 Hermes 的 delegate 类似，但边界更死，避免子 agent 套子 agent。

4.6 数据与隐私

会话与 tool 记录进 SQLite。记忆文件在用户目录，可手改。同步下来的原始数据留在本地，供检索用。模型 API 由用户自填 endpoint；默认不把业务数据上传到自己的云。

4.7 非功能需求

编号	内容
NFR-1	核心数据默认只存本机
NFR-2	写操作可审计（谁批的、何时执行）
NFR-3	先保单人桌面稳定，再考虑多 agent 并行
NFR-4	MCP 或 sync 模块挂了不拖死主 loop

4.8 实现顺序

FastAPI + 基础 loop + SSE
内置工具与用户确认流
SQLite 与 MEMORY/USER
各平台 sync 与简报 job
MCP 接入
Electron 壳与设置页

4.9 接口与模块（结构备忘）

模块	输入	输出	备注
`chat/send`	用户消息、thread_id	SSE：token / tool_start / tool_end / done	主入口
`chat/approve`	tool_call_id、approve/deny	继续或中止该 tool	对应 FR-1
`memory/compact`	thread_id	更新 MEMORY/USER 文件	异步，FR-3
`sync/run`	platform_id	job_id	FR-5
`sync/briefing`	job_id	短文摘要	进主会话可引用
`mcp/register`	server 配置	tool 列表	FR-6

5. 收尾

四家 harness 的骨架已经趋同。灵犀没有整包照搬谁，而是先把 PRD 写清：本地、个人数据、可确认、可扩展。再按表去借 Cursor、Codex、Hermes、Claude Code 的工程习惯。你若在做桌面 Agent，建议先定分层和 FR，再翻四家文档查漏，比堆名词省事。

Cursor, Claude Code, Codex & Hermes Agent: Four Harnesses Compared

Cursor

IDE Agent Runtime

Codex

OpenAI CLI Agent

Hermes

Nous Research

Claude Code

Anthropic

I compared Cursor, Claude Code, Codex, and Hermes Agent by reading how each one actually runs. Different shells, similar bones: UI and permissions outside, a while loop between model and tools inside, heavy work pushed to sub-runs that return only a summary. Then I map that to Lumina (灵犀) as a PRD.

1. Main loops

Cursor sits in the IDE. The main session alternates model calls and tools; Task and /multitask delegate work. Worktrees help when several edits touch the same repo.

Codex drives turns in Rust via SessionTask: sample, run tools, write back, sample again. Threads, turns, and items are explicit; JSON-RPC to the client; long jobs can pause for approval.

Hermes centers on one Python AIAgent and run_conversation(), a synchronous while loop. delegate_task is handled in the tool layer.

Claude Code shows runAgent() outward; query() is the real loop. Sub-agents enter through the Agent tool; their tool sets exclude Agent to limit nesting.

Context limits hurt everyone. Each product trims or offloads history so the main pane does not fill with tool logs.

2. Sub-tasks

Spawn APIs differ; behavior rhymes. The parent sends a goal; the child runs alone; the parent keeps a summary, not step-by-step tool traces.

Claude Code blocks sub-agents from spawning again. Hermes leaf roles skip delegate by default. Codex often pauses risky steps. Cursor follows IDE confirmation.

3. Reference table

Topic	Cursor	Codex	Hermes	Claude Code
Loop	IDE	Session turn	Python while	`query()`
Delegate	Task	AgentControl	delegate_task	Agent
Parent gets summary only	Yes	Yes	Yes	Yes
Child respawn	Limited	Depth cap	leaf off	Blocked

4. Lumina PRD

Code: GitHub.

Goals

Local personal secretary: connect Feishu, WeRead, Xiaohongshu, mail, cloud files; tool-using chat; durable memory; MCP plugins. Not a cloud IDE.

Layers

Electron UI → FastAPI + SSE → Python agent loop → tools / MCP / sync → SQLite and local md.

Functional requirements

FR-1: tool loop with mandatory approval for writes and shell.
FR-2: SSE streaming.
FR-3: MEMORY.md / USER.md after sessions.
FR-4: SQLite threads.
FR-5: sync jobs with briefings only in chat.
FR-6: MCP registration.
FR-7: background jobs with summary fields.

Runtime vs four harnesses

Item	Lumina	From
Main loop	Single while	Hermes
Heavy work	Background job + summary	Shared pattern
Writes	Pending until UI approves	Codex / Claude
Tools	Built-in + MCP allowlist	Claude
Building the app	Cursor for code	Cursor

No recursive sub-agent spawn; sync and briefings use jobs instead.

NFR

Local storage default; auditable approvals; isolate MCP/sync failures from the main loop.

Build order

API + loop + SSE → tools + approval → SQLite + memory → sync → MCP → Electron.

API sketch

chat/send (SSE), chat/approve, memory/compact, sync/run, sync/briefing, mcp/register.

5. Closing

The four harnesses rhyme on structure. Lumina picks pieces for local data and explicit consent. Fix the PRD layers first, then steal engineering habits from each vendor as needed.