Keeping an AI Running Nonstop Without Letting It Drift Is Really Hard

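After using tools like Claude Code and Codex for a while, I sometimes want to let them just keep running. But I worry that they will drift off course as they write code on their own, and a long run can also overflow the model's context window. To address this, I designed a workflow in which Claude Code supervises Codex.

Today I am sharing the idea behind it. It is less a finished solution than a way of thinking, so feel free to adapt it into a version that suits you.

One caveat: this approach suits iterative development from 1 to n. For a brand-new 0-to-1 project, I still recommend doing the work yourself, or watching the model closely as it works.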

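Pick the Right Tools to Save Money and Hassle

My own setup: I have ChatGPT Plus, which includes access to Codex, plus the GLM Coding Plan Lite (which can be configured for use inside Claude Code). I also have Gemini, but I find the Gemini CLI experience mediocre, so I use Claude Code + Codex for this walkthrough.

In short:

  • GLM Coding Plan: generous quota; I have almost never hit a limit
  • Claude Code: sometimes declares a task finished too early
  • Codex: somewhat more reliable, but the model costs more, so use it sparingly

So my strategy is: Claude Code acts as the supervisor, and Codex does the work.

As for the Codex model, I recommend GPT-5.2 at medium reasoning effort. The models with the codex suffix are officially described as optimized for coding and agentic tasks, but in my own testing they did not do the work well. medium behaves roughly like an "Auto" setting. You can also choose high, but avoid xhigh: I tried it, and the results were genuinely good, but it burned through a week's quota in a single day, which my wallet cannot take.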


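Two Layers of Anti-Drift Insurance

In this workflow, what I care about most is preventing drift and preventing cheating.

So I use two files as double insurance: tasks.md and feature_list.json. They compare as follows:

1. Comparison table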

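| Aspect | tasks.md | feature_list.json |
|---|---|---|
| Core role | Execution layer: concrete implementation steps and the verification process | Management layer: the final state of product feature requirements |
| Granularity | Fine-grained: one feature may be split into several tasks (1.1, 1.2, 1.3) | Coarse-grained: one Ref ID maps to one complete feature (R1) |
| Worker permissions | Partial write: may only add BUNDLE lines (path to the delivered bundle) | Fully forbidden: may not modify anything (no changing requirements or status on its own initiative) |
| Supervisor permissions | Manages execution: ticks checkboxes and writes EVIDENCE (PASS/FAIL conclusion) | Updates status: sets the passes field to true only after validation passes |
| Content form | Markdown: human-readable instructions, test criteria, and run-log paths | JSON: structured data with Ref ID, description, and a boolean status |
| Lifecycle | Dynamic: logs, errors, and retry records are appended on every run | Relatively static: status flips only when a feature is truly "done and verified" |
| Audience | Humans + AI agents | Mainly AI agents |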

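2. Roles and linkage

What each one does

  • tasks.md (process)

    It is the process record. It captures the full pipeline from implementation to final verification. The Worker can make mistakes and retry here (Attempt #1, #2...), and the Supervisor records the exact verification commands and screenshot paths here. It is the shared workspace for human-AI collaboration, holding the details of trial and iteration so the process stays traceable.

  • feature_list.json (result)

    It is the acceptance baseline. It does not record the twists and turns of development; it only reflects the final delivery state. It answers one question: which end-to-end capabilities have actually been verified and passed. It uses stable refs as a long-lived checklist, defaults every entry to passes=false, and may be updated to passing only when a PASS evidence chain already exists for that ref.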

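How are the two linked?

The two are rigidly bound through Ref tags (such as [#R1]):

  1. Mapping: a concrete task line in tasks.md carries a tag (for example, - [ ] 1.1 Implement the login endpoint [#R1]), and this tag corresponds directly to the "ref": "R1" entry in feature_list.json.
  2. State flow (one-way)

    • Verify in tasks.md first: the Supervisor must first run the bundle provided by the Worker, confirm the tests pass, and write EVIDENCE ... RESULT: PASS into tasks.md
    • Then record in feature_list.json: only once the evidence chain in tasks.md is conclusive (PASS) may the Supervisor set the passes field of the corresponding R1 entry in feature_list.json to true

Why be so rigid? Because with only a task checklist, the model can look finished without actually being finished, whereas feature_list.json makes it much easier to spot when it is bluffing. In a sense, it is the gate that blocks work that is done for show but unusable.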
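The one-way flow can be sketched in a few lines of Python. This is a minimal sketch, not the workflow's actual implementation: it assumes feature_list.json is a JSON array of objects with "ref" and "passes" keys (the article only says the file holds a Ref ID, a description, and a boolean), and that the EVIDENCE line sits in the lines under the task's checkbox line. The function names are hypothetical.

```python
import json
import re

def has_pass_evidence(tasks_md: str, ref: str) -> bool:
    """Return True if the task tagged [#ref] has an EVIDENCE line with RESULT: PASS."""
    in_task = False
    for line in tasks_md.splitlines():
        if re.match(r"\s*- \[[ x]\] ", line):
            # A checkbox line starts a new task block.
            in_task = f"[#{ref}]" in line
        elif in_task and "EVIDENCE" in line:
            # Ignore the unfilled template text "RESULT: PASS|FAIL".
            if re.search(r"RESULT:\s*PASS(?!\|FAIL)", line):
                return True
    return False

def mark_passed(feature_list_json: str, tasks_md: str, ref: str) -> str:
    """Set passes to true for ref, but only if tasks.md already holds PASS evidence."""
    if not has_pass_evidence(tasks_md, ref):
        raise ValueError(f"no PASS evidence for {ref} in tasks.md")
    features = json.loads(feature_list_json)
    matches = [feature for feature in features if feature.get("ref") == ref]
    if not matches:
        # Assumed behavior: a missing entry is an error, never silently created.
        raise KeyError(f"{ref} is missing from feature_list.json")
    for feature in matches:
        feature["passes"] = True
    return json.dumps(features, indent=2)
```

The point of the design is that `mark_passed` cannot succeed on its own: the only way to flip a flag is to have written the evidence into tasks.md first.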

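In addition, to minimize cases of starting work before the requirements are aligned, I added a skill that lets the AI ask us questions back and confirm the requirements once more.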


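Overall Approach

[Division of roles] Claude Code acts as the Supervisor, and Codex is the Worker.

Why split it this way?

Because the real fear is not that it cannot write code, but that:

  1. It believes it is "done" when it has only gone through the motions
  2. It cuts corners and skips validation, or the validation cannot be reproduced
  3. It drifts off course with full confidence, and we inherit a mess when we take over

So two agents work here to guard against cheating as much as possible: one only writes, and the other only does acceptance.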

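[Kickoff] The whole flow starts with an OpenSpec change proposal that I generate with Codex (the Worker); the proposal is turned into a concrete to-do list in tasks.md. Whenever a new task needs to run, Claude Code (the Supervisor) starts a subagent, which calls Codex (the Worker) through codex exec and then invokes OpenSpec in natural language. I recommend OpenSpec 0.19.0: newer versions restructured the OpenSpec workflow, and although they also support natural-language invocation, it is triggered through skills.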
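As a rough illustration of the dispatch step, the subagent's single call could look like the Python sketch below. The flags are the ones from the CODEX_CMD constant in the CLAUDE.md shown later in this article; the helper names are hypothetical, and the prompt text is whatever the Supervisor composes for that one task.

```python
import shlex
import subprocess

# Same command constant as in CLAUDE.md; keep a single copy.
CODEX_CMD = (
    "codex exec --full-auto --skip-git-repo-check "
    "--model gpt-5.2 -c model_reasoning_effort=medium"
)

def build_codex_argv(inline_prompt: str) -> list[str]:
    """Split the command constant into argv and append the prompt as ONE argument."""
    return shlex.split(CODEX_CMD) + [inline_prompt]

def run_worker(inline_prompt: str, repo_dir: str) -> int:
    """Run one Worker attempt inside repo_dir and return the exit code of codex."""
    completed = subprocess.run(build_codex_argv(inline_prompt), cwd=repo_dir)
    return completed.returncode
```

Passing the prompt as its own argv element, instead of formatting it into a shell string, avoids quoting problems when the prompt contains quotes or newlines.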

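[Execution and delivery] After Codex (the Worker) finishes writing code, it must produce and deliver a reproducible test plan as proof of completion, placed under the auto_test_openspec directory:

- CLI tasks: the bundle must include an automated test script (run.sh).
- GUI tasks: the bundle must include an MCP runbook containing no executable code (in Markdown), plus a script used only to start the service.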

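[Acceptance and sign-off] Claude Code (the Supervisor) runs the scripts itself for acceptance. For GUI tasks, it follows the runbook strictly, drives the browser through the playwright-mcp service, and captures screenshots as hard evidence, ensuring the feature is not just written but actually usable.

Only when Claude Code (the Supervisor) has personally confirmed that the test plan passes and that the evidence chain in hand is complete does it perform the sign-off steps:

  1. Tick the task in tasks.md.
  2. Update the pass state in feature_list.json.
  3. Make a Git commit as a checkpoint.
  4. Write a handoff log entry containing evidence pointers to progress.txt.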
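Step 1 is a plain text edit, and it is worth making it strict. A minimal Python sketch, assuming the checkbox line format `- [ ] <task-id> <summary> [#R<n>]` from the tasks.md spec later in this article (the function name is hypothetical):

```python
def tick_task(tasks_md: str, ref: str) -> str:
    """Return tasks_md with the unchecked task tagged [#ref] switched to - [x]."""
    tag = f"[#{ref}]"
    lines = tasks_md.splitlines(keepends=True)
    hits = [
        index for index, line in enumerate(lines)
        if line.lstrip().startswith("- [ ] ") and tag in line
    ]
    if len(hits) != 1:
        # Refs are unique, so anything other than one hit is a bookkeeping error.
        raise ValueError(f"expected exactly one unchecked task for {tag}, found {len(hits)}")
    index = hits[0]
    lines[index] = lines[index].replace("- [ ] ", "- [x] ", 1)
    return "".join(lines)
```

Refusing to act when the ref matches zero or several lines keeps a malformed tasks.md from being silently "fixed" during sign-off.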

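[Exception handling] If it hits a technical blocker along the way, Claude Code (the Supervisor) uses Context7 or a browser search tool to find a solution on its own and guides the Worker through a retry.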

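Directory Structure

.
├── auto_test_openspec/                     # [repo-root artifact] immutable evidence store
│   ├── run-0001__task-1.1__ref-R1.../      # validation bundle for one task attempt (run folder)
│   │   ├── run.sh                          # automated reproduction script
│   │   ├── task.md                         # validation instructions
│   │   └── ...                             # (logs, screenshots, inputs/outputs, etc.)
│   └── ...
│
├── git_openspec_history/                   # [repo-root artifact] Git commit index
│   └── runs.log                            # index log: maps Run ID <-> Git commit SHA
│
└── openspec/
    └── changes/
        └── <change-id>/                    # [artifacts inside the OpenSpec change]
            ├── feature_list.json           # feature list and pass state (the dual ledger)
            ├── progress.txt                # handoff log (dialogue and validation results)
            └── tasks.md                    # (task list source file)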
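The run-folder names in this tree follow the pattern `run-<RUN4>__task-<task-id>__ref-<ref-id>__<YYYYMMDDThhmmssZ>` defined in the project.md spec later in this article. A small Python sketch for building and parsing such names (the helper names are hypothetical):

```python
import re
from datetime import datetime, timezone

RUN_FOLDER_RE = re.compile(
    r"^run-(?P<run>\d{4})__task-(?P<task>\d+(?:\.\d+)+)"
    r"__ref-(?P<ref>R\d+)__(?P<ts>\d{8}T\d{6}Z)$"
)

def make_run_folder(run_number: int, task_id: str, ref_id: str, when: datetime) -> str:
    """Build a run-folder name; `when` must be timezone-aware and is converted to UTC."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"run-{run_number:04d}__task-{task_id}__ref-{ref_id}__{stamp}"

def parse_run_folder(name: str) -> dict:
    """Split a run-folder name into its parts, or raise ValueError if it is malformed."""
    match = RUN_FOLDER_RE.match(name.rstrip("/"))
    if match is None:
        raise ValueError(f"not a valid run-folder name: {name!r}")
    parts = match.groupdict()
    parts["run"] = int(parts["run"])
    return parts
```

Because the run number, task id, and ref are all recoverable from the name, an auditor can map any folder back to its `EVIDENCE (RUN #n)` line without opening it.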

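### How is memory preserved?

Each task gets its own subagent, which keeps the context from growing too long or getting polluted. But that alone does not preserve memory, so here is my approach.

1. Core mechanism: the Startup Ritual

  1. Codex (the Worker) is required to read the historical records before doing any work:

    • It must read openspec/changes/<change-id>/progress.txt and feature_list.json
    • It must run git log --oneline -20 to get the recent code change history.
    • It must write what it read into auto_test_openspec/$ARGUMENTS/<run-folder>/logs/worker_startup.txt, as proof that "I have seen what happened before".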
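To make the ritual concrete, here is a minimal Python sketch of producing worker_startup.txt. In the real workflow Codex performs these steps itself from the prompt; the function names and the snapshot layout are assumptions, apart from the required fields (UTC timestamp, CODEX_CMD, GIT_BASE, the git log excerpt, and a short summary) listed in the CLAUDE.md later in this article.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    """Run a git command in repo and return its trimmed stdout."""
    result = subprocess.run(["git", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()

def render_startup(codex_cmd: str, git_base: str, git_log: str, observed: str) -> str:
    """Render the snapshot text from facts that were already collected."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"UTC: {stamp}\n"
        f"CODEX_CMD: {codex_cmd}\n"
        f"GIT_BASE: {git_base}\n"
        f"GIT_LOG (last 20):\n{git_log}\n"
        f"OBSERVED: {observed}\n"
    )

def write_startup(repo: Path, change_id: str, run_folder: str,
                  codex_cmd: str, observed: str) -> Path:
    """Collect git facts and write logs/worker_startup.txt inside the run folder."""
    out = (repo / "auto_test_openspec" / change_id / run_folder
           / "logs" / "worker_startup.txt")
    out.parent.mkdir(parents=True, exist_ok=True)
    text = render_startup(
        codex_cmd,
        git(repo, "rev-parse", "--short", "HEAD"),
        git(repo, "log", "--oneline", "-20"),
        observed,
    )
    out.write_text(text, encoding="utf-8")
    return out
```

Here `observed` stands for the Worker's own short summary of what it found in progress.txt and feature_list.json, for example which refs are still failing and why the last attempt was rejected.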

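2. Three memory files

  1. tasks.md is the project's "task memory" and single source of truth; it maintains the execution status of every task. Claude Code (the Supervisor) reads this file to decide what to dispatch next, while Codex (the Worker) relies on it for the concrete implementation target, so both sides share the same view of which tasks are done and which are pending.
  2. progress.txt is an append-only "process memory" log that carries handoff information across sessions. Whenever a task ends, Claude Code (the Supervisor) writes the dialogue summary, validation results, and error messages into it; a newly started Codex (the Worker) must consult the history in this file (especially the reasons for failures or blocks) so that it learns from earlier mistakes instead of repeating them.
  3. feature_list.json holds the state of project completion, recording the validation status of each feature. Under this mechanism, Codex (the Worker) has read-only access to check the status of dependencies, and the file is updated only after Claude Code (the Supervisor) completes strict validation, so the memory of the project's overall usability is both continuous and authoritative.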
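The append-only property of progress.txt is easy to break if a tool rewrites the whole file. A minimal Python sketch that only ever appends; the entry layout and the function name are hypothetical, since the exact RUN entry format is left to the Supervisor prompt:

```python
from pathlib import Path

def append_run_entry(progress_path: Path, run_number: int, task_id: str,
                     result: str, notes: str) -> None:
    """Append one RUN entry to progress.txt without touching earlier entries."""
    entry = (
        f"=== RUN #{run_number} | task {task_id} | RESULT: {result} ===\n"
        f"{notes.rstrip()}\n\n"
    )
    # Mode "a" creates the file if needed and always writes at the end.
    with progress_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Opening in append mode means a failed attempt stays on record even after a later attempt passes, which is exactly what the next Worker needs to read.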


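Skills and MCP Configuration

1. Configure MCP

If your tasks involve GUI (or MIXED) work, I strongly recommend adding playwright-mcp. The goal is that the Supervisor neither clicks through pages by hand nor runs Playwright from scripts; instead it drives the browser through MCP and collects evidence (screenshots, logs, and so on).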

playwright-mcp:

claude mcp add --transport stdio --scope user playwright-mcp -- npx -y @playwright/mcp@latest

Also add context7 (to look up documentation and fill in context when stuck):

claude mcp add context7 -- npx -y @upstash/context7-mcp@latest

For the browser search MCP I use Zhipu's (you can swap in another provider, as long as the names match):

claude mcp add -s user -t http web-search-prime https://open.bigmodel.cn/api/mcp/web_search_prime/mcp --header "Authorization: Bearer your_api_key"
claude mcp add -s user -t http web-reader https://open.bigmodel.cn/api/mcp/web_reader/mcp --header "Authorization: Bearer your_api_key"


2. Skills

I maintain these skills directly in a repository; download whichever you need:

For Codex:

For Claude Code:

Customizing the MCP servers used by openspec-unblock-research


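1. Configure the MCP server

Run mcp list in Claude Code. Both mcp__<new-search-name>__* and mcp__github__* (or whichever helper tools you use) must show up as loaded.

2. Modify the core file (SKILL.md)

Make two key changes to the SKILL.md of openspec-unblock-research:

1. Edit the Description at the top of the file

Keep the description consistent with the tools actually in use.

- Change `mcp__web-search-prime__*`
- to `mcp__<new-search-name>__*`

2. Edit Default Provider Ordering

In the list at the bottom of the file, **insert the new tool** and **replace the old search**.

Example: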

## Default provider ordering (if caller omits toolchain_config)

1. `mcp__context7__*` (authority source)
   ...

2. `mcp__github__*` (new: internal authority)
   - Use for: checking existing issues/bugs in repo or upstream.
   - Trigger when: `error_excerpt` looks like a library bug.
   - Stop when: found a closed issue matching symptoms.

3. `mcp__<new-search-name>__*` (replaces the original search-prime)
   - Use for: recent regressions, common pitfalls.
   - Trigger when: `error_excerpt` includes searchable strings.
   - Stop when: have candidate links to verify.

4. `mcp__web-reader__*` (evidence fetcher)
   ...


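Files to Change

Optional: code style rules

Edit AGENT.md. The main purpose is to keep the generated code reasonably clean and concise. This is a matter of personal preference, and you can add other rules as well, such as requiring a uv virtual environment. Skip it if you think it is unnecessary.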

## Code hygiene guardrails (always-on)

- Prioritize correctness and maintainability over cosmetic changes.
    
- Keep scope tight: don’t refactor unrelated areas; avoid “while I’m here” edits.
    
- Write for the next reader: choose clear names, straightforward control flow, and readable structure.
    
- Avoid clever compactness (dense one-liners, nested ternaries). Prefer if/else or switch when branching grows.

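Key file changes

To get this workflow running, we need to overwrite or create a few configuration files.

Additions to openspec-proposal.md

Location:

  • Windows: %USERPROFILE%\.codex\prompts\openspec-proposal.md
  • macOS/Linux: ~/.codex/prompts/openspec-proposal.md

Purpose: make the tasks.md that OpenSpec generates fit our needs.

Note: this file must be modified after running openspec init; otherwise it is reset to the default.

Add after step 6 under Steps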

- When drafting `openspec/changes/<id>/tasks.md`, you MUST follow:
  - `openspec/project.md` → `## tasks.md Checklist Format` (canonical; do not invent a parallel format).

- Hard gate reminders (do not expand here; see canonical spec above):
  - Every task MUST include `ACCEPT:` and `TEST:`.
  - Every checkbox task line MUST include EXACTLY ONE `[#R<n>]` token, unique across the file.
  - `TEST:` MUST include `SCOPE: CLI|GUI|MIXED` and MUST enable a human-reproducible validation bundle
    (all bundle rules + role split + evidence rules live ONLY in `openspec/project.md`).

  - Role split (mandatory; see `openspec/project.md` → “Validation bundle requirements”):
    - Worker produces bundle assets only; Supervisor executes and records PASS/FAIL evidence.

  - GUI/MIXED constraint (mandatory; see `openspec/project.md` → “CLI/GUI/MIXED validation requirements”):
    - GUI verification must be driven via MCP service `playwright-mcp` and evidence must be archived; do NOT use any browser automation scripts (Python/Node/Playwright test runner).
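The checkbox-line reminders in this prompt addition are mechanical enough to lint. A minimal Python sketch follows; it is not part of OpenSpec or of the prompt itself, and the function name is hypothetical. It checks only the line-level rules (dot-numbered task id, exactly one `[#R<n>]` token, refs unique across the file), not the `ACCEPT:` and `TEST:` blocks.

```python
import re

CHECKBOX_RE = re.compile(r"^\s*- \[[ x]\] ")
TASK_LINE_RE = re.compile(r"^\s*- \[[ x]\] (?P<task_id>\d+(?:\.\d+)+) .+$")
REF_RE = re.compile(r"\[#(R\d+)\]")

def lint_tasks(tasks_md: str) -> list[str]:
    """Return a list of human-readable problems found in checkbox task lines."""
    problems = []
    seen = {}  # ref id -> line number of first use
    for number, line in enumerate(tasks_md.splitlines(), start=1):
        if CHECKBOX_RE.match(line) is None:
            continue  # not a checkbox line (ACCEPT/TEST/BUNDLE/EVIDENCE, prose, ...)
        if TASK_LINE_RE.match(line) is None:
            problems.append(f"line {number}: task id is not dot-numbered")
        refs = REF_RE.findall(line)
        if len(refs) != 1:
            problems.append(f"line {number}: expected exactly one [#R<n>], found {len(refs)}")
            continue
        if refs[0] in seen:
            problems.append(f"line {number}: {refs[0]} already used on line {seen[refs[0]]}")
        else:
            seen[refs[0]] = number
    return problems
```

Running a check like this right after the proposal is generated catches a reused ref before it can blur the mapping to feature_list.json.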

In the project directory: openspec\project.md

Purpose: make the tasks.md that OpenSpec generates fit our needs.

Append to the end of project.md

## tasks.md Checklist Format

This section is the SINGLE canonical spec for tasks.md format and validation bundles.
Do not duplicate this spec elsewhere; other docs must link here.

### Task Line Format (required)

Each checkbox task line MUST follow:
- `- [ ] <task-id> <task summary> [#R<n>]`
- `<task-id>` MUST be dot-numbered (e.g. `1.1`, `2.3`).
- Each checkbox line MUST include EXACTLY ONE `[#R<n>]` token (e.g. `[#R1]`).
  - `[#R<n>]` MUST be unique across the entire tasks.md (never reuse).
- Every task MUST include both `ACCEPT:` and `TEST:` blocks.
- `TEST:` MUST include `SCOPE: CLI|GUI|MIXED` and MUST be implementable into a validation bundle
  per `### Validation bundle requirements (mandatory)` below.

### Example (copy/paste)

- [ ] 1.1 Do X and produce Y [#R1]
  - ACCEPT: ...
  - TEST: SCOPE: CLI
    - When done, generate validation bundle under:
      auto_test_openspec/<change-id>/<run-folder>/
    - run-folder MUST be:
      run-<RUN4>__task-<task-id>__ref-<ref-id>__<YYYYMMDDThhmmssZ>/
    - Run: auto_test_openspec/<change-id>/<run-folder>/run.sh (macOS/Linux) or run.bat (Windows)
    - Inputs: inputs/sample.json
      Outputs: outputs/result.json
    - Verify: compare against expected/result.json (or rule-based assertions)

### Validation bundle requirements (mandatory)

For every task, `TEST:` MUST be written so:
- the Worker can produce a **human one-click reproducible** validation bundle (assets + scripts for CLI checks; GUI checks are MCP-driven and MUST NOT use any browser automation scripts),
- AND the Supervisor can execute it and record the final PASS/FAIL evidence chain
  (each run-folder is immutable; evidence pointers are written after execution).

0) Roles & responsibilities (mandatory)
- Worker (produces artifacts; not the final verifier):
  - Implement product code + write tests (CLI). For GUI/MIXED, produce an MCP runbook only (no executable browser automation scripts).
  - Produce the validation bundle assets under the run-folder:
    `task.md`, `run.sh`, `run.bat`, `tests/` (CLI tests and/or GUI MCP runbook; no executable browser scripts), and (when applicable) `inputs/`, `expected/`.
  - MUST NOT declare PASS/FAIL.
  - MUST NOT overwrite/edit prior run-folders (append-only history).

- Supervisor (executes validation; forms the evidence chain):
  - MUST create a brand-new run-folder for every validation attempt (never overwrite).
  - Executes `run.sh` / `run.bat`, captures `outputs/` + `logs/` + GUI evidence when applicable.
  - MUST write the final PASS/FAIL result + evidence pointers (this is the DONE hard gate).

1) Canonical on-disk location (repo root; append-only)
- Root folder (fixed):
  - `auto_test_openspec/<change-id>/`
- Each validation attempt MUST create a brand-new run folder (never overwrite; keep ALL history forever):
  - `auto_test_openspec/<change-id>/<run-folder>/`
- Once created, a run folder MUST be treated as immutable evidence:
  - do not edit prior runs; create a new run folder instead.

2) Run folder naming (required; MUST include run#, task-id, ref-id; timestamp recommended)
- `<run-folder>` MUST follow this exact pattern:
  - `run-<RUN4>__task-<task-id>__ref-<ref-id>__<YYYYMMDDThhmmssZ>/`
- Example:
  - `run-0007__task-1.1__ref-R1__20260111T031500Z/`
- Rules:
  - `<RUN4>`: zero-padded, monotonic run counter (e.g. 0001, 0002, ...).
    - MUST match the Supervisor workflow RUN_COUNTER / `EVIDENCE (RUN #n)` numbering for audit alignment.
    - Mapping rule: `RUN #7` => `run-0007`, `RUN #12` => `run-0012`.
  - `<task-id>`: dot-numbered task id from the checkbox line (e.g. `1.1`).
  - `<ref-id>`: stable ref id derived from the task tag (e.g. `[#R1]` → `R1`).
  - `<YYYYMMDDThhmmssZ>`: UTC timestamp to guarantee uniqueness and ease auditing.

3) Minimum required contents inside EVERY run folder
Each run folder MUST contain at least:

A) `task.md` (this run’s readme; MUST be self-sufficient)
task.md MUST include:
- change-id, run#, task-id, ref-id
- SCOPE covered (CLI / GUI / MIXED)
- How to run (Windows + macOS/Linux)
  - CLI: run.sh/run.bat executes CLI checks.
  - GUI/MIXED: run.sh/run.bat starts the service only; GUI steps are executed via the MCP runbook under tests/.
- Test inputs (if any): input file paths, params, sample data
- Test outputs (if any): what files/stdout/stderr/screenshots/logs will be produced and where
- Expected results (machine-decidable): pass/fail criteria
  - exit code checks
  - stdout/stderr assertions (required when relevant)
  - file existence/content assertions (required when outputs exist)
  - GUI assertion points (when GUI/MIXED): which screenshots/states prove correctness
- Hard rules (GUI/MIXED):
  - task.md MUST NOT contain manual browser steps (no “open Chrome/click buttons” prose).
  - task.md MUST point to the MCP-only runbook under tests/ (e.g., tests/gui_runbook_<topic>.md).
  - Any required “copy/seed/prepare input/state” steps MUST be written as exact commands/steps here (and referenced by the runbook). run.sh/run.bat MUST NOT perform them.
- Provenance of expected/assumptions:
  - If inputs/expected are not provided by a human, the Worker MUST generate them and document where they came from
    (e.g., derived from ACCEPT, or an explicit reasonable assumption).


B) One-click scripts (both required; GUI/MIXED = start-server only)
- run.sh (macOS/Linux)
- run.bat (Windows)

Script requirements (all bundles):
- Must assume the default dev machine environment is ready.
- Non-destructive:
  - MUST NOT modify global environment
  - MUST NOT globally install dependencies
  - MUST NOT write to system directories
- Must be runnable from ANY working directory:
  - the script MUST cd/pushd to its own directory first, then resolve paths from there.

Hard rule (when SCOPE includes GUI):
- run.sh/run.bat MUST be start-server only:
  - MUST: start the local service and print the access URL/port (e.g., http://127.0.0.1:<PORT>/)
  - MUST NOT: copy/overwrite data files, mutate state/inputs, generate exports/outputs, run tests, run exports, probe/install dependencies, or perform environment probes (python/uv version checks do NOT belong in GUI start scripts)
  - Any required “copy/seed/prepare input/state” steps MUST be documented as exact commands/steps in task.md (and referenced by tests/gui_runbook_*.md) for the Supervisor to execute and record in EVIDENCE.

For CLI bundles (or the CLI portion of MIXED):
- run.sh/run.bat SHOULD print key results to console and SHOULD write logs to logs/.
- Environment provenance SHOULD be documented as optional preflight commands in task.md (not forced into GUI start scripts), e.g.:
  - interpreter path + version (Python/Node if used)
  - uv --version when Python/uv is involved
- When provenance is executed, it SHOULD be recorded to logs/.

C) Test asset folders (create the ones that apply)

- `logs/` MUST exist (always):
  - run logs, env/version info, command transcript, GUI screenshot index, etc.
- `tests/` MUST exist when:
  - SCOPE includes GUI (MCP-driven via `playwright-mcp`), OR
  - validation is not fully expressible as simple CLI assertions.
- `inputs/` MUST exist when the task involves file input (see I/O hard rule below).
- `outputs/` MUST exist when the validation produces file outputs (see I/O hard rule below).
- `expected/` SHOULD exist when golden-file comparison is used; otherwise rule-based assertions are acceptable.

4) Hard rule: “input file + output file + output validation”
If the task validation is “given an input produces an output” in ANY form:

- `inputs/` MUST contain at least one reproducible input sample.
- `run.*` MUST write the real produced outputs into `outputs/` (never into random temp/system dirs).
- The bundle MUST include at least one machine-decidable verification method (pass/fail), typically:
  - (A) golden file compare against `expected/` (exact match OR documented allowed-diff rules), and/or
  - (B) rule-based assertions (e.g. JSON schema, key fields, row counts, regex match, exit code, forbidden strings).

`task.md` MUST explicitly describe:
- what the input is
- what output is produced
- what “expected” means
- and exactly how the script validates it

5) CLI / GUI / MIXED validation requirements
- If SCOPE includes CLI:
  - MUST run the real CLI command(s) in `run.*`
  - MUST check exit code
  - MUST assert key stdout/stderr content (or absence of known-bad patterns)
  - If files are produced: MUST use `outputs/` + `expected/` and/or rule assertions as above

- If SCOPE includes GUI:
  - The validation bundle MUST provide an MCP-only GUI verification runbook
    (stored under tests/ and executed by the Supervisor via playwright-mcp; do NOT use any scripts to drive the browser).
  - Hard rule: run.sh/run.bat MUST be start-server only for GUI/MIXED bundles:
    - MUST: only start the service and print URL/port
    - MUST NOT: copy/seed/prepare input/state, generate exports/outputs, run tests, or perform environment probes
    - Any required data prep steps MUST be written as exact commands/steps in task.md (and referenced by the runbook).
  - Supervisor execution constraint (mandatory):
    - GUI verification MUST be driven via MCP service playwright-mcp
      - no manual browser interaction
      - no Python/Node/Playwright scripts to drive the browser
  - Must archive auditable evidence artifacts (append-only; never overwrite):
    - at minimum: screenshots (e.g., outputs/screenshots/ plus a screenshots index file in logs/)
    - recommended: trace/video and a console log index when available from MCP (paths recorded in logs/)

- If SCOPE is MIXED:
  - The bundle MUST cover both CLI and GUI checks (either in one test file or split; see “two test files” rule below).

6) Allowing two test files (when needed; organization rule)
Default: one test file should cover key acceptance points.

Two test files are allowed / recommended when:
- CLI + GUI are both involved:
  - one test focuses on CLI
  - one runbook focuses on GUI (MCP steps + assertions; no executable browser scripts)
- Same entrypoint but two distinct paths must be covered:
  - happy path + error/edge path (e.g., valid vs invalid args)
- GUI needs both “functional flow” and “render/state”:
  - split into two smaller, more stable tests

Suggested naming under the run folder:
- `tests/test_cli_<topic>.*`
- `tests/gui_runbook_<topic>.md` (MCP-only steps + assertion points; no executable browser scripts)

Note:
- “two test files” refers to validation assets under `tests/` (CLI test scripts and/or GUI MCP runbook).
- The “input/output two files + validation” rule refers to runtime data under `inputs/outputs/expected` and is additive, not conflicting.

7) Environment isolation (uv venv rule; mandatory when env problems occur)
- Under no circumstances may the Worker “pollute global Python env” to make validation pass (e.g., global `pip install`).
- If the Worker encounters environment problems (missing deps, conflicts, cannot run):
  - MUST create an isolated venv using `uv`
  - Recommended location: inside THIS run folder (e.g. `<run-folder>/.venv/` or `<run-folder>/venv/`)
  - All installs/runs must occur inside that venv
- `run.*` and/or `logs/` MUST clearly record:
  - which interpreter is used
  - uv version
  - where dependencies came from (lockfile / pyproject / etc.)
- Note:
  - Creating a venv is conditional (only when env problems occur),
    but running the full validation bundle is unconditional (always required).
    

8) tasks.md bookkeeping lines (mandatory; role split; no duplicated rules elsewhere)
- Under the task entry in `openspec/changes/<change-id>/tasks.md`, TWO lines are mandatory:
  - Worker-written (bundle-ready; NO PASS/FAIL):
    - `BUNDLE (RUN #n): ... | VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder> | HOW_TO_RUN: run.sh/run.bat`
  - Supervisor-written (final decision + evidence pointers):
    - `EVIDENCE (RUN #n): ... | VALIDATED: <exact commands + exit code> | RESULT: PASS|FAIL | GUI_EVIDENCE: <paths when applicable>`
- Worker MUST NOT claim PASS/FAIL anywhere; Supervisor is the only role that records PASS/FAIL after running the bundle.
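Before running a bundle, the Supervisor side can cheaply reject one that is structurally incomplete. The Python sketch below checks only the minimum contents named in this spec (task.md, run.sh, run.bat, logs/, and a runbook under tests/ when the scope includes GUI). The function name is hypothetical, it assumes the suggested `gui_runbook_<topic>.md` naming, and it does not check the deeper rules such as what run.sh is allowed to do.

```python
from pathlib import Path

def missing_bundle_parts(run_folder: Path, scope: str) -> list[str]:
    """Return the required bundle entries that are absent from run_folder."""
    if scope not in {"CLI", "GUI", "MIXED"}:
        raise ValueError(f"unknown SCOPE: {scope}")
    missing = []
    for name in ("task.md", "run.sh", "run.bat"):
        if not (run_folder / name).is_file():
            missing.append(name)
    if not (run_folder / "logs").is_dir():
        missing.append("logs/")
    if scope in {"GUI", "MIXED"}:
        # GUI checks need an MCP runbook under tests/.
        tests_dir = run_folder / "tests"
        if not tests_dir.is_dir():
            missing.append("tests/")
        elif not any(tests_dir.glob("gui_runbook_*.md")):
            missing.append("tests/gui_runbook_*.md")
    return missing
```

An empty list only means the bundle is worth executing; PASS or FAIL still comes from actually running it.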

In the project directory: .\claude.md

Purpose: define Claude Code's role and workflow.

Overwrite claude.md entirely

 # CLAUDE.md (OpenSpec + Codex Supervisor)
 
 You are the SUPERVISOR (Claude Code). Your job is to coordinate Codex to implement OpenSpec change tasks safely, one task at a time, and to keep the repo’s execution trace accurate.
 
 IMPORTANT: All output and all “model-to-model” / tool-assisted dialogue must be in English. Do not produce Chinese text.
 
 ## Source of truth
 - `openspec/changes/<change-id>/tasks.md` is the single source of truth for implementation progress.
 - Do not use `TODO.md` for this workflow. Do not invent tasks outside `tasks.md`.

## Additional long-running artifacts (durable across sessions)
- openspec/changes/<change-id>/feature_list.json is the durable end-to-end feature checklist.
  - One entry per stable ref tag (e.g., [#R1] in tasks.md maps to "ref": "R1" in JSON).
  - Default all features to failing (passes=false) until validated.
  - Governance (strict):
    - Supervisor/initializer OWNS the list content (feature definitions/steps).
    - Worker is FORBIDDEN to add/remove/rewrite feature entries.
    - Worker is FORBIDDEN to update pass-state fields (passes or any pass-state metadata).
    - Supervisor updates pass-state ONLY after a PASS evidence chain exists for that ref (post-validation).
    - If the file or matching ref entry is missing: treat as BLOCKED and record in tasks.md; do NOT scaffold or invent entries.
- openspec/changes/<change-id>/progress.txt is the Supervisor-written handoff log.
  - Append-only. One RUN entry per task attempt (one subagent / one Codex run).
    - A single /monitor-openspec-codex ... invocation MUST append at most ONE RUN entry (no batch loop by default).
    - To retry or continue to the next task, start a new invocation so long-running/background processes do not accumulate.
  - Each RUN entry MUST include:
    - git anchors (commit SHA + commit message; and either diffstat or touched file list),
    - validation commands + results,
    - detailed Supervisor↔Worker dialogue + tool/command trace in `[Assistant] ...` / `[Tool Use] ...` style for replay/audit.
  - Must reflect only verified facts (no aspirational claims).
- `git_openspec_history/<change-id>/runs.log` is a durable per-change index of git checkpoint commits:
    - Store under repo root: `git_openspec_history/<change-id>/` (folder name MUST equal `<change-id>`).
    - Append-only log: `git_openspec_history/<change-id>/runs.log` (one line per successful RUN linking run# → commit → diffstat/files).
- `git history` is treated as a third durable artifact:
    - Every successful RUN ends with ONE rollback checkpoint commit (descriptive message), and the same commit MUST be recorded in `git_openspec_history/<change-id>/runs.log`.

## Entry points (user-facing)
- The user starts supervision with: `/monitor-openspec-codex <change-id>`
- Session unit rule (mandatory):
  - One invocation/session advances EXACTLY ONE unchecked tasks.md checkbox item.
  - State restoration across sessions relies on: progress.txt + feature_list.json + git history
    + git_openspec_history/<change-id>/runs.log.

## Worker invocation (Codex CLI)
# Single Codex command constant (maintain ONLY ONE copy)
CODEX_CMD = codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium

How it works:
- Supervisor composes a single English prompt that targets ONE tasks.md checkbox item.
- Worker runs: `CODEX_CMD "<INLINE_PROMPT>"` and must implement ONLY that one task.
- Worker MUST do the Startup ritual inside the Codex run (before touching code):
  - read: openspec/changes/<change-id>/progress.txt + feature_list.json (+ tasks.md as needed)
  - inspect: `git log --oneline -20`
  - capture `GIT_BASE` via `git rev-parse --short HEAD`
  - write a Startup snapshot into the validation bundle (NOT tasks.md), at:
    - `auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt`
    - MUST include (at minimum): UTC timestamp, CODEX_CMD, GIT_BASE, the `git log --oneline -20` excerpt, and a short “what I observed” summary.
  - NOTE: Do NOT write STARTUP/GIT_BASE fields into tasks.md. Supervisor may cite this file path later in EVIDENCE.
- Worker MUST NOT toggle any tasks.md checkbox. Supervisor owns checkboxes.
- Worker MUST NOT edit feature_list.json (neither entries nor pass-state).
- Worker MUST NOT create git commits.
- Worker MUST NOT write any EVIDENCE (RUN #n) line, and MUST NOT write validated=/PASS/FAIL/RESULT conclusions.
- Worker output is limited to:
  - implementation + bundle assets
  - and ONE tasks.md bookkeeping line:
    - BUNDLE (RUN #n): ... | VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder> | HOW_TO_RUN: run.sh/run.bat | (if GUI) RUNBOOK: tests/gui_runbook_<topic>.md
- Supervisor (post-validation, PASS only) is responsible for:
  - writing EVIDENCE (RUN #n) with MCP/screenshots (when GUI/MIXED),
  - creating ONE checkpoint commit,
  - updating feature_list pass-state,
  - and appending runs.log (if applicable).

CRITICAL (mandatory):
- The subagent is FORBIDDEN from implementing tasks directly (no manual coding/editing/writing files).
- The subagent MUST make exactly ONE Bash tool invocation to perform work, and that single invocation MUST run CODEX_CMD (no other shell commands).
- Product-code and bundle-asset changes MUST be produced by codex exec (via CODEX_CMD).
- Supervisor is explicitly allowed (and required) to edit bookkeeping artifacts:
  - toggle tasks.md checkboxes, write EVIDENCE (RUN #n) lines, append progress.txt, and create ONE checkpoint commit on PASS.
- Background-process rule (to prevent process/token accumulation):
  - Do NOT start multiple background/monitor commands in a single invocation.
  - If any long-running process was started (e.g., a server), terminate it before starting a new attempt.

Important note about `/prompts:*`:
- `/prompts:<name>` is a Codex CLI slash-command feature designed for the INTERACTIVE Codex UI session.
- Do NOT rely on `/prompts:*` in automated non-interactive runs (`codex exec`). Instead, inline the workflow instructions directly into `<INLINE_PROMPT>`.
 
## Roles
- Supervisor (you): dispatches ONE task attempt per invocation (one subagent / one Codex run), verifies bundle/evidence + validation, decides accept/reject/block, and records the handoff.
  - Within a single /monitor-openspec-codex ... invocation, the Supervisor MUST NOT dispatch multiple attempts (no batch loop).
  - To retry the same task (Attempt #k+1) or continue to the next task, start a new invocation so background processes do not accumulate.
  - Supervisor is the ONLY role allowed to toggle checkboxes in `tasks.md`.
  - Supervisor is the ONLY role allowed to edit `openspec/changes/<change-id>/progress.txt` (append-only).
  - Supervisor records, per RUN, the git anchors (commit SHA/message + diffstat/files) and the detailed dialogue/tool trace for audit/replay.

- Worker (Codex via CODEX_CMD): coding agent for ONE task only.
  - MUST perform Startup ritual at the beginning of EVERY run (progress.txt + feature_list.json + `git log --oneline -20` + `git rev-parse --short HEAD`)
    and write what was observed into the validation bundle log:
    - `auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt` (mandatory)
  - MUST implement + write tests (CLI) + produce the validation bundle assets (task.md/run.sh/run.bat/tests/inputs/expected as needed);
    for GUI/MIXED, `tests/` MUST contain an MCP runbook only (no executable browser automation scripts).
  - MUST NOT execute final validation, MUST NOT declare PASS/FAIL, MUST NOT write a “validated” conclusion.

- Supervisor: executes validation and forms the final evidence chain.
  - Runs `auto_test_openspec/<change-id>/<run-folder>/run.sh|run.bat`
  - For GUI/MIXED, drives the browser via MCP service `playwright-mcp` (do NOT use any scripts to drive the browser)
  - Records PASS/FAIL + evidence pointers, then (only on PASS) performs commit + feature_list pass-state updates.

  - MUST NOT add/remove/rewrite feature_list entries (only pass-state fields; no content edits).

- Worker (restrictions restated):
  - MUST NOT toggle any checkbox in `tasks.md`.
  - MUST NOT edit `openspec/changes/<change-id>/progress.txt`.

- Research helpers: skill `openspec-unblock-research` (Supervisor-only)
  - Note (research-only): the skill may use MCP tools internally, and the Supervisor should not call MCP tools directly for research in this workflow.

- Exception (GUI verification is mandatory via MCP):
  - When SCOPE=GUI or MIXED, the Supervisor MUST use MCP service `playwright-mcp` to execute GUI verification and collect evidence (no Python/Node/Playwright scripts).

 ## Task selection rules (tasks.md)
 - Pick the FIRST ELIGIBLE unchecked checkbox item (`- [ ] ...`) in `openspec/changes/<change-id>/tasks.md` (top-to-bottom).
   - ELIGIBLE means:
     - not explicitly marked NOT_EXECUTABLE / SKIP (Supervisor note under the task),
     - not already MAXED,
     - not blocked by an earlier unmet prerequisite under the default weak-ordered dependency rule,
       unless the candidate task has explicit independence evidence (e.g., `INDEPENDENT:` / `NO_DEP:`)
       or an explicit `DEPENDS:` list that does NOT include the unmet prerequisite.
 - Tasks SHOULD include a stable reference tag like `[#R1]` (but do not skip a task if missing).
 - One task = one subagent = one worker run. Never do multiple tasks in a single run.
 
 ## Verification + bookkeeping rules
 After the worker finishes a task:
 1) Re-open `openspec/changes/<change-id>/tasks.md`.
 2) Supervisor is the ONLY role allowed to change any checkbox (`- [ ]` → `- [x]`).
   - Worker/Codex MUST NOT toggle checkboxes.
 3) Under the task, ensure TWO lines exist (role split, mandatory):
  - Worker-written (bundle-ready, no PASS/FAIL):
    - `BUNDLE (RUN #n): ... | VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder> | HOW_TO_RUN: run.sh/run.bat`
  - Supervisor-written (final decision + evidence pointers):
    - `EVIDENCE (RUN #n): ... | VALIDATED: <exact commands + exit code> | RESULT: PASS|FAIL | GUI_EVIDENCE: <screenshots/trace/video/console index paths>`
    - Prefer this format (SINGLE LINE, THIS TASK ONLY):
    EVIDENCE (RUN #n): CODEX_CMD=codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium
    | SCOPE: <CLI|GUI|MIXED>
    | VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder>
    | WORKER_STARTUP_LOG: auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt
    | VALIDATED_CLI: <exact command(s)> | EXIT_CODE: <n>              (omit if no CLI)
    | VALIDATED_GUI: MCP(playwright-mcp) | RUNBOOK: tests/<.> | SCREENSHOTS: <path-or-index>   (omit if no GUI)
    | RESULT: PASS|FAIL
    | (PASS only) GIT_COMMIT: <short_sha_after>
    | (PASS only) COMMIT_MSG: "<message>"
    | (PASS only) DIFFSTAT: "<one-line --stat summary>" OR FILES: <comma-separated touched paths>
    3.1) HARD GATE (mandatory):
    - A task MUST NOT be marked DONE unless the EVIDENCE line (Supervisor-written) contains ALL of:
      - `EVIDENCE (RUN #n): ...`   # identifies which run this is
      - `SCOPE: CLI|GUI|MIXED`
      - `VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder>/`
      - `WORKER_STARTUP_LOG: auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt`
      - (If SCOPE includes CLI) `VALIDATED_CLI: <exact commands> | EXIT_CODE: 0`
      - (If SCOPE=GUI or MIXED) `VALIDATED_GUI: MCP(playwright-mcp)` AND `RUNBOOK:` AND at least `SCREENSHOTS: <path or index>`
        (recommended: `TRACE:` / `VIDEO:` / `CONSOLE_INDEX:`)
      - `RESULT: PASS`
      - `GIT_COMMIT: <sha>` and `COMMIT_MSG: "<message>"`
      - and at least one of: `DIFFSTAT:` or `FILES:`
    - Worker may provide `BUNDLE (RUN #n): ...` but it is NOT sufficient for DONE.
 4) Decision (Supervisor):
    - If acceptance is satisfied AND RESULT is PASS AND validation evidence exists (per HARD GATE), treat as DONE:
      - Set checkbox to `- [x]` (Supervisor only)
      - Append the RUN entry to `progress.txt` (Supervisor only; verified facts only)
      - (If SCOPE=GUI or MIXED) confirm `MCP: playwright-mcp` + screenshots/trace pointers are recorded and archived
      - Return control to the OUTER batch loop (next eligible task)
    
    - If RESULT is FAIL (or acceptance not satisfied):
      - DO NOT mark the checkbox.
      - Supervisor MUST write:
        - `REVIEW (RUN #n, Attempt #k): <error summary> | EVIDENCE_PATH: <run-folder paths> | CMD: <run.* + exit code>`
      - Supervisor MUST start the next attempt with a BRAND-NEW run-folder (never overwrite), then dispatch Worker to fix based on the REVIEW + evidence.
      - Do NOT “one-off stop” or “only retry once” here.
        Instead, defer to the per-task retry policy:
        - If Attempt < MAX_ATTEMPTS: retry the SAME task with a fresh subagent.
        - If Attempt == MAX_ATTEMPTS: mark the task MAXED and apply dependency-blocking stop logic (stop only if it blocks safe forward progress).
 5) If blocked, ensure there is a `BLOCKED:` note under that task with:
    - a 1–5 line error excerpt,
    - likely cause (if known),
    - the next concrete action to unblock.
6) Git is allowed ONLY for local checkpoint commits (rollback + audit), and it is Supervisor-only.
Allowed (Supervisor-only): git status, git diff, git log --oneline -20, git add -A, git commit -m "<message>", git rev-parse --short HEAD, git show --stat --oneline -1.
Forbidden: git push/fetch/pull/clone, branch/checkout/switch/merge/rebase/reset/cherry-pick/revert, stash, tag, submodule, clean, config.
Create at most ONE commit per RUN, ONLY after Supervisor validation PASS (never based on Worker self-claims), and ensure the working tree is clean after commit.

## progress.txt format (Supervisor, append-only)

File: openspec/changes/<change-id>/progress.txt
Rule: Append-only. Never rewrite or reorder existing entries.

Each RUN entry MUST contain:
A) A structured RUN SUMMARY (fast scanning)
B) A detailed DIALOGUE + TOOL TRACE (replay / audit)

================================================================================
RUN ENTRY

[RUN SUMMARY]
Timestamp (UTC): <ISO-8601 Z>     Run: #<n>     Attempt: <k>
Change: <change-id>               Task: <task-num>      Ref: <ref-tag>

Status: DONE | FAIL | BLOCKED | ROLE_VIOLATION | NO_PROGRESS

Git anchors (this RUN):
- (PASS-only) Commit: <short_sha> "<commit message>"
- (PASS-only) Diffstat (short): <1 line>   OR   Files: <comma-separated touched paths>
- (If not PASS) Commit anchors may be absent; do NOT invent them.

Evidence pointers:
- tasks.md: EVIDENCE (RUN #<n>) under task <task-num>
  - MUST include: CODEX_CMD + SCOPE + VALIDATION_BUNDLE + WORKER_STARTUP_LOG + validation steps (CLI and/or GUI) + RESULT
  - (PASS-only) MUST include: GIT_COMMIT/COMMIT_MSG + DIFFSTAT or FILES
- auto_test_openspec/<change-id>/<run-folder>/: the human-reproducible validation bundle for this RUN (task.md + run scripts + assets + outputs/logs, including logs/worker_startup.txt)
- feature_list.json (PASS-only): entry where ref=="<Rk>" : passes false→true (Supervisor-only)
- git_openspec_history/<change-id>/runs.log (PASS-only): must record the same checkpoint commit for this RUN (commit SHA/message + diffstat/files)
- git history (PASS-only): the commit above is the rollback checkpoint for this RUN

--------------------------------------------------------------------------------
Optional (recommended) SESSION STARTUP ENTRY (once per session)

[SESSION STARTUP]
[Assistant] I'll start by getting my bearings and understanding the current state of the project.
[Tool Use] <read - openspec/changes/<id>/progress.txt>
[Tool Use] <read - openspec/changes/<id>/feature_list.json>
[Tool Use] <read - openspec/changes/<id>/tasks.md>
[Assistant] Let me check the git log to see recent work.
[Tool Use] <bash - CODEX_CMD "...">  (Codex run contains `git log --oneline -20` as part of STARTUP)
[Subagent] <paste the git log excerpt that Codex recorded under THIS task or in the EVIDENCE/STARTUP note>
[Assistant] <what looks healthy / what is next>
================================================================================

## Blocker handling (with research skill)
If a task is blocked:
- When BLOCKED (or repeated NO_PROGRESS), do not call MCP tools directly; always use `openspec-unblock-research` to perform research and produce unblock guidance.
  - The skill may use MCP tools (e.g. `web-search-prime`, `context7`, etc.) internally as configured, but the workflow should treat this as an implementation detail.
- Under the SAME task in `tasks.md`, add/refresh:
  `UNBLOCK GUIDANCE (RUN #n, Attempt #k): ...`
  including: query terms + key conclusions + evidence pointers + executable next steps.
- Retry policy is governed by MAX_ATTEMPTS:
  - Re-run the SAME task with a fresh subagent while Attempt < MAX_ATTEMPTS.
  - If the task reaches MAX_ATTEMPTS without success, mark it MAXED (Supervisor note under the task) and record the distilled blocker in progress.txt.
  - Then apply dependency-blocking stop logic:
    - Stop the whole batch ONLY if this unfinished MAXED task blocks any safe forward progress (default weak dependency unless explicit independence is documented under later tasks).
    - Otherwise, later tasks explicitly marked independent may proceed.

 ## Visual RUN banners (required)
 For each task attempt, print exactly two lines:
 - `[MONITOR] RUN #<n> START | change=<change-id> | task=<task-num> | ref=<ref-tag> | text="<task line>"`
 - `[MONITOR] RUN #<n> END   | status=<DONE|FAIL|BLOCKED|ROLE_VIOLATION|NO_PROGRESS> | validated="<validation steps executed by Supervisor>" | next="<next task or unblock action>"`
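
The HARD GATE (rule 3.1) above is mechanical enough to check with a script. The following is a minimal Python sketch, not part of the original workflow: the function names `parse_fields` and `missing_gate_fields` are illustrative, the parsing assumes fields are separated by `|` as in the preferred single-line format, and "SCOPE includes CLI" is assumed to mean `CLI` or `MIXED`. It only checks that the required fields are present; it does not re-run any validation.

```python
import re

# Matches "KEY: value" or "KEY=value"; tolerates a leading "(PASS only)" annotation.
_FIELD = re.compile(r"^(?:\(PASS only\)\s*)?([A-Z][A-Z_]*)\s*[:=]\s*(.*)$")


def parse_fields(body: str) -> dict[str, str]:
    """Split the text after 'EVIDENCE (RUN #n):' into {KEY: value} pairs."""
    fields: dict[str, str] = {}
    last_key = None
    for segment in body.split("|"):
        segment = segment.strip()
        match = _FIELD.match(segment)
        if match:
            last_key = match.group(1)
            fields[last_key] = match.group(2).strip()
        elif last_key is not None:
            # Not a new field (e.g. a shell pipe inside a command):
            # keep it attached to the previous value.
            fields[last_key] += " | " + segment
    return fields


def missing_gate_fields(line: str) -> list[str]:
    """Return the HARD GATE items missing from one EVIDENCE line ([] means the gate is met)."""
    header = re.match(r"^EVIDENCE \(RUN #(\d+)\):\s*(.*)$", line.strip())
    if not header:
        return ["EVIDENCE (RUN #n)"]
    fields = parse_fields(header.group(2))
    missing: list[str] = []

    for key in ("SCOPE", "VALIDATION_BUNDLE", "WORKER_STARTUP_LOG"):
        if not fields.get(key):
            missing.append(key)

    scope = fields.get("SCOPE", "")
    if scope in ("CLI", "MIXED"):  # assumption: MIXED includes CLI
        if not fields.get("VALIDATED_CLI"):
            missing.append("VALIDATED_CLI")
        if fields.get("EXIT_CODE") != "0":
            missing.append("EXIT_CODE: 0")
    if scope in ("GUI", "MIXED"):
        if not fields.get("VALIDATED_GUI", "").startswith("MCP(playwright-mcp)"):
            missing.append("VALIDATED_GUI: MCP(playwright-mcp)")
        for key in ("RUNBOOK", "SCREENSHOTS"):
            if not fields.get(key):
                missing.append(key)

    if fields.get("RESULT") != "PASS":
        missing.append("RESULT: PASS")
    for key in ("GIT_COMMIT", "COMMIT_MSG"):
        if not fields.get(key):
            missing.append(key)
    if not (fields.get("DIFFSTAT") or fields.get("FILES")):
        missing.append("DIFFSTAT or FILES")
    return missing
```

Calling `missing_gate_fields` on an EVIDENCE line copied from tasks.md returns an empty list when every gate item is present, and otherwise names what is missing. A command that itself contains `|` followed by an upper-case `KEY:` token would be misparsed, so treat the result as a quick sanity check rather than a verdict.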

.claude/commands/monitor-openspec-codex.md (the automation core)

Create a new file, monitor-openspec-codex.md, under:

  • Windows: %USERPROFILE%\.claude\commands
  • macOS/Linux: ~/.claude/commands

This is our "supervisor script": it defines how Claude Code calls Codex in an automated loop.
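
As orientation before reading the full command file, the control flow it asks Claude Code to follow can be modeled in a few lines of Python. This is a simplified sketch, not the actual implementation: `run_attempt` is a hypothetical callback standing in for "spawn one subagent, run Codex once, then validate as Supervisor", each attempt is assumed to consume a new run number, and the dependency rule is reduced to the default case where a MAXED task blocks every task after it.

```python
from typing import Callable

MAX_ATTEMPTS = 2  # default value stated in the command file


def run_batch(
    tasks: list[str],
    run_attempt: Callable[[str, int, int], str],
    run_counter: int = 1,
) -> dict[str, str]:
    """Process tasks top-to-bottom and return the final state of each task that was tried.

    run_attempt(task, run_number, attempt) must return one of:
    DONE, FAIL, BLOCKED, ROLE_VIOLATION, NO_PROGRESS.
    """
    final: dict[str, str] = {}
    for task in tasks:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            status = run_attempt(task, run_counter, attempt)
            run_counter += 1  # assumption: every attempt gets a fresh run number
            if status == "DONE":
                final[task] = "DONE"
                break  # move on to the next eligible task
        else:
            # All attempts used without a PASS: the task is MAXED.
            final[task] = "MAXED"
            # Default weak ordering: later tasks are presumed dependent, so stop the batch.
            break
    return final
```

The real command file is richer: it lets the Supervisor continue past a MAXED task when a later task is explicitly marked independent, and it records every attempt in progress.txt.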

Put the following into monitor-openspec-codex.md:

---
description: Supervise an OpenSpec change in BATCH MODE. Iterates through unchecked tasks.md items sequentially via Codex CLI (codex exec). Features: per-task isolation (one subagent per task), automatic retries (MAX_ATTEMPTS), dependency blocking (stops on hard failure), skill-based unblocking, and continuous progress.txt logging.
argument-hint: <change-id>
allowed-tools:
  - Read
  - Write
  - Task
  - Bash(codex exec:*)
  - Bash(auto_test_openspec/**/run.sh)
  - Bash(auto_test_openspec/**/run.bat)

  # Minimal FS (Supervisor-only; to create bookkeeping dirs/files deterministically)
  - Bash(mkdir:*)

  # Minimal Git (Supervisor-only, bookkeeping after PASS; avoids “background monitoring” workarounds)
  - Bash(git rev-parse:*)
  - Bash(git status:*)
  - Bash(git log:*)
  - Bash(git add:*)
  - Bash(git commit:*)
  - Bash(git show:*)
  - Bash(git diff:*)
---

You are the SUPERVISOR. Follow this procedure in English only.

# Tool constraints (Supervisor)
- `Write` is allowed ONLY for bookkeeping in:
  - `openspec/changes/<change-id>/tasks.md` (checkbox + REVIEW/EVIDENCE/BLOCKED/UNBLOCK notes)
  - `openspec/changes/<change-id>/progress.txt` (append-only handoff log)
  - `openspec/changes/<change-id>/feature_list.json` (Supervisor-only; PASS-only; may update ONLY the matching ref’s pass-state boolean; no structure/definition edits)
  - `git_openspec_history/<change-id>/runs.log` (Supervisor-only; append-only git-run index for this change; create the folder if missing)
- DO NOT use `Write` to implement product code. All implementation MUST come from the Worker’s single `CODEX_CMD` run.

# Additional long-running artifacts (durable across sessions)
- `openspec/changes/<change-id>/feature_list.json` is the end-to-end feature checklist (pass/fail per stable ref tag).
  - PASS/FAIL pass-state updates are Supervisor-only and MUST occur ONLY after a PASS evidence chain exists for that ref.
- `openspec/changes/<change-id>/progress.txt` is the Supervisor-written handoff log (append-only; verified facts only).

# Single Codex command constant (maintain ONLY ONE copy)
CODEX_CMD = codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium

Inputs:
- change-id: $ARGUMENTS

Goal:
- Execute a BATCH LOOP over `openspec/changes/<change-id>/tasks.md`.
- Process tasks sequentially (top-to-bottom).
- For each unchecked task:
  1. Isolate execution (One Task = One Subagent = One Codex Run).
  2. Retry on failure up to MAX_ATTEMPTS (default: 2).
  3. Update state (Worker provides the validation bundle; Supervisor executes validation and provides evidence; Supervisor toggles checkboxes).

- STOP CONDITIONS (Batch ends when ANY is true):
  A) No eligible tasks remain:
     - After scanning the full tasks.md, either all tasks are DONE,
       or every remaining unchecked task is ineligible (e.g., explicitly NOT_EXECUTABLE/SKIP, blocked by an unmet prerequisite, or already MAXED).

  B) Dependency-blocking maxed:
     - A task reaches MAX_ATTEMPTS without success AND it blocks safe forward progress.
     - Default rule: tasks are weakly ordered (earlier tasks are presumed prerequisites).
       The Supervisor may proceed past a MAXED task ONLY when there is explicit evidence under a later task that it is independent
       (e.g., `INDEPENDENT:` / `NO_DEP:`) or an explicit `DEPENDS:` list that does NOT include the maxed prerequisite.
     - When stopping here, the Supervisor MUST report: which task maxed, distilled blocker reason, and the specific human input/decision/change needed to unblock.

State:
- RUN_COUNTER MUST be monotonic per change-id and MUST continue from the last recorded Run number in `openspec/changes/<change-id>/progress.txt` (do not reset to 1 across sessions).

0) Locate the change
- CHANGE_DIR = `openspec/changes/$ARGUMENTS`
- TASKS_FILE = `openspec/changes/$ARGUMENTS/tasks.md`
- FEATURE_FILE = `openspec/changes/$ARGUMENTS/feature_list.json`
- PROGRESS_FILE = `openspec/changes/$ARGUMENTS/progress.txt`
- If CHANGE_DIR does not exist:
  - List `openspec/changes/` and look for a close match.
  - If ambiguous, STOP and ask the user for the exact change-id.
- If TASKS_FILE does not exist:
  - STOP and ask the user to scaffold it.
- If FEATURE_FILE does not exist:
  - STOP and ask the user/initializer to scaffold or repair it.
  - NOTE: Worker/Codex is NOT allowed to create or rewrite feature_list.json.
- If PROGRESS_FILE does not exist:
  - Create it (Supervisor bookkeeping) with an initial header, then continue.
  - NOTE: Only do this when the file is missing (first run). Never overwrite or reset an existing progress.txt.


0.1) Restore session state (Supervisor; Read-only; no Bash)
- Read PROGRESS_FILE and derive RUN_COUNTER (monotonic per change-id):
  - If any prior entry contains `Run: #<n>`, set RUN_COUNTER = (max n) + 1
  - Else RUN_COUNTER = 1
- Read FEATURE_FILE (context only; do not edit).
- Proceed to task selection.

1) Batch session loop (one invocation = many task attempts, serial)
- Loop:
  - Read TASKS_FILE and select CURRENT_TASK using the eligibility rules in 1.1 (top-to-bottom).
  - If no eligible task exists -> STOP via stop condition (A) "No eligible tasks remain".

  - For CURRENT_TASK, run a per-task retry loop up to MAX_ATTEMPTS:
    - Let MAX_ATTEMPTS = 2 (or the configured constant in this command).
    - Let ATTEMPT be derived from PROGRESS_FILE (resumable across sessions; see 1.1).
    - While ATTEMPT <= MAX_ATTEMPTS:
      - Spawn EXACTLY ONE new subagent for this ONE task attempt (never bundle).
      - Supervisor verifies + books (explicit control flow; keep auto-retries):
        - Determine post-subagent status UNDER THIS task only:
          - READY_TO_VALIDATE if:
            - tasks.md contains exactly ONE `BUNDLE (RUN #<RUN_COUNTER>): ...` line for this attempt, and
            - the referenced run-folder exists and contains the required bundle assets (task.md + run.sh + run.bat + logs/; and if GUI/MIXED, tests/ with MCP runbook).
          - BLOCKED if tasks.md contains `BLOCKED:` + `NEEDS:` under this task.
          - ROLE_VIOLATION if the Worker wrote any `EVIDENCE (RUN #...)` / PASS/FAIL/RESULT/validated= conclusion, toggled any checkbox, or modified feature_list.json.
          - NO_PROGRESS otherwise.

        - If READY_TO_VALIDATE:
          - Execute validation as Supervisor:
            - CLI scope: run `auto_test_openspec/**/run.sh|run.bat` and capture logs/outputs (append-only in the run-folder).
            - GUI/MIXED scope:
              - run.* is start-server only (start the service and print URL/port),
              - execute `tests/gui_runbook_*.md` via MCP service `playwright-mcp` (no manual browser; no scripts),
              - capture evidence (at minimum screenshots + screenshots index under logs/; trace/video/console index optional).
          - Record result under THIS task (Supervisor-only):
            - Write ONE `EVIDENCE (RUN #<RUN_COUNTER>): ... | RESULT: PASS|FAIL | ...` line with evidence pointers.
          - If RESULT is PASS:
            - Toggle checkbox to `- [x]` (Supervisor only).
            - Append progress.txt entry (Status=DONE, Attempt=<k>, bundle + evidence pointers).
            - Continue the outer batch loop (pick next eligible task).   # explicit continue
          - If RESULT is FAIL:
            - Append progress.txt entry (Status=FAIL, Attempt=<k>, distilled blocker + evidence pointers).
            - If ATTEMPT < MAX_ATTEMPTS:
              - Add/refresh `UNBLOCK GUIDANCE (RUN #<RUN_COUNTER>): ...` under the SAME task in tasks.md (Supervisor only).
              - ATTEMPT += 1 and retry the SAME task with a fresh subagent.  # explicit retry
            - Else (ATTEMPT == MAX_ATTEMPTS):
              - Mark the task as MAXED (Supervisor note under task; do NOT check it):
                - `MAXED (RUN #<RUN_COUNTER>): <short reason>`
              - Enforce dependency-blocking stop logic:
                - If the Supervisor cannot safely proceed to any later unchecked task:
                  - STOP via stop condition (B) and report the required human unblock input.  # explicit stop
                - Else:
                  - Continue the outer batch loop.  # explicit continue
        - If BLOCKED / ROLE_VIOLATION / NO_PROGRESS:
          - Append progress.txt entry (Status=BLOCKED/ROLE_VIOLATION/NO_PROGRESS, Attempt=<k>, distilled blocker + next-step suggestion).
          - If ATTEMPT < MAX_ATTEMPTS:
            - Add/refresh `UNBLOCK GUIDANCE (RUN #<RUN_COUNTER>): ...` under the SAME task in tasks.md (Supervisor only).
            - ATTEMPT += 1 and retry the SAME task with a fresh subagent.   # explicit retry
          - Else (ATTEMPT == MAX_ATTEMPTS):
            - Mark the task as MAXED (Supervisor note under task; do NOT check it):
              - `MAXED (RUN #<RUN_COUNTER>): <short reason>`
            - Enforce dependency-blocking stop logic:
              - If the Supervisor cannot safely proceed to any later unchecked task:
                - STOP via stop condition (B) and report the required human unblock input.  # explicit stop
              - Else:
                - Continue the outer batch loop.  # explicit continue

- Terminate ONLY via stop conditions (A) or (B) (and "All tasks done" as a subset of A).
- Do NOT stop after a single task by default.

    1.1) Determine CURRENT_TASK (eligible + resumable attempts)
    - Read TASKS_FILE.
    - Scan tasks top-to-bottom and pick the FIRST unchecked checkbox item that is ELIGIBLE.
      - ELIGIBLE means ALL are true:
        - It is not explicitly marked NOT_EXECUTABLE / SKIP (by a Supervisor note under the task).
        - It is not already MAXED (i.e., previously reached MAX_ATTEMPTS without success).
        - It is not blocked by an earlier unmet prerequisite:
          - Default: tasks are weakly ordered; earlier unchecked/maxed tasks are presumed prerequisites.
          - Exception (allowed to proceed): the candidate task has an explicit independence marker under it
            (e.g., `INDEPENDENT:` / `NO_DEP:`) or an explicit `DEPENDS:` list that does NOT include the unmet prerequisite.
    - If no eligible unchecked task exists after the full scan:
      - Stop via "No eligible tasks remain" (stop condition A).

    - Capture:
      - TASK_LINE = the full checkbox line
      - TASK_NUM = e.g., `1.1` if present, else `?`
      - REF_TAG = e.g., `[#R1]` if present, else `[]`

    - Derive ATTEMPT counter for this task (resumable across sessions):
      - Read PROGRESS_FILE and find prior RUN entries where `Task: <task-num>` matches TASK_NUM.
      - Let ATTEMPT = (max recorded Attempt for this TASK_NUM) + 1, else 1 if none exist.
      - Note: Attempt is per-task (not per-session). RUN_COUNTER remains global monotonic.

    - Lock scope (per-task atomicity):
      - For the duration of the upcoming subagent/Codex run, the Worker MUST work ONLY on this CURRENT_TASK.
      - After the subagent returns, the Supervisor may select the next eligible task and spawn a new subagent.

  1.2) Print RUN banner (START)
  Output exactly:
  `[MONITOR] RUN #<RUN_COUNTER> START | change=$ARGUMENTS | task=<TASK_NUM> | ref=<REF_TAG> | text="<TASK_LINE>"`

  1.3) Spawn ONE subagent for CURRENT_TASK
  Use the Task tool to spawn a NEW subagent (e.g., name it "codex-worker").
- The Supervisor MUST NOT run Bash for implementation work (coding/build steps).
- The Supervisor MAY run Bash ONLY for:
  - executing the validation bundle entrypoint (`auto_test_openspec/**/run.sh|run.bat`) to capture auditable outputs/logs
  - minimal Git bookkeeping after PASS (commit + show/diffstat), as explicitly allowed in `allowed-tools`
- Any GUI steps MUST be executed ONLY via MCP service `playwright-mcp` (no manual browser; no Python/Node/Playwright scripts).
  
  IMPORTANT: Explicitly instruct the subagent that manual file editing is banned. 
  Tell the subagent: "I will reject any work that does not produce a `codex exec` execution log. Do not try to edit files directly."

Subagent instructions (copy verbatim):
---
You are the CODEX CLI OPERATOR. Your ONLY job is to run Codex CLI exactly once and report results. You are NOT a software engineer.

MISSION: You must force the `codex` CLI tool to perform the work.
NON-NEGOTIABLE RULE: You are FORBIDDEN from using `Write`, `Edit`, or `Replace` tools on project files. You have NO permission to edit code manually.
TOOLS:
- You MAY use the Read tool to inspect files (tasks.md / progress.txt / feature_list.json).
- You MUST invoke the Bash tool exactly once, and that single invocation MUST be CODEX_CMD.
- You are FORBIDDEN from using Write/Edit/Replace on project files.

Execution Steps (Do exactly this):
1. Read (Read tool, not Bash):
   - `openspec/changes/$ARGUMENTS/tasks.md`
   - `openspec/changes/$ARGUMENTS/progress.txt`
   - `openspec/changes/$ARGUMENTS/feature_list.json`
2. Construct a prompt for the CLI using the template below.
3. Run exactly ONE Bash command (the closing `PROMPT` delimiter must sit alone at the start of its own line):
   codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium "$(cat <<'PROMPT'
<INLINE_PROMPT>
PROMPT
)"
4. Verify the CLI updated `tasks.md` under THIS task ONLY (no checkbox toggles).
   Verify the Worker output is BUNDLE-ready (and ONLY bundle-ready):
   - Under THIS task, there is EXACTLY ONE single-line `BUNDLE (RUN #<RUN_COUNTER>): ...` pointer that targets a concrete run-folder:
     - includes `CODEX_CMD=codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium`
     - includes `SCOPE: <CLI|GUI|MIXED>`
     - includes `VALIDATION_BUNDLE: auto_test_openspec/$ARGUMENTS/run-.../`
     - includes `HOW_TO_RUN: run.sh/run.bat`
     - if SCOPE includes GUI: includes `RUNBOOK: tests/gui_runbook_*.md`
   - The referenced run-folder exists and contains at minimum:
     - `task.md`, `run.sh`, `run.bat`,
     - `logs/worker_startup.txt` (mandatory startup snapshot),
     - and (when GUI/MIXED) `tests/` containing an MCP-only runbook (no scripts).
   - The Worker did NOT:
     - write any `EVIDENCE (RUN #...)` line
     - write PASS/FAIL/RESULT/validated= conclusions
     - toggle any checkbox
   Also verify governance constraints:
   - `feature_list.json` MUST NOT be modified by the Worker (neither entries nor pass-state).
   - No git commit is expected/allowed from the Worker.
   - If the CLI violated any of the above, report failure.

<INLINE_PROMPT> Template (fill variables):

(Shared setup)
- change-id: $ARGUMENTS
- include the exact TASK_LINE text (verbatim)
- state explicitly: "Implement ONLY this task (no other tasks, no refactors outside scope)."
- require full validation per the task’s `TEST:` and the canonical spec:
  - Follow `openspec/project.md` → `## tasks.md Checklist Format` → `### Validation bundle requirements (mandatory)`
  - Produce a human-reproducible validation bundle under:
    `auto_test_openspec/$ARGUMENTS/<run-folder>/`
  - Worker MAY run quick local checks to ensure the bundle is runnable,
    but MUST NOT claim PASS/FAIL/validated (Supervisor is the final verifier).

A) Worker deliverables (validation bundle assets)
- Create a NEW run-folder (append-only; never overwrite prior runs):
  `auto_test_openspec/$ARGUMENTS/run-<RUN4>__task-<TASK_NUM>__ref-<REF>__<YYYYMMDDThhmmssZ>/`
- Minimum required files inside the run-folder:
  - `task.md` (self-sufficient README; includes How-to-run + machine-decidable pass/fail criteria)
  - `run.sh` and `run.bat`
  - `logs/worker_startup.txt` (MANDATORY; see Startup ritual below)
  - `logs/` (for provenance + transcripts; keep append-only within this run folder)
  - If SCOPE includes GUI/MIXED: `tests/gui_runbook_*.md` (MCP-only runbook; no executable browser scripts)
  - If the task needs inputs/expected: include `inputs/`, `expected/`, and write outputs into `outputs/` (never temp dirs)
- GUI/MIXED server-start contract (MANDATORY):
  - `task.md` MUST include a dedicated section with EXACT, copy/paste-able commands:
    - `SERVER_START:` <exact command to start the server>
    - `SERVER_URL:` <exact URL Supervisor should navigate to, including host + port>
    - `READY_CHECK:` <a concrete readiness check (endpoint or observable signal)>
  - For GUI/MIXED, `run.sh` / `run.bat` MUST implement `SERVER_START`:
    - MUST start the local server and print the `SERVER_URL` to stdout.
    - MUST NOT perform validation (no PASS/FAIL claims); start-server only.

- Environment isolation (mandatory ONLY if env problems occur):
  - DO NOT install Python deps globally.
  - If missing deps / conflicts prevent execution, create an isolated venv via `uv` inside THIS run folder
    (e.g., `<run-folder>/.venv/`) and ensure `run.sh`/`run.bat` uses it.
  - Log provenance into `logs/` (always): python path+version, uv version, dependency source, exact install commands.
A.1) Startup ritual (MANDATORY, before any edits)
- REQUIRE Codex STARTUP RITUAL:
  - read `openspec/changes/$ARGUMENTS/progress.txt`
  - read `openspec/changes/$ARGUMENTS/feature_list.json`
  - run `git log --oneline -20`
  - capture `GIT_BASE` via `git rev-parse --short HEAD`
  - write a Startup snapshot to the validation bundle (NOT tasks.md), at:
    - `auto_test_openspec/$ARGUMENTS/<run-folder>/logs/worker_startup.txt`
  - The snapshot MUST include: UTC timestamp, CODEX_CMD, GIT_BASE, the git-log excerpt, and a short “what I observed” summary.

B) tasks.md bookkeeping (Worker-owned; single-line; NO conclusions)
- require Codex to update `openspec/changes/$ARGUMENTS/tasks.md` under THIS task with exactly ONE Worker bookkeeping line (NOT EVIDENCE):
  - starting with: `BUNDLE (RUN #<RUN_COUNTER>): ...`
  - MUST be a SINGLE LINE
  - MUST NOT write any `EVIDENCE (RUN #...)` line
  - MUST NOT write any PASS/FAIL/RESULT/validated= conclusions
- The single BUNDLE line MUST include ONLY:
  - `CODEX_CMD=codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium`
  - `SCOPE: <CLI|GUI|MIXED>`
  - `VALIDATION_BUNDLE: auto_test_openspec/$ARGUMENTS/run-<RUN4>__task-<TASK_NUM>__ref-<REF>__<YYYYMMDDThhmmssZ>`
  - `HOW_TO_RUN: run.sh/run.bat`
  - (if SCOPE=GUI or MIXED) `RUNBOOK: tests/gui_runbook_*.md`
  - (if SCOPE=GUI or MIXED) `SERVER_URL: <exact url including host+port>`
- forbid Codex from toggling ANY checkbox in tasks.md.

C) GUI hard rules (only if SCOPE includes GUI/MIXED)
- GUI verification is Supervisor-only via MCP service `playwright-mcp`.
- Worker deliverable for GUI is ONLY the MCP runbook file:
  - `tests/gui_runbook_*.md` MUST be MCP-only steps + selectors + assertion points + evidence capture points.
  - ABSOLUTELY NO executable browser automation scripts (no Playwright test runner; no Python/Node scripts).
  - ABSOLUTELY NO manual browser steps anywhere (no “open Chrome/click …” prose, anywhere in the bundle).
- For GUI/MIXED bundles, `run.sh` / `run.bat` MUST be start-server only:
  - MUST start the local server and print URL/port.
  - MUST NOT perform state seeding/copying/exporting/testing/validation/probing/installs.

D) Governance boundaries (Worker forbidden; Supervisor-only)
- feature_list governance (MANDATORY; strict):
  - The Worker/Codex is FORBIDDEN to edit `openspec/changes/$ARGUMENTS/feature_list.json` (no entry edits, no pass-state edits, no formatting churn).
  - If `openspec/changes/$ARGUMENTS/feature_list.json` is missing OR the matching ref entry is missing:
    - Under THIS task write:
      BLOCKED: Missing feature_list.json (or missing ref entry for <REF_TAG>)
      NEEDS: Supervisor/initializer must create/repair feature_list.json (structure + ref mapping). Then re-run this task.
    - Then END THIS WORKER RUN immediately (do not proceed with implementation in this run).
  - Pass-state updates (e.g., `passes=true/false`) are Supervisor-only and may occur ONLY after Supervisor validation PASS + EVIDENCE is recorded.
- forbid touching any other tasks (no evidence elsewhere; no changes to other items)
- governance boundary (Worker/Codex; mandatory):
  - The Worker/Codex is FORBIDDEN to create git commits (no checkpoint commits).
  - The Worker/Codex is FORBIDDEN to edit or append `git_openspec_history/<change-id>/runs.log`.
  - The Worker/Codex MUST NOT attempt to produce DIFFSTAT/FILES “final” summaries as evidence.
  - All commit/runs.log bookkeeping (and DIFFSTAT capture) is Supervisor-only and may occur ONLY after Supervisor validation PASS.

3) After Codex finishes, confirm that `openspec/changes/$ARGUMENTS/tasks.md` has either:

BUNDLE-READY (Worker output, under THIS task):
  - EXACTLY ONE `BUNDLE (RUN #<RUN_COUNTER>): ...` line that points to a concrete run-folder:
    - includes `VALIDATION_BUNDLE: auto_test_openspec/$ARGUMENTS/run-.../`
    - includes `HOW_TO_RUN: run.sh/run.bat`
    - if SCOPE includes GUI/MIXED: includes `RUNBOOK: tests/gui_runbook_*.md`
    - if SCOPE includes GUI/MIXED: includes `SERVER_URL: ...`
  - The referenced run-folder exists and contains at minimum:
    - `task.md`, `run.sh`, `run.bat`, `logs/worker_startup.txt`,
    - and (when GUI/MIXED) `tests/` with an MCP-only runbook
  - For GUI/MIXED, `task.md` MUST include `SERVER_START:` + `SERVER_URL:` + `READY_CHECK:` (as defined above).
  - Worker MUST NOT have written any `EVIDENCE (RUN #...)` line.
  - Worker MUST NOT have toggled any checkbox.
  - Worker MUST NOT have edited feature_list.json.
  - Worker MUST NOT have created any git commit.
  - Worker MUST NOT have edited `git_openspec_history/<change-id>/runs.log`.

OR BLOCKED (Worker output, under THIS task):
  - `BLOCKED: ...` (1–5 line error excerpt)
  - `NEEDS: ...` (next concrete unblock step)

OR ROLE_VIOLATION (Worker output, under THIS task):
  - Any `EVIDENCE (RUN #...)` / PASS/FAIL/RESULT/validated= conclusion, checkbox toggle, feature_list.json edit, git commit,
    or any edit/append to `git_openspec_history/<change-id>/runs.log`.

Otherwise treat as NO_PROGRESS (missing BUNDLE line and/or missing run-folder).

1.4) Supervisor verification after subagent returns
- Re-read TASKS_FILE.
- Determine status (under THIS task only):

  - READY_TO_VALIDATE if a compliant BUNDLE (RUN #<RUN_COUNTER>) line exists and the referenced run-folder is present and well-formed.
  - BLOCKED if BLOCKED+NEEDS exists.
  - ROLE_VIOLATION if Worker wrote any EVIDENCE/PASS/FAIL/RESULT/validated= conclusion, toggled checkboxes, edited feature_list.json, created commits,
    or edited/appended `git_openspec_history/<change-id>/runs.log`.
  - NO_PROGRESS otherwise.

- If READY_TO_VALIDATE:
  - Supervisor MUST execute validation.
    - CLI: via `run.sh`/`run.bat` as specified in the bundle.
    - GUI/MIXED:
      1) MUST start the server first by running `run.sh`/`run.bat` (start-server only).
      2) MUST navigate using the `SERVER_URL` provided in the BUNDLE line / task.md.
      3) Then execute the MCP `playwright-mcp` runbook.
      4) If the server cannot be started or `SERVER_URL` is missing/invalid, treat as bundle not ready for validation (NO_PROGRESS or BLOCKED with NEEDS), not as a feature FAIL.
  - Supervisor writes the single EVIDENCE (RUN #<RUN_COUNTER>) line (PASS/FAIL + evidence pointers).
  - Supervisor updates feature_list.json pass-state ONLY after PASS.
  - Supervisor creates ONE checkpoint commit ONLY after PASS.
  - Supervisor appends runs.log ONLY after PASS.
  - Supervisor may then toggle the checkbox to - [x] ONLY after PASS.

- DONE is reachable only after Supervisor validation PASS + compliant EVIDENCE exists under THIS task.

If DONE:
- Toggle the checkbox to `- [x]` (Supervisor only).
- Append a FULL RUN ENTRY to PROGRESS_FILE (Supervisor only; verified facts only) including:
  - RUN SUMMARY (timestamp, run #, change-id, task/ref, status)
  - Evidence pointers (tasks.md evidence line pointer + feature_list passes change + GIT_BASE/GIT_COMMIT/COMMIT_MSG)
  - Validation commands/steps + 3–15 lines output excerpt (from Supervisor validation output and/or bundle logs)
  - Changes verified: FILES/DIFFSTAT + key edits summary
  - [DIALOGUE + TOOL TRACE] with bracket markers, including:
    - [Supervisor → Subagent] instruction
    - [Tool Use] <task - spawn subagent>
    - [Tool Use] <bash - CODEX_CMD "..."> (from subagent trace)
    - [Subagent] reported outputs + the exact BUNDLE line + bundle folder pointer(s)
    - [Supervisor] the exact EVIDENCE line + acceptance decision + rationale
- Print RUN banner (END) as before.
- RUN_COUNTER += 1 and continue/stop per your session policy.

If BLOCKED:
- Ensure actionable NEEDS exists (next concrete unblock step).

- Call skill `openspec-unblock-research` (Supervisor-only). Do NOT call MCP tools directly here.
  - Provide the skill the BLOCKED context (task line + ref, error excerpt, NEEDS, what was tried, env/versions if known).
  - Instruct the skill to write its portable research capsule into BOTH bookkeeping artifacts:
    (a) Under THIS task in tasks.md:
        Add `UNBLOCK GUIDANCE (RUN #<RUN_COUNTER>):` containing:
        - Query terms
        - Key conclusions
        - Evidence pointers (source links/locators)
        - Executable next steps + how to verify
    (b) Into progress.txt (inside the current RUN entry):
        Append a short “Unblock Research Capsule” containing:
        - Query terms
        - Key conclusions
        - Evidence pointers
        - Pointer back to the tasks.md UNBLOCK GUIDANCE location

- Append a FULL RUN ENTRY to PROGRESS_FILE capturing blocker + the skill’s capsule + retry decision (verified facts only).
- Retry once as before; if blocked again, STOP and require user/initializer intervention.

If NO_PROGRESS:
- Treat as a FAILED ATTEMPT (not an immediate session stop by default).
- Under THIS task, append/refresh a single diagnostic note:
  `BLOCKED: Missing a compliant BUNDLE pointer and/or the referenced validation bundle folder is missing/incomplete for this RUN (workflow non-compliance).`
  `NEEDS: Re-run SAME task; Worker/Codex must (1) create a fresh run-folder under auto_test_openspec/<change-id>/... containing task.md + run.sh + run.bat + logs/worker_startup.txt (+ tests/runbook if GUI), and (2) append EXACTLY ONE single-line BUNDLE (RUN #<RUN_COUNTER>) pointer under THIS task (CODEX_CMD + SCOPE + VALIDATION_BUNDLE + HOW_TO_RUN [+ RUNBOOK]).`
- Append a FULL RUN ENTRY to PROGRESS_FILE (status=NO_PROGRESS) including:
  - the missing-gate diagnosis,
  - the subagent trace,
  - Attempt #k and the retry/maxed decision.
- Flow control MUST follow the per-task retry policy:
  - If Attempt #k < MAX_ATTEMPTS: continue the retry loop for the SAME task (fresh subagent).
  - Else (Attempt #k == MAX_ATTEMPTS): mark the task MAXED and apply dependency-blocking stop logic (stop only if it blocks safe forward progress).

2) Completion (only at start-of-session, or if CURRENT_TASK selection finds none)
- If no unchecked tasks remain:
  `[MONITOR] DONE | change=$ARGUMENTS | all tasks checked`
  then STOP.
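
The flow-control rules in the prompt above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not part of the actual prompt: the names `Outcome`, `next_action`, and `has_unchecked_tasks` are hypothetical, and the value of `MAX_ATTEMPTS` is an assumption (the prompt defines it elsewhere).

```python
import re
from enum import Enum


class Outcome(Enum):
    DONE = "DONE"
    BLOCKED = "BLOCKED"
    NO_PROGRESS = "NO_PROGRESS"


MAX_ATTEMPTS = 3  # assumed value; the real limit is set elsewhere in the prompt

# An unchecked task line in tasks.md looks like "- [ ] 1.1 ... [#R1]".
UNCHECKED = re.compile(r"^[ \t]*- \[ \] ", re.MULTILINE)


def has_unchecked_tasks(tasks_md: str) -> bool:
    """Completion check: True while at least one checkbox is still open."""
    return UNCHECKED.search(tasks_md) is not None


def next_action(outcome: Outcome, attempt: int, blocked_retries: int = 0) -> str:
    """Map a run outcome to the next step (hypothetical action labels)."""
    if outcome is Outcome.DONE:
        return "next_task"
    if outcome is Outcome.BLOCKED:
        # Retry once; a second block requires human intervention.
        return "retry_same_task" if blocked_retries == 0 else "stop_for_user"
    # NO_PROGRESS counts as a failed attempt, not an immediate stop.
    if attempt < MAX_ATTEMPTS:
        return "retry_same_task"
    return "mark_maxed"


print(next_action(Outcome.NO_PROGRESS, attempt=1))
print(has_unchecked_tasks("- [x] 1.1 done [#R1]\n- [ ] 1.2 todo [#R1]\n"))
```

The point of the sketch is that NO_PROGRESS and BLOCKED are handled differently: the first consumes an attempt from the per-task budget, while the second gets exactly one retry before the session stops.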


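Workflow

Initial setup

Start by installing Claude Code and codex; I won't walk through that here. The openspec install does deserve a note: use version 0.19.0 if you can. Later releases restructured the openspec workflow so that it supports natural-language invocation and is triggered through skills [3]. I plan to adapt this setup to the latest openspec release later on.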

npm install -g @fission-ai/openspec@0.19.0

First, initialize the project with openspec:

openspec init
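Next steps - Copy these prompts to codex:
────────────────────────────────────────────────────────────
1. Populate your project context:
"Please read openspec/project.md and help me fill it out
with details about my project, tech stack, and conventions"

2. Create your first change proposal:
"I want to add [YOUR FEATURE HERE]. Please create an
OpenSpec change proposal for this feature"

3. Learn the OpenSpec workflow:
"Please explain the OpenSpec workflow from openspec/AGENTS.md
and how I should work with you on this project"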

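Per-change loop

Open codex first and describe a change proposal in natural language, for example: add a feature to this project that supports automatic night-mode switching.

Then run the skill $openspec-change-interviewer <id> so that the model interviews you to pin down and align on the requirements. <id> is the name of the current proposal's folder under the openspec folder.

Next, run $openspec-feature-list <id> to have the model produce a feature_list.json.

Finally, open Claude Code and enter /monitor-openspec-codex <id>.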
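
Before handing off to the supervisor, it can help to confirm that every ref tag used in tasks.md has a matching entry in feature_list.json. The following is a minimal sketch under stated assumptions: the `[#R1]` tag shape is taken from the example earlier in this post, the `R` plus digits pattern is assumed to hold for all refs, and `unmapped_refs` is a hypothetical helper, not something openspec provides.

```python
import re

# Assumed tag shape, based on the "[#R1]" example in this post.
REF_TAG = re.compile(r"\[#(R\d+)\]")


def unmapped_refs(tasks_md: str, feature_refs: set[str]) -> set[str]:
    """Return ref tags used in tasks.md that have no feature_list.json entry."""
    return set(REF_TAG.findall(tasks_md)) - feature_refs


tasks = (
    "- [ ] 1.1 Implement login endpoint [#R1]\n"
    "- [ ] 1.2 Add rate limiting [#R3]\n"
)
# feature_refs would normally be collected from the "ref" fields in feature_list.json.
print(unmapped_refs(tasks, {"R1", "R2"}))
```

An empty result means every task line points at a known feature; anything left over is a task the supervisor could never archive as passed.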

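Step-by-step usage

  1. Install openspec (I recommend pinning 0.19.0)

    I strongly recommend 0.19.0. Newer versions restructured the workflow; they still support natural-language invocation, but it goes through skills [3]. I plan to adapt this setup to the latest version later on.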

    npm install -g @fission-ai/openspec@0.19.0
  2. Initialize the project

    openspec init

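    After initialization, it prints the suggested next steps. Fill in the project context first, then create the first change proposal.

  3. Propose a change in codex (plain natural language is enough)

    For example: add a feature to this project that supports automatic night-mode switching

  4. Use a skill to interview the requirements until they are clear

    Aligning on requirements is well worth the time. Having the model ask its questions before it starts building means far less rework later.

    • Run openspec-change-interviewer: $openspec-change-interviewer <id>
    • <id> is the name of the current proposal's folder under the openspec folder
  5. Generate feature_list.json

    • Run: $openspec-feature-list <id>
    • Once this is done, the list can be used to catch work that looks finished but has not actually passed
  6. Start supervised execution: hand off to Claude Code

    Finally, open Claude Code and enter:

    • /monitor-openspec-codex <id>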
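
Since the whole scheme relies on every feature starting with passes=false, a quick pre-flight check on the generated feature_list.json before step 6 is cheap insurance. Below is a minimal sketch, assuming the file is a top-level JSON list of objects with `ref`, `description`, and `passes` fields; the exact schema produced by the skill may differ, and `preflight_problems` is a hypothetical helper.

```python
import json


def preflight_problems(features: list[dict]) -> list[str]:
    """Collect schema problems that should be fixed before supervision starts."""
    problems = []
    seen = set()
    for i, item in enumerate(features):
        ref = item.get("ref")
        if not ref:
            problems.append(f"entry {i}: missing ref")
            continue
        if ref in seen:
            problems.append(f"{ref}: duplicate ref")
        seen.add(ref)
        # Only the Supervisor may flip passes to true, and only after PASS evidence.
        if item.get("passes") is not False:
            problems.append(f"{ref}: passes must start as false")
    return problems


# In real use, load the file instead: json.load(open(path_to_feature_list))
sample = json.loads(
    '[{"ref": "R1", "description": "night mode toggle", "passes": false},'
    ' {"ref": "R2", "description": "settings entry", "passes": true}]'
)
print(preflight_problems(sample))
```

If the list comes back non-empty, regenerate or hand-fix the file first; otherwise a feature could be counted as done without any evidence behind it.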

References

Last modified: February 3, 2026