Keeping an AI running nonstop without letting it drift is really hard
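After using tools like Claude Code and Codex for a while, I sometimes want to just let them keep running. But I worry that they will drift off course while coding on their own, and a long run can also overflow the model's context window. To address this, I designed a workflow in which Claude Code supervises Codex.
Today I'm sharing that idea. It is less a fixed recipe than a way of thinking, so feel free to adapt it into a version that suits you.
One caveat: this approach fits 1-to-n iterative development. For a brand-new 0-to-1 project, I still recommend doing the work yourself or watching the model closely.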
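Pick the right tools: save money and hassle
My own setup: I have ChatGPT Plus, which includes Codex access, and I also have GLM's Coding Plan Lite (which can be configured for use inside Claude Code). I have Gemini too, but I find the Gemini CLI experience mediocre, so I use Claude Code + Codex for this walkthrough.
In short:
- GLM coding plan: generous quota; I have basically never hit the limit
- Claude Code: sometimes declares a task finished too early
- Codex: comparatively more stable, but the model is more expensive, so use it sparingly
So my strategy is: let Claude Code act as the supervisor and let Codex do the actual work.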
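For the Codex model, I recommend ChatGPT-5.2-medium. The models with a codex suffix are officially described as optimized for coding and agentic tasks [1], but in my own testing they did not perform well. medium behaves roughly like "Auto". You can also choose high, but do not choose Xhigh: I tried it, and the results really were good, but it burned through a week's quota in a single day, which my wallet cannot take.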
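Two layers of anti-drift insurance
In this workflow, what I care about most is "anti-drift" and "anti-cheating".
So I use two files as double insurance: tasks.md and feature_list.json. The main differences are as follows:
1. Comparison table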
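| Aspect | tasks.md | feature_list.json |
|---|---|---|
| Core role | Execution layer: concrete implementation steps and the validation process | Management layer: the final state of product feature requirements |
| Granularity | Fine-grained: one feature may be split into several tasks (1.1, 1.2, 1.3) | Coarse-grained: one Ref ID maps to one complete feature (R1) |
| Worker permissions | Partial write: may only add the BUNDLE line (path to the delivered bundle) | Fully forbidden: may not modify anything (no changing requirements or status on its own initiative) |
| Supervisor permissions | Manages execution: ticks the checkbox and writes EVIDENCE (PASS/FAIL conclusion) | Updates status: sets the passes field to true only after validation passes |
| Content form | Markdown: human-readable instructions, test criteria, run log paths | JSON: structured data with Ref ID, description, boolean status |
| Lifecycle | Dynamic: each run keeps appending logs, errors, and retry records | Relatively static: the status flips only when a feature is truly "done and verified" |
| Audience | Humans + AI agents | Mainly AI agents |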
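2. Roles and connection
What each file does
tasks.md (process): the process record. It captures the full pipeline from code implementation to final validation. The Worker can make mistakes and retry here (Attempt #1, #2...), and the Supervisor records the exact validation commands and screenshot paths here. It is the shared workspace for human-machine collaboration: it holds the details of trial, error, and iteration, which keeps the process traceable.
feature_list.json (result): the acceptance baseline. It does not record the twists and turns of development; it only reflects the final delivery state. It answers the question "which end-to-end capabilities have truly been verified and passed". It uses stable refs as a long-lived checklist, defaults every entry to passes=false, and allows a ref to be marked as passing only when its PASS evidence chain already exists.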
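What ties them together?
The two files are rigidly bound through Ref tags (such as [#R1]):
- Mapping: each task line in tasks.md carries a tag (for example `- [ ] 1.1 Implement the login endpoint [#R1]`), and that tag corresponds directly to the `"ref": "R1"` entry in feature_list.json.
- State flow (one-way):
  - First, validate in tasks.md: the Supervisor must run the bundle provided by the Worker, confirm the tests pass, and write `EVIDENCE ... RESULT: PASS` in tasks.md.
  - Then, archive in feature_list.json: only once the evidence chain in tasks.md is conclusive (PASS) is the Supervisor allowed to set the passes field of the matching R1 entry in feature_list.json to true.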
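Why so rigid? With only a task list, the model can "look finished" without actually being finished, whereas feature_list.json makes it much easier to spot whether it is bluffing. In a sense, it is the gate that blocks work that is "done for show but unusable" [2].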
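The one-way binding described above can be sketched in a few lines of Python. This is only an illustration, not part of the author's workflow: the helper names are hypothetical, the sample paths are made up, and the top-level shape of feature_list.json (a list of objects with `ref` and `passes` keys) is an assumption based on the fields the article mentions.

```python
import json
import re

# Checkbox task line as described in the article, e.g. "- [ ] 1.1 Implement login [#R1]".
TASK_LINE = re.compile(r"^- \[[ x]\] (\d+(?:\.\d+)*) .+? \[#(R\d+)\]$")
# A real PASS verdict; the negative lookahead skips the "RESULT: PASS|FAIL" template text.
PASS_RESULT = re.compile(r"RESULT: PASS(?!\|)")


def refs_with_pass_evidence(tasks_md: str) -> set:
    """Collect refs whose task carries an EVIDENCE line with RESULT: PASS."""
    passed = set()
    current_ref = None
    for raw in tasks_md.splitlines():
        line = raw.strip()
        task = TASK_LINE.match(line)
        if task:
            current_ref = task.group(2)
            continue
        body = line.lstrip("- ")
        if current_ref and body.startswith("EVIDENCE") and PASS_RESULT.search(body):
            passed.add(current_ref)
    return passed


def apply_passes(features: list, tasks_md: str) -> list:
    """Flip passes to True only for refs backed by PASS evidence; never add or remove entries."""
    passed = refs_with_pass_evidence(tasks_md)
    return [{**f, "passes": bool(f.get("passes")) or f["ref"] in passed} for f in features]


# Hypothetical sample data: R1 has PASS evidence, R2 only has a Worker BUNDLE line.
tasks_md = "\n".join([
    "- [ ] 1.1 Implement the login endpoint [#R1]",
    "  - BUNDLE (RUN #1): ... | VALIDATION_BUNDLE: auto_test_openspec/demo/run-0001 | HOW_TO_RUN: run.sh/run.bat",
    "  - EVIDENCE (RUN #1): ... | VALIDATED: ./run.sh, exit 0 | RESULT: PASS",
    "- [ ] 1.2 Implement the logout endpoint [#R2]",
    "  - BUNDLE (RUN #2): ... | VALIDATION_BUNDLE: auto_test_openspec/demo/run-0002 | HOW_TO_RUN: run.sh/run.bat",
])
features = [
    {"ref": "R1", "description": "login", "passes": False},
    {"ref": "R2", "description": "logout", "passes": False},
]
print(json.dumps(apply_passes(features, tasks_md), indent=2))
```

The point of the design is that a Worker-written BUNDLE line alone never changes anything in the feature list; only a Supervisor-written PASS verdict does.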
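In addition, to minimize "starting work before the requirements are aligned", I added a skill that lets the AI ask us questions back and confirm the requirements once more.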
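Overall approach
[Roles] Claude Code acts as the Supervisor, and Codex is the Worker.
The real fear is not that the model cannot write code, but that:
- it thinks it is "done" when it has only gone through the motions
- it cuts corners by skipping validation, or the validation is not reproducible
- it drifts off course while remaining fully confident, leaving us a mess to clean up when we take over
So two agents do the work here to guard against cheating as much as possible: one only writes, and the other only accepts.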
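[Startup] The whole process starts from an OpenSpec change proposal that I generate with Codex (the Worker); the proposal is turned into a concrete to-do list in tasks.md. Whenever a new task needs to run, Claude Code (the Supervisor) starts a subagent that calls Codex (the Worker) via codex exec and then invokes OpenSpec in natural language. OpenSpec 0.19.0 is the preferred version: later versions restructured the OpenSpec workflow, and while they still support natural-language invocation, it is triggered through skills [3].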
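To make the dispatch step concrete, here is a minimal Python sketch of how one Worker attempt could be launched. It is an illustration only: in the actual workflow the Claude Code subagent issues the command through its own shell tool, the command string is copied from the CODEX_CMD constant in the CLAUDE.md shown later in this article, and the prompt wording, the function names, and the `add-login` change id are hypothetical.

```python
import shlex
import subprocess

# Copied from the CODEX_CMD constant in the CLAUDE.md shown later in this article.
CODEX_CMD = (
    "codex exec --full-auto --skip-git-repo-check "
    "--model gpt-5.2 -c model_reasoning_effort=medium"
)


def compose_prompt(change_id: str, task_line: str) -> str:
    """Build one English prompt that targets exactly one tasks.md item (wording is hypothetical)."""
    return (
        f"Implement ONLY this task from openspec/changes/{change_id}/tasks.md: {task_line}. "
        "Do the startup ritual first. Do not toggle checkboxes or declare PASS/FAIL."
    )


def build_worker_argv(inline_prompt: str) -> list:
    """Split the command constant into argv and pass the prompt as one final argument."""
    return shlex.split(CODEX_CMD) + [inline_prompt]


def run_worker(inline_prompt: str, repo_dir: str) -> int:
    """Run one Worker attempt and return its exit code (requires the codex CLI on PATH)."""
    return subprocess.run(build_worker_argv(inline_prompt), cwd=repo_dir, check=False).returncode


argv = build_worker_argv(compose_prompt("add-login", "1.1 Implement the login endpoint [#R1]"))
print(argv[:2], "...", len(argv), "arguments")
```

Passing the prompt as a single argv element avoids shell quoting problems when the task summary contains quotes or brackets.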
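[Execution and delivery] After Codex (the Worker) finishes the code, it must produce and deliver a reproducible test bundle as proof of completion and place it under the auto_test_openspec directory:
- CLI tasks: the bundle must contain an automated test script (run.sh).
- GUI tasks: the bundle must contain an MCP runbook with no executable code (Markdown format), plus a script used only to start the service.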
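Each delivered bundle lives in its own run folder. The project.md rules later in this article name it `run-<RUN4>__task-<task-id>__ref-<ref-id>__<YYYYMMDDThhmmssZ>/`. The sketch below shows one way to build and parse such a name; the function names are hypothetical, and it assumes the caller passes a timezone-aware datetime.

```python
import re
from datetime import datetime, timezone

# Pattern from the project.md rules: run-<RUN4>__task-<task-id>__ref-<ref-id>__<YYYYMMDDThhmmssZ>
RUN_FOLDER = re.compile(
    r"^run-(?P<run>\d{4})__task-(?P<task>\d+(?:\.\d+)*)"
    r"__ref-(?P<ref>R\d+)__(?P<stamp>\d{8}T\d{6}Z)$"
)


def run_folder_name(run_number: int, task_id: str, ref_id: str, when: datetime) -> str:
    """Build the folder name; `when` must be timezone-aware and is converted to UTC."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"run-{run_number:04d}__task-{task_id}__ref-{ref_id}__{stamp}"


def parse_run_folder(name: str):
    """Return the run, task, ref, and stamp parts, or None when the name does not match."""
    match = RUN_FOLDER.match(name.rstrip("/"))
    return match.groupdict() if match else None


name = run_folder_name(7, "1.1", "R1", datetime(2026, 1, 11, 3, 15, 0, tzinfo=timezone.utc))
print(name)
print(parse_run_folder(name + "/"))
```

Because the run counter is zero-padded and the timestamp is in UTC, folder names sort in run order and stay unique across retries.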
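[Acceptance and sign-off] Claude Code (the Supervisor) runs the scripts itself for acceptance. For GUI tasks, it follows the runbook strictly, drives the browser through the playwright-mcp service, and captures screenshots as hard evidence, making sure the feature is not just coded but actually usable [2].
Only when Claude Code (the Supervisor) has personally confirmed that the test bundle passes and the evidence chain in hand is complete does it perform the sign-off steps:
- Tick the task in tasks.md.
- Update the pass state (the passes field) in feature_list.json.
- Commit to Git as a checkpoint.
- Write a handoff log with evidence pointers to progress.txt.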
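The "evidence chain is complete" condition can be checked mechanically. The sketch below follows the HARD GATE field list in the CLAUDE.md shown later in this article and reports which required fields a single-line EVIDENCE record is missing. The parsing approach (splitting on ` | ` and on the first colon), the function names, and the sample values are assumptions for illustration.

```python
def parse_evidence(line: str) -> dict:
    """Split a single-line EVIDENCE record on ' | ' into KEY: value pairs."""
    fields = {}
    for segment in line.split(" | "):
        key, sep, value = segment.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_gate_fields(line: str) -> list:
    """Return the hard-gate fields that are absent or wrong; an empty list means DONE is allowed."""
    f = parse_evidence(line)
    missing = []
    if not any(key.startswith("EVIDENCE (RUN #") for key in f):
        missing.append("EVIDENCE (RUN #n)")
    scope = f.get("SCOPE", "")
    if scope not in {"CLI", "GUI", "MIXED"}:
        missing.append("SCOPE")
    for key in ("VALIDATION_BUNDLE", "WORKER_STARTUP_LOG", "GIT_COMMIT", "COMMIT_MSG"):
        if not f.get(key):
            missing.append(key)
    if scope in {"CLI", "MIXED"}:
        if not f.get("VALIDATED_CLI"):
            missing.append("VALIDATED_CLI")
        if f.get("EXIT_CODE") != "0":
            missing.append("EXIT_CODE: 0")
    if scope in {"GUI", "MIXED"}:
        if f.get("VALIDATED_GUI") != "MCP(playwright-mcp)":
            missing.append("VALIDATED_GUI")
        for key in ("RUNBOOK", "SCREENSHOTS"):
            if not f.get(key):
                missing.append(key)
    if f.get("RESULT") != "PASS":
        missing.append("RESULT: PASS")
    if not (f.get("DIFFSTAT") or f.get("FILES")):
        missing.append("DIFFSTAT or FILES")
    return missing


# Hypothetical EVIDENCE line for a CLI task.
bundle = "auto_test_openspec/add-login/run-0007__task-1.1__ref-R1__20260111T031500Z"
good = " | ".join([
    "EVIDENCE (RUN #7): CODEX_CMD=codex exec ...",
    "SCOPE: CLI",
    f"VALIDATION_BUNDLE: {bundle}/",
    f"WORKER_STARTUP_LOG: {bundle}/logs/worker_startup.txt",
    "VALIDATED_CLI: ./run.sh",
    "EXIT_CODE: 0",
    "RESULT: PASS",
    "GIT_COMMIT: abc1234",
    'COMMIT_MSG: "add login endpoint"',
    "FILES: src/login.py",
])
print(missing_gate_fields(good))
print(missing_gate_fields(good.replace("RESULT: PASS", "RESULT: FAIL")))
```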
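[Exception handling] If it hits a technical blocker along the way, Claude Code (the Supervisor) uses Context7 or a web search tool to find a solution on its own and guides the Worker through a retry.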
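Directory structure
.
├── auto_test_openspec/                     # [repo-root artifact] immutable evidence store
│   ├── run-0001__task-1.1__ref-R1.../      # the "validation bundle" for one specific run (Run Folder)
│   │   ├── run.sh                          # automated reproduction script
│   │   ├── task.md                         # validation manual
│   │   └── ...                             # (logs, screenshots, inputs/outputs, etc.)
│   └── ...
│
├── git_openspec_history/                   # [repo-root artifact] Git commit index
│   └── runs.log                            # index log: maps Run ID <-> Git commit SHA
│
└── openspec/
    └── changes/
        └── <change-id>/                    # [artifacts inside the OpenSpec change]
            ├── feature_list.json           # feature list and pass status (double ledger)
            ├── progress.txt                # handoff log (dialogue and validation results)
            └── tasks.md                    # (task list source file)
Each task runs in its own subagent, which keeps the context from growing too long or getting polluted. The catch is that memory does not carry over between runs. My solution is as follows.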
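1. Core mechanism: the "Startup Ritual"
Codex (the Worker) must read the historical records before doing any work:
- It must read openspec/changes/<change-id>/progress.txt and feature_list.json.
- It must run git log --oneline -20 to get the recent code change history.
- It must write what it read into auto_test_openspec/$ARGUMENTS/<run-folder>/logs/worker_startup.txt, as proof that "I have looked at what happened before".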
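As a rough picture of what the startup snapshot contains, the Python sketch below gathers the fields that the CLAUDE.md later in this article lists as the minimum (UTC timestamp, CODEX_CMD, GIT_BASE, the git log excerpt, and a short summary of what was observed). In the real workflow the Worker does this by following its prompt, not by running this script; the function names and the exact text layout are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def git(*args: str, cwd: str = ".") -> str:
    """Run a read-only git command and return its trimmed stdout."""
    done = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def format_startup_snapshot(codex_cmd: str, git_base: str, git_log: str,
                            observed: str, when: datetime) -> str:
    """Render the snapshot text; the layout is an assumption, the field list follows the article."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "\n".join([
        f"UTC timestamp: {stamp}",
        f"CODEX_CMD: {codex_cmd}",
        f"GIT_BASE: {git_base}",
        "git log --oneline -20:",
        git_log,
        "What I observed:",
        observed,
    ]) + "\n"


def write_startup_snapshot(run_folder: Path, codex_cmd: str, observed: str, repo: str = ".") -> Path:
    """Collect git state from `repo` and write logs/worker_startup.txt inside the run folder."""
    log_path = run_folder / "logs" / "worker_startup.txt"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    text = format_startup_snapshot(
        codex_cmd,
        git("rev-parse", "--short", "HEAD", cwd=repo),
        git("log", "--oneline", "-20", cwd=repo),
        observed,
        datetime.now(timezone.utc),
    )
    log_path.write_text(text, encoding="utf-8")
    return log_path


# Formatting demo with made-up values; no git repository is needed for this part.
snapshot = format_startup_snapshot(
    "codex exec ...", "abc1234", "abc1234 initial commit",
    "progress.txt has no earlier RUN entries",
    datetime(2026, 1, 11, 3, 15, 0, tzinfo=timezone.utc),
)
print(snapshot)
```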
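2. Three memory files
- tasks.md: the project's "task memory" and single source of truth; it maintains the execution status of every task. Claude Code (the Supervisor) reads it to decide what to dispatch next, and Codex (the Worker) relies on it to pin down the concrete implementation target, so both sides share the same view of which tasks are done and which are still pending.
- progress.txt: an append-only "process memory" log used to pass handoff information between sessions. Whenever a task ends, Claude Code (the Supervisor) records the dialogue summary, validation results, and error messages here; a newly started Codex (the Worker) must consult this history (especially the reasons for failures or blocks) so that it does not repeat earlier mistakes.
- feature_list.json: the project's completion status, recording whether each feature has passed validation. Codex (the Worker) has read-only access, used to check the status of dependencies; the file is updated only after Claude Code (the Supervisor) completes strict validation, which keeps the memory of overall project usability both continuous and authoritative [2].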
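The append-only rule for progress.txt is easy to enforce in code: open the file in append mode and never rewrite it. The sketch below renders only the RUN SUMMARY header, following the progress.txt format in the CLAUDE.md shown later in this article; a full entry also needs git anchors and the dialogue trace. The function names, the separator width, and the sample values are assumptions.

```python
import tempfile
from pathlib import Path

SEPARATOR = "=" * 80  # width is an assumption; the article shows a long line of "=" characters
STATUSES = {"DONE", "FAIL", "BLOCKED", "ROLE_VIOLATION", "NO_PROGRESS"}


def format_run_summary(timestamp_utc: str, run: int, attempt: int,
                       change_id: str, task: str, ref: str, status: str) -> str:
    """Render the RUN SUMMARY header of one progress.txt entry."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    return "\n".join([
        SEPARATOR,
        "RUN ENTRY",
        "[RUN SUMMARY]",
        f"Timestamp (UTC): {timestamp_utc} Run: #{run} Attempt: {attempt}",
        f"Change: {change_id} Task: {task} Ref: {ref}",
        f"Status: {status}",
    ]) + "\n"


def append_run_entry(progress_path: Path, entry: str) -> None:
    """Append one entry; mode 'a' never truncates or reorders what is already there."""
    with open(progress_path, "a", encoding="utf-8") as handle:
        handle.write(entry)


# Demo in a temporary directory with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    progress = Path(tmp) / "progress.txt"
    append_run_entry(progress, format_run_summary(
        "2026-01-11T03:15:00Z", 1, 1, "add-login", "1.1", "R1", "FAIL"))
    append_run_entry(progress, format_run_summary(
        "2026-01-11T04:02:00Z", 2, 2, "add-login", "1.1", "R1", "DONE"))
    print(progress.read_text(encoding="utf-8"))
```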
Skills and MCP configuration
1. Configure MCP
If your tasks involve GUI (or MIXED) scope, I strongly recommend adding playwright-mcp. The goal is that the Supervisor neither clicks through pages by hand nor runs Playwright from scripts, but drives the browser through MCP and collects evidence (screenshots, logs, and so on).
playwright-mcp:
claude mcp add --transport stdio --scope user playwright-mcp -- npx -y @playwright/mcp@latest
Then add context7 (to look up docs and fill in context when blocked):
claude mcp add context7 -- npx -y @upstash/context7-mcp@latest
For the web search MCP I use Zhipu's (you can swap in another provider, as long as the names match):
claude mcp add -s user -t http web-search-prime https://open.bigmodel.cn/api/mcp/web_search_prime/mcp --header "Authorization: Bearer your_api_key"
claude mcp add -s user -t http web-reader https://open.bigmodel.cn/api/mcp/web_reader/mcp --header "Authorization: Bearer your_api_key"
2. Skills
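I maintain these skills directly in a repository; download whichever you need:
For Codex:
- openspec-change-interviewer, available on GitHub (aligns requirements through interview-style follow-up questions)
- openspec-feature-list, available on GitHub (generates feature_list.json)
For Claude Code:
- openspec-unblock-research, available on GitHub (the research helper the Supervisor uses when it is blocked)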
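1. Configure the MCP server
Run `claude mcp list` (or `/mcp` inside a Claude Code session). You must see that mcp__<new-search-name>__* and mcp__github__* (or other auxiliary tools) are all loaded.
2. Modify the core file (SKILL.md)
Make two key changes to the SKILL.md of openspec-unblock-research:
1. Update the Description in the file header
Keep the description consistent with the actual tools.
- Change `mcp__web-search-prime__*`
- to `mcp__<new-search-name>__*`
2. Update Default Provider Ordering
In the list at the bottom of the file, **insert the new tool** and **replace the old search**.
Example change: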
## Default provider ordering (if caller omits toolchain_config)
1. `mcp__context7__*` (authority source)
...
2. `mcp__github__*` (new: internal authority)
- Use for: checking existing issues/bugs in repo or upstream.
- Trigger when: `error_excerpt` looks like a library bug.
- Stop when: found a closed issue matching symptoms.
3. `mcp__<new-search-name>__*` (replaces the original search-prime)
- Use for: recent regressions, common pitfalls.
- Trigger when: `error_excerpt` includes searchable strings.
- Stop when: have candidate links to verify.
4. `mcp__web-reader__*` (evidence fetcher)
...
Files that need changes
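Optional: code style rules
Edit AGENT.md. The main purpose is to keep the generated code a bit cleaner and leaner. This is personal preference; you could also add other rules, such as requiring a uv virtual environment. Skip it if you do not think it is necessary.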
## Code hygiene guardrails (always-on)
- Prioritize correctness and maintainability over cosmetic changes.
- Keep scope tight: don’t refactor unrelated areas; avoid “while I’m here” edits.
- Write for the next reader: choose clear names, straightforward control flow, and readable structure.
- Avoid clever compactness (dense one-liners, nested ternaries). Prefer if/else or switch when branching grows.
Key file changes
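To get this workflow running, we need to overwrite or create a few configuration files.
Additions to openspec-proposal.md
Location:
- Windows: %USERPROFILE%\.codex\prompts\openspec-proposal.md
- macOS/Linux: ~/.codex/prompts/openspec-proposal.md
Purpose: make the tasks.md that OpenSpec generates fit our needs.
Note: modify this file only after running openspec init; otherwise it is reset to the default.
- When drafting `openspec/changes/<id>/tasks.md`, you MUST follow: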
- `openspec/project.md` → `## tasks.md Checklist Format` (canonical; do not invent a parallel format).
- Hard gate reminders (do not expand here; see canonical spec above):
- Every task MUST include `ACCEPT:` and `TEST:`.
- Every checkbox task line MUST include EXACTLY ONE `[#R<n>]` token, unique across the file.
- `TEST:` MUST include `SCOPE: CLI|GUI|MIXED` and MUST enable a human-reproducible validation bundle
(all bundle rules + role split + evidence rules live ONLY in `openspec/project.md`).
- Role split (mandatory; see `openspec/project.md` → “Validation bundle requirements”):
- Worker produces bundle assets only; Supervisor executes and records PASS/FAIL evidence.
- GUI/MIXED constraint (mandatory; see `openspec/project.md` → “CLI/GUI/MIXED validation requirements”):
- GUI verification must be driven via MCP service `playwright-mcp` and evidence must be archived; do NOT use any browser automation scripts (Python/Node/Playwright test runner).
Project directory: openspec\project.md
Purpose: make the tasks.md that OpenSpec generates fit our needs.
Append to the end of project.md:
## tasks.md Checklist Format
This section is the SINGLE canonical spec for tasks.md format and validation bundles.
Do not duplicate this spec elsewhere; other docs must link here.
### Task Line Format (required)
Each checkbox task line MUST follow:
- `- [ ] <task-id> <task summary> [#R<n>]`
- `<task-id>` MUST be dot-numbered (e.g. `1.1`, `2.3`).
- Each checkbox line MUST include EXACTLY ONE `[#R<n>]` token (e.g. `[#R1]`).
- `[#R<n>]` MUST be unique across the entire tasks.md (never reuse).
- Every task MUST include both `ACCEPT:` and `TEST:` blocks.
- `TEST:` MUST include `SCOPE: CLI|GUI|MIXED` and MUST be implementable into a validation bundle
per `### Validation bundle requirements (mandatory)` below.
### Example (copy/paste)
- [ ] 1.1 Do X and produce Y [#R1]
- ACCEPT: ...
- TEST: SCOPE: CLI
- When done, generate validation bundle under:
auto_test_openspec/<change-id>/<run-folder>/
- run-folder MUST be:
run-<RUN4>__task-<task-id>__ref-<ref-id>__<YYYYMMDDThhmmssZ>/
- Run: auto_test_openspec/<change-id>/<run-folder>/run.sh (macOS/Linux) or run.bat (Windows)
- Inputs: inputs/sample.json
Outputs: outputs/result.json
- Verify: compare against expected/result.json (or rule-based assertions)
### Validation bundle requirements (mandatory)
For every task, `TEST:` MUST be written so:
- the Worker can produce a **human one-click reproducible** validation bundle (assets + scripts for CLI checks; GUI checks are MCP-driven and MUST NOT use any browser automation scripts),
- AND the Supervisor can execute it and record the final PASS/FAIL evidence chain
(each run-folder is immutable; evidence pointers are written after execution).
0) Roles & responsibilities (mandatory)
- Worker (produces artifacts; not the final verifier):
- Implement product code + write tests (CLI). For GUI/MIXED, produce an MCP runbook only (no executable browser automation scripts).
- Produce the validation bundle assets under the run-folder:
`task.md`, `run.sh`, `run.bat`, `tests/` (CLI tests and/or GUI MCP runbook; no executable browser scripts), and (when applicable) `inputs/`, `expected/`.
- MUST NOT declare PASS/FAIL.
- MUST NOT overwrite/edit prior run-folders (append-only history).
- Supervisor (executes validation; forms the evidence chain):
- MUST create a brand-new run-folder for every validation attempt (never overwrite).
- Executes `run.sh` / `run.bat`, captures `outputs/` + `logs/` + GUI evidence when applicable.
- MUST write the final PASS/FAIL result + evidence pointers (this is the DONE hard gate).
1) Canonical on-disk location (repo root; append-only)
- Root folder (fixed):
- `auto_test_openspec/<change-id>/`
- Each validation attempt MUST create a brand-new run folder (never overwrite; keep ALL history forever):
- `auto_test_openspec/<change-id>/<run-folder>/`
- Once created, a run folder MUST be treated as immutable evidence:
- do not edit prior runs; create a new run folder instead.
2) Run folder naming (required; MUST include run#, task-id, ref-id; timestamp recommended)
- `<run-folder>` MUST follow this exact pattern:
- `run-<RUN4>__task-<task-id>__ref-<ref-id>__<YYYYMMDDThhmmssZ>/`
- Example:
- `run-0007__task-1.1__ref-R1__20260111T031500Z/`
- Rules:
- `<RUN4>`: zero-padded, monotonic run counter (e.g. 0001, 0002, ...).
- MUST match the Supervisor workflow RUN_COUNTER / `EVIDENCE (RUN #n)` numbering for audit alignment.
- Mapping rule: `RUN #7` => `run-0007`, `RUN #12` => `run-0012`.
- `<task-id>`: dot-numbered task id from the checkbox line (e.g. `1.1`).
- `<ref-id>`: stable ref id derived from the task tag (e.g. `[#R1]` → `R1`).
- `<YYYYMMDDThhmmssZ>`: UTC timestamp to guarantee uniqueness and ease auditing.
3) Minimum required contents inside EVERY run folder
Each run folder MUST contain at least:
A) `task.md` (this run’s readme; MUST be self-sufficient)
task.md MUST include:
- change-id, run#, task-id, ref-id
- SCOPE covered (CLI / GUI / MIXED)
- How to run (Windows + macOS/Linux)
- CLI: run.sh/run.bat executes CLI checks.
- GUI/MIXED: run.sh/run.bat starts the service only; GUI steps are executed via the MCP runbook under tests/.
- Test inputs (if any): input file paths, params, sample data
- Test outputs (if any): what files/stdout/stderr/screenshots/logs will be produced and where
- Expected results (machine-decidable): pass/fail criteria
- exit code checks
- stdout/stderr assertions (required when relevant)
- file existence/content assertions (required when outputs exist)
- GUI assertion points (when GUI/MIXED): which screenshots/states prove correctness
- Hard rules (GUI/MIXED):
- task.md MUST NOT contain manual browser steps (no “open Chrome/click buttons” prose).
- task.md MUST point to the MCP-only runbook under tests/ (e.g., tests/gui_runbook_<topic>.md).
- Any required “copy/seed/prepare input/state” steps MUST be written as exact commands/steps here (and referenced by the runbook). run.sh/run.bat MUST NOT perform them.
- Provenance of expected/assumptions:
- If inputs/expected are not provided by a human, the Worker MUST generate them and document where they came from
(e.g., derived from ACCEPT, or an explicit reasonable assumption).
B) One-click scripts (both required; GUI/MIXED = start-server only)
- run.sh (macOS/Linux)
- run.bat (Windows)
Script requirements (all bundles):
- Must assume the default dev machine environment is ready.
- Non-destructive:
- MUST NOT modify global environment
- MUST NOT globally install dependencies
- MUST NOT write to system directories
- Must be runnable from ANY working directory:
- the script MUST cd/pushd to its own directory first, then resolve paths from there.
Hard rule (when SCOPE includes GUI):
- run.sh/run.bat MUST be start-server only:
- MUST: start the local service and print the access URL/port (e.g., http://127.0.0.1:<PORT>/)
- MUST NOT: copy/overwrite data files, mutate state/inputs, generate exports/outputs, run tests, run exports, probe/install dependencies, or perform environment probes (python/uv version checks do NOT belong in GUI start scripts)
- Any required “copy/seed/prepare input/state” steps MUST be documented as exact commands/steps in task.md (and referenced by tests/gui_runbook_*.md) for the Supervisor to execute and record in EVIDENCE.
For CLI bundles (or the CLI portion of MIXED):
- run.sh/run.bat SHOULD print key results to console and SHOULD write logs to logs/.
- Environment provenance SHOULD be documented as optional preflight commands in task.md (not forced into GUI start scripts), e.g.:
- interpreter path + version (Python/Node if used)
- uv --version when Python/uv is involved
- When provenance is executed, it SHOULD be recorded to logs/.
C) Test asset folders (create the ones that apply)
- `logs/` MUST exist (always):
- run logs, env/version info, command transcript, GUI screenshot index, etc.
- `tests/` MUST exist when:
- SCOPE includes GUI (MCP-driven via `playwright-mcp`), OR
- validation is not fully expressible as simple CLI assertions.
- `inputs/` MUST exist when the task involves file input (see I/O hard rule below).
- `outputs/` MUST exist when the validation produces file outputs (see I/O hard rule below).
- `expected/` SHOULD exist when golden-file comparison is used; otherwise rule-based assertions are acceptable.
4) Hard rule: “input file + output file + output validation”
If the task validation is “given an input produces an output” in ANY form:
- `inputs/` MUST contain at least one reproducible input sample.
- `run.*` MUST write the real produced outputs into `outputs/` (never into random temp/system dirs).
- The bundle MUST include at least one machine-decidable verification method (pass/fail), typically:
- (A) golden file compare against `expected/` (exact match OR documented allowed-diff rules), and/or
- (B) rule-based assertions (e.g. JSON schema, key fields, row counts, regex match, exit code, forbidden strings).
`task.md` MUST explicitly describe:
- what the input is
- what output is produced
- what “expected” means
- and exactly how the script validates it
5) CLI / GUI / MIXED validation requirements
- If SCOPE includes CLI:
- MUST run the real CLI command(s) in `run.*`
- MUST check exit code
- MUST assert key stdout/stderr content (or absence of known-bad patterns)
- If files are produced: MUST use `outputs/` + `expected/` and/or rule assertions as above
- If SCOPE includes GUI:
- The validation bundle MUST provide an MCP-only GUI verification runbook
(stored under tests/ and executed by the Supervisor via playwright-mcp; do NOT use any scripts to drive the browser).
- Hard rule: run.sh/run.bat MUST be start-server only for GUI/MIXED bundles:
- MUST: only start the service and print URL/port
- MUST NOT: copy/seed/prepare input/state, generate exports/outputs, run tests, or perform environment probes
- Any required data prep steps MUST be written as exact commands/steps in task.md (and referenced by the runbook).
- Supervisor execution constraint (mandatory):
- GUI verification MUST be driven via MCP service playwright-mcp
- no manual browser interaction
- no Python/Node/Playwright scripts to drive the browser
- Must archive auditable evidence artifacts (append-only; never overwrite):
- at minimum: screenshots (e.g., outputs/screenshots/ plus a screenshots index file in logs/)
- recommended: trace/video and a console log index when available from MCP (paths recorded in logs/)
- If SCOPE is MIXED:
- The bundle MUST cover both CLI and GUI checks (either in one test file or split; see “two test files” rule below).
6) Allowing two test files (when needed; organization rule)
Default: one test file should cover key acceptance points.
Two test files are allowed / recommended when:
- CLI + GUI are both involved:
- one test focuses on CLI
- one runbook focuses on GUI (MCP steps + assertions; no executable browser scripts)
- Same entrypoint but two distinct paths must be covered:
- happy path + error/edge path (e.g., valid vs invalid args)
- GUI needs both “functional flow” and “render/state”:
- split into two smaller, more stable tests
Suggested naming under the run folder:
- `tests/test_cli_<topic>.*`
- `tests/gui_runbook_<topic>.md` (MCP-only steps + assertion points; no executable browser scripts)
Note:
- “two test files” refers to validation assets under `tests/` (CLI test scripts and/or GUI MCP runbook).
- The “input/output two files + validation” rule refers to runtime data under `inputs/outputs/expected` and is additive, not conflicting.
7) Environment isolation (uv venv rule; mandatory when env problems occur)
- Under no circumstances may the Worker “pollute global Python env” to make validation pass (e.g., global `pip install`).
- If the Worker encounters environment problems (missing deps, conflicts, cannot run):
- MUST create an isolated venv using `uv`
- Recommended location: inside THIS run folder (e.g. `<run-folder>/.venv/` or `<run-folder>/venv/`)
- All installs/runs must occur inside that venv
- `run.*` and/or `logs/` MUST clearly record:
- which interpreter is used
- uv version
- where dependencies came from (lockfile / pyproject / etc.)
- Note:
- Creating a venv is conditional (only when env problems occur),
but running the full validation bundle is unconditional (always required).
8) tasks.md bookkeeping lines (mandatory; role split; no duplicated rules elsewhere)
- Under the task entry in `openspec/changes/<change-id>/tasks.md`, TWO lines are mandatory:
- Worker-written (bundle-ready; NO PASS/FAIL):
- `BUNDLE (RUN #n): ... | VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder> | HOW_TO_RUN: run.sh/run.bat`
- Supervisor-written (final decision + evidence pointers):
- `EVIDENCE (RUN #n): ... | VALIDATED: <exact commands + exit code> | RESULT: PASS|FAIL | GUI_EVIDENCE: <paths when applicable>`
- Worker MUST NOT claim PASS/FAIL anywhere; Supervisor is the only role that records PASS/FAIL after running the bundle.
Project directory: .\claude.md
Purpose: define Claude Code's role and workflow.
claude.md
# CLAUDE.md (OpenSpec + Codex Supervisor)
You are the SUPERVISOR (Claude Code). Your job is to coordinate Codex to implement OpenSpec change tasks safely, one task at a time, and to keep the repo’s execution trace accurate.
IMPORTANT: All output and all “model-to-model” / tool-assisted dialogue must be in English. Do not produce Chinese text.
## Source of truth
- `openspec/changes/<change-id>/tasks.md` is the single source of truth for implementation progress.
- Do not use `TODO.md` for this workflow. Do not invent tasks outside `tasks.md`.
## Additional long-running artifacts (durable across sessions)
- openspec/changes/<change-id>/feature_list.json is the durable end-to-end feature checklist.
- One entry per stable ref tag (e.g., [#R1] in tasks.md maps to "ref": "R1" in JSON).
- Default all features to failing (passes=false) until validated.
- Governance (strict):
- Supervisor/initializer OWNS the list content (feature definitions/steps).
- Worker is FORBIDDEN to add/remove/rewrite feature entries.
- Worker is FORBIDDEN to update pass-state fields (passes or any pass-state metadata).
- Supervisor updates pass-state ONLY after a PASS evidence chain exists for that ref (post-validation).
- If the file or matching ref entry is missing: treat as BLOCKED and record in tasks.md; do NOT scaffold or invent entries.
- openspec/changes/<change-id>/progress.txt is the Supervisor-written handoff log.
- Append-only. One RUN entry per task attempt (one subagent / one Codex run).
- A single /monitor-openspec-codex ... invocation MUST append at most ONE RUN entry (no batch loop by default).
- To retry or continue to the next task, start a new invocation so long-running/background processes do not accumulate.
- Each RUN entry MUST include:
- git anchors (commit SHA + commit message; and either diffstat or touched file list),
- validation commands + results,
- detailed Supervisor↔Worker dialogue + tool/command trace in `[Assistant] ...` / `[Tool Use] ...` style for replay/audit.
- Must reflect only verified facts (no aspirational claims).
- `git_openspec_history/<change-id>/runs.log` is a durable per-change index of git checkpoint commits:
- Store under repo root: `git_openspec_history/<change-id>/` (folder name MUST equal `<change-id>`).
- Append-only log: `git_openspec_history/<change-id>/runs.log` (one line per successful RUN linking run# → commit → diffstat/files).
- `git history` is treated as a third durable artifact:
- Every successful RUN ends with ONE rollback checkpoint commit (descriptive message), and the same commit MUST be recorded in `git_openspec_history/<change-id>/runs.log`.
## Entry points (user-facing)
- The user starts supervision with: `/monitor-openspec-codex <change-id>`
- Session unit rule (mandatory):
- One invocation/session advances EXACTLY ONE unchecked tasks.md checkbox item.
- State restoration across sessions relies on: progress.txt + feature_list.json + git history
+ git_openspec_history/<change-id>/runs.log.
## Worker invocation (Codex CLI)
# Single Codex command constant (maintain ONLY ONE copy)
CODEX_CMD = codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium
How it works:
- Supervisor composes a single English prompt that targets ONE tasks.md checkbox item.
- Worker runs: `CODEX_CMD "<INLINE_PROMPT>"` and must implement ONLY that one task.
- Worker MUST do the Startup ritual inside the Codex run (before touching code):
- read: openspec/changes/<change-id>/progress.txt + feature_list.json (+ tasks.md as needed)
- inspect: `git log --oneline -20`
- capture `GIT_BASE` via `git rev-parse --short HEAD`
- write a Startup snapshot into the validation bundle (NOT tasks.md), at:
- `auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt`
- MUST include (at minimum): UTC timestamp, CODEX_CMD, GIT_BASE, the `git log --oneline -20` excerpt, and a short “what I observed” summary.
- NOTE: Do NOT write STARTUP/GIT_BASE fields into tasks.md. Supervisor may cite this file path later in EVIDENCE.
- Worker MUST NOT toggle any tasks.md checkbox. Supervisor owns checkboxes.
- Worker MUST NOT edit feature_list.json (neither entries nor pass-state).
- Worker MUST NOT create git commits.
- Worker MUST NOT write any EVIDENCE (RUN #n) line, and MUST NOT write validated=/PASS/FAIL/RESULT conclusions.
- Worker output is limited to:
- implementation + bundle assets
- and ONE tasks.md bookkeeping line:
- BUNDLE (RUN #n): ... | VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder> | HOW_TO_RUN: run.sh/run.bat | (if GUI) RUNBOOK: tests/gui_runbook_<topic>.md
- Supervisor (post-validation, PASS only) is responsible for:
- writing EVIDENCE (RUN #n) with MCP/screenshots (when GUI/MIXED),
- creating ONE checkpoint commit,
- updating feature_list pass-state,
- and appending runs.log (if applicable).
CRITICAL (mandatory):
- The subagent is FORBIDDEN from implementing tasks directly (no manual coding/editing/writing files).
- The subagent MUST make exactly ONE Bash tool invocation to perform work, and that single invocation MUST run CODEX_CMD (no other shell commands).
- Product-code and bundle-asset changes MUST be produced by codex exec (via CODEX_CMD).
- Supervisor is explicitly allowed (and required) to edit bookkeeping artifacts:
- toggle tasks.md checkboxes, write EVIDENCE (RUN #n) lines, append progress.txt, and create ONE checkpoint commit on PASS.
- Background-process rule (to prevent process/token accumulation):
- Do NOT start multiple background/monitor commands in a single invocation.
- If any long-running process was started (e.g., a server), terminate it before starting a new attempt.
Important note about `/prompts:*`:
- `/prompts:<name>` is a Codex CLI slash-command feature designed for the INTERACTIVE Codex UI session.
- Do NOT rely on `/prompts:*` in automated non-interactive runs (`codex exec`). Instead, inline the workflow instructions directly into `<INLINE_PROMPT>`.
## Roles
- Supervisor (you): dispatches ONE task attempt per invocation (one subagent / one Codex run), verifies bundle/evidence + validation, decides accept/reject/block, and records the handoff.
- Within a single /monitor-openspec-codex ... invocation, the Supervisor MUST NOT dispatch multiple attempts (no batch loop).
- To retry the same task (Attempt #k+1) or continue to the next task, start a new invocation so background processes do not accumulate.
- Supervisor is the ONLY role allowed to toggle checkboxes in `tasks.md`.
- Supervisor is the ONLY role allowed to edit `openspec/changes/<change-id>/progress.txt` (append-only).
- Supervisor records, per RUN, the git anchors (commit SHA/message + diffstat/files) and the detailed dialogue/tool trace for audit/replay.
- Worker (Codex via CODEX_CMD): coding agent for ONE task only.
- MUST perform Startup ritual at the beginning of EVERY run (progress.txt + feature_list.json + `git log --oneline -20` + `git rev-parse --short HEAD`)
and write what was observed into the validation bundle log:
- `auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt` (mandatory)
- MUST implement + write tests (CLI) + produce the validation bundle assets (task.md/run.sh/run.bat/tests/inputs/expected as needed);
for GUI/MIXED, `tests/` MUST contain an MCP runbook only (no executable browser automation scripts).
- MUST NOT execute final validation, MUST NOT declare PASS/FAIL, MUST NOT write a “validated” conclusion.
- Supervisor: executes validation and forms the final evidence chain.
- Runs `auto_test_openspec/<change-id>/<run-folder>/run.sh|run.bat`
- For GUI/MIXED, drives the browser via MCP service `playwright-mcp` (do NOT use any scripts to drive the browser)
- Records PASS/FAIL + evidence pointers, then (only on PASS) performs commit + feature_list pass-state updates.
- MUST NOT toggle any checkbox in `tasks.md`.
- MUST NOT edit `openspec/changes/<change-id>/progress.txt`.
- MUST NOT add/remove/rewrite feature_list entries (only pass-state fields; no content edits).
- Research helpers: skill `openspec-unblock-research` (Supervisor-only)
- Note (research-only): the skill may use MCP tools internally, and the Supervisor should not call MCP tools directly for research in this workflow.
- Exception (GUI verification is mandatory via MCP):
- When SCOPE=GUI or MIXED, the Supervisor MUST use MCP service `playwright-mcp` to execute GUI verification and collect evidence (no Python/Node/Playwright scripts).
## Task selection rules (tasks.md)
- Pick the FIRST ELIGIBLE unchecked checkbox item (`- [ ] ...`) in `openspec/changes/<change-id>/tasks.md` (top-to-bottom).
- ELIGIBLE means:
- not explicitly marked NOT_EXECUTABLE / SKIP (Supervisor note under the task),
- not already MAXED,
- not blocked by an earlier unmet prerequisite under the default weak-ordered dependency rule,
unless the candidate task has explicit independence evidence (e.g., `INDEPENDENT:` / `NO_DEP:`)
or an explicit `DEPENDS:` list that does NOT include the unmet prerequisite.
- Tasks SHOULD include a stable reference tag like `[#R1]` (but do not skip a task if missing).
- One task = one subagent = one worker run. Never do multiple tasks in a single run.
## Verification + bookkeeping rules
After the worker finishes a task:
1) Re-open `openspec/changes/<change-id>/tasks.md`.
2) Supervisor is the ONLY role allowed to change any checkbox (`- [ ]` → `- [x]`).
- Worker/Codex MUST NOT toggle checkboxes.
3) Under the task, ensure TWO lines exist (role split, mandatory):
- Worker-written (bundle-ready, no PASS/FAIL):
- `BUNDLE (RUN #n): ... | VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder> | HOW_TO_RUN: run.sh/run.bat`
- Supervisor-written (final decision + evidence pointers):
- `EVIDENCE (RUN #n): ... | VALIDATED: <exact commands + exit code> | RESULT: PASS|FAIL | GUI_EVIDENCE: <screenshots/trace/video/console index paths>`
- Prefer this format (SINGLE LINE, THIS TASK ONLY):
EVIDENCE (RUN #n): CODEX_CMD=codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium
| SCOPE: <CLI|GUI|MIXED>
| VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder>
| WORKER_STARTUP_LOG: auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt
| VALIDATED_CLI: <exact command(s)> | EXIT_CODE: <n> (omit if no CLI)
| VALIDATED_GUI: MCP(playwright-mcp) | RUNBOOK: tests/<.> | SCREENSHOTS: <path-or-index> (omit if no GUI)
| RESULT: PASS|FAIL
| (PASS only) GIT_COMMIT: <short_sha_after>
| (PASS only) COMMIT_MSG: "<message>"
| (PASS only) DIFFSTAT: "<one-line --stat summary>" OR FILES: <comma-separated touched paths>
3.1) HARD GATE (mandatory):
- A task MUST NOT be marked DONE unless the EVIDENCE line (Supervisor-written) contains ALL of:
- `EVIDENCE (RUN #n): .` # identifies exactly which run this is
- `SCOPE: CLI|GUI|MIXED`
- `VALIDATION_BUNDLE: auto_test_openspec/<change-id>/<run-folder>/`
- `WORKER_STARTUP_LOG: auto_test_openspec/<change-id>/<run-folder>/logs/worker_startup.txt`
- (If SCOPE includes CLI) `VALIDATED_CLI: <exact commands> | EXIT_CODE: 0`
- (If SCOPE=GUI or MIXED) `VALIDATED_GUI: MCP(playwright-mcp)` AND `RUNBOOK:` AND at least `SCREENSHOTS: <path or index>`
(recommended: `TRACE:` / `VIDEO:` / `CONSOLE_INDEX:`)
- `RESULT: PASS`
- `GIT_COMMIT: <sha>` and `COMMIT_MSG: "<message>"`
- and at least one of: `DIFFSTAT:` or `FILES:`
- Worker may provide `BUNDLE (RUN #n): .` but it is NOT sufficient for DONE.
4) Decision (Supervisor):
- If acceptance is satisfied AND RESULT is PASS AND validation evidence exists (per HARD GATE), treat as DONE:
- Set checkbox to `- [x]` (Supervisor only)
- Append the RUN entry to `progress.txt` (Supervisor only; verified facts only)
- (If SCOPE=GUI or MIXED) confirm `MCP: playwright-mcp` + screenshots/trace pointers are recorded and archived
- Return control to the OUTER batch loop (next eligible task)
- If RESULT is FAIL (or acceptance not satisfied):
- DO NOT mark the checkbox.
- Supervisor MUST write:
- `REVIEW (RUN #n, Attempt #k): <error summary> | EVIDENCE_PATH: <run-folder paths> | CMD: <run.* + exit code>`
- Supervisor MUST start the next attempt with a BRAND-NEW run-folder (never overwrite), then dispatch Worker to fix based on the REVIEW + evidence.
- Do NOT “one-off stop” or “only retry once” here.
Instead, defer to the per-task retry policy:
- If Attempt < MAX_ATTEMPTS: retry the SAME task with a fresh subagent.
- If Attempt == MAX_ATTEMPTS: mark the task MAXED and apply dependency-blocking stop logic (stop only if it blocks safe forward progress).
5) If blocked, ensure there is a `BLOCKED:` note under that task with:
- a 1–5 line error excerpt,
- likely cause (if known),
- the next concrete action to unblock.
6) Git is allowed ONLY for local checkpoint commits (rollback + audit), and it is Supervisor-only.
Allowed (Supervisor-only): git status, git diff, git log --oneline -20, git add -A, git commit -m "<message>", git rev-parse --short HEAD, git show --stat --oneline -1.
Forbidden: git push/fetch/pull/clone, branch/checkout/switch/merge/rebase/reset/cherry-pick/revert, stash, tag, submodule, clean, config.
Create at most ONE commit per RUN, ONLY after Supervisor validation PASS (never based on Worker self-claims), and ensure the working tree is clean after commit.
## progress.txt format (Supervisor, append-only)
File: openspec/changes/<change-id>/progress.txt
Rule: Append-only. Never rewrite or reorder existing entries.
Each RUN entry MUST contain:
A) A structured RUN SUMMARY (fast scanning)
B) A detailed DIALOGUE + TOOL TRACE (replay / audit)
================================================================================
RUN ENTRY
[RUN SUMMARY]
Timestamp (UTC): <ISO-8601 Z> Run: #<n> Attempt: <k>
Change: <change-id> Task: <task-num> Ref: <ref-tag>
Status: DONE | FAIL | BLOCKED | ROLE_VIOLATION | NO_PROGRESS
Git anchors (this RUN):
- (PASS-only) Commit: <short_sha> "<commit message>"
- (PASS-only) Diffstat (short): <1 line> OR Files: <comma-separated touched paths>
- (If not PASS) Commit anchors may be absent; do NOT invent them.
Evidence pointers:
- tasks.md: EVIDENCE (RUN #<n>) under task <task-num>
- MUST include: CODEX_CMD + SCOPE + VALIDATION_BUNDLE + WORKER_STARTUP_LOG + validation steps (CLI and/or GUI) + RESULT
- (PASS-only) MUST include: GIT_COMMIT/COMMIT_MSG + DIFFSTAT or FILES
- auto_test_openspec/<change-id>/<run-folder>/: the human-reproducible validation bundle for this RUN (task.md + run scripts + assets + outputs/logs, including logs/worker_startup.txt)
- feature_list.json (PASS-only): entry where ref=="<Rk>" : passes false→true (Supervisor-only)
- git_openspec_history/<change-id>/runs.log (PASS-only): must record the same checkpoint commit for this RUN (commit SHA/message + diffstat/files)
- git history (PASS-only): the commit above is the rollback checkpoint for this RUN
--------------------------------------------------------------------------------
Optional (recommended) SESSION STARTUP ENTRY (once per session)
[SESSION STARTUP]
[Assistant] I'll start by getting my bearings and understanding the current state of the project.
[Tool Use] <read - openspec/changes/<id>/progress.txt>
[Tool Use] <read - openspec/changes/<id>/feature_list.json>
[Tool Use] <read - openspec/changes/<id>/tasks.md>
[Assistant] Let me check the git log to see recent work.
[Tool Use] <bash - CODEX_CMD "..."> (Codex run contains `git log --oneline -20` as part of STARTUP)
[Subagent] <paste the git log excerpt that Codex recorded under THIS task or in the EVIDENCE/STARTUP note>
[Assistant] <what looks healthy / what is next>
================================================================================
## Blocker handling (with research skill)
If a task is blocked:
- When BLOCKED (or repeated NO_PROGRESS), do not call MCP tools directly; always use `openspec-unblock-research` to perform research and produce unblock guidance.
- The skill may use MCP tools (e.g. `web-search-prime`, `context7`, etc.) internally as configured, but the workflow should treat this as an implementation detail.
- Under the SAME task in `tasks.md`, add/refresh:
`UNBLOCK GUIDANCE (RUN #n, Attempt #k): ...`
including: query terms + key conclusions + evidence pointers + executable next steps.
- Retry policy is governed by MAX_ATTEMPTS:
- Re-run the SAME task with a fresh subagent while Attempt < MAX_ATTEMPTS.
- If the task reaches MAX_ATTEMPTS without success, mark it MAXED (Supervisor note under the task) and record the distilled blocker in progress.txt.
- Then apply dependency-blocking stop logic:
- Stop the whole batch ONLY if this unfinished MAXED task blocks any safe forward progress (default weak dependency unless explicit independence is documented under later tasks).
- Otherwise, later tasks explicitly marked independent may proceed.
## Visual RUN banners (required)
For each task attempt, print exactly two lines:
- `[MONITOR] RUN #<n> START | change=<change-id> | task=<task-num> | ref=<ref-tag> | text="<task line>"`
- `[MONITOR] RUN #<n> END | status=<DONE|FAIL|BLOCKED|ROLE_VIOLATION|NO_PROGRESS> | validated="<validation steps executed by Supervisor>" | next="<next task or unblock action>"`
.claude/commands/monitor-openspec-codex.md (automation core)
Create `monitor-openspec-codex.md` under:
- Windows: `%USERPROFILE%\.claude\commands`
- macOS/Linux: `~/.claude/commands`
This is our "supervisor script": it defines how Claude Code calls Codex in an automatic loop.
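As a rough mental model of what the command file asks Claude Code to do, the retry and stop logic can be sketched in Python. This is an illustration only and not part of the workflow: `run_attempt` is a hypothetical stand-in for "spawn one subagent, run Codex once, validate as Supervisor", tasks are plain dictionaries, a single `independent` flag stands in for the `INDEPENDENT:` / `NO_DEP:` / `DEPENDS:` markers, and giving every attempt its own run number is an assumption.

```python
from typing import Callable, Dict, List

MAX_ATTEMPTS = 2  # default named in the command file


def run_batch(tasks: List[Dict],
              run_attempt: Callable[[Dict, int, int], str],
              run_counter: int = 1) -> Dict:
    """Simulate the Supervisor's batch loop over an ordered task list.

    tasks: dicts with "num", optional "done" and optional "independent".
    run_attempt(task, run_no, attempt): hypothetical callback that returns
    "PASS" or any other status (FAIL, BLOCKED, NO_PROGRESS, ...).
    """
    log = []    # (run_no, task_num, attempt, status), one row per attempt
    maxed = []  # task numbers that used up MAX_ATTEMPTS without a PASS

    for index, task in enumerate(tasks):
        if task.get("done"):
            continue
        # Weak ordering: an earlier unfinished task is presumed to be a
        # prerequisite unless this task is explicitly marked independent.
        blocked_by_earlier = any(not t.get("done") for t in tasks[:index])
        if blocked_by_earlier and not task.get("independent"):
            continue  # ineligible in this batch

        for attempt in range(1, MAX_ATTEMPTS + 1):
            status = run_attempt(task, run_counter, attempt)
            log.append((run_counter, task["num"], attempt, status))
            run_counter += 1  # assumption: each attempt is its own RUN
            if status == "PASS":
                task["done"] = True  # only the Supervisor marks DONE
                break
        else:
            maxed.append(task["num"])  # no PASS within MAX_ATTEMPTS

    remaining = [t["num"] for t in tasks
                 if not t.get("done") and t["num"] not in maxed]
    # Stop condition B: a MAXED task is holding back unfinished work.
    # Stop condition A: nothing eligible is left to run.
    stop = "B" if (maxed and remaining) else "A"
    return {"stop": stop, "maxed": maxed, "remaining": remaining, "log": log}
```

With `MAX_ATTEMPTS = 2`, a task that fails twice is marked MAXED, and the batch only keeps going for later tasks flagged as independent.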
monitor-openspec-codex.md
---
description: Supervise an OpenSpec change in BATCH MODE. Iterates through unchecked tasks.md items sequentially via Codex CLI (codex exec). Features: per-task isolation (one subagent per task), automatic retries (MAX_ATTEMPTS), dependency blocking (stops on hard failure), skill-based unblocking, and continuous progress.txt logging.
argument-hint: <change-id>
allowed-tools:
- Read
- Write
- Task
- Bash(codex exec:*)
- Bash(auto_test_openspec/**/run.sh)
- Bash(auto_test_openspec/**/run.bat)
# Minimal FS (Supervisor-only; to create bookkeeping dirs/files deterministically)
- Bash(mkdir:*)
# Minimal Git (Supervisor-only, bookkeeping after PASS; avoids “background monitoring” workarounds)
- Bash(git rev-parse:*)
- Bash(git status:*)
- Bash(git log:*)
- Bash(git add:*)
- Bash(git commit:*)
- Bash(git show:*)
- Bash(git diff:*)
---
You are the SUPERVISOR. Follow this procedure in English only.
# Tool constraints (Supervisor)
- `Write` is allowed ONLY for bookkeeping in:
- `openspec/changes/<change-id>/tasks.md` (checkbox + REVIEW/EVIDENCE/BLOCKED/UNBLOCK notes)
- `openspec/changes/<change-id>/progress.txt` (append-only handoff log)
- `openspec/changes/<change-id>/feature_list.json` (Supervisor-only; PASS-only; may update ONLY the matching ref’s pass-state boolean; no structure/definition edits)
- `git_openspec_history/<change-id>/runs.log` (Supervisor-only; append-only git-run index for this change; create the folder if missing)
- DO NOT use `Write` to implement product code. All implementation MUST come from the Worker’s single `CODEX_CMD` run.
# Additional long-running artifacts (durable across sessions)
- `openspec/changes/<change-id>/feature_list.json` is the end-to-end feature checklist (pass/fail per stable ref tag).
- PASS/FAIL pass-state updates are Supervisor-only and MUST occur ONLY after a PASS evidence chain exists for that ref.
- `openspec/changes/<change-id>/progress.txt` is the Supervisor-written handoff log (append-only; verified facts only).
# Single Codex command constant (maintain ONLY ONE copy)
CODEX_CMD = codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium
Inputs:
- change-id: $ARGUMENTS
Goal:
- Execute a BATCH LOOP over `openspec/changes/<change-id>/tasks.md`.
- Process tasks sequentially (top-to-bottom).
- For each unchecked task:
1. Isolate execution (One Task = One Subagent = One Codex Run).
2. Retry on failure up to MAX_ATTEMPTS (default: 2).
3. Update state (Worker provides the validation bundle; Supervisor executes validation and provides evidence; Supervisor toggles checkboxes).
- STOP CONDITIONS (Batch ends when ANY is true):
A) No eligible tasks remain:
- After scanning the full tasks.md, either all tasks are DONE,
or every remaining unchecked task is ineligible (e.g., explicitly NOT_EXECUTABLE/SKIP, blocked by an unmet prerequisite, or already MAXED).
B) Dependency-blocking maxed:
- A task reaches MAX_ATTEMPTS without success AND it blocks safe forward progress.
- Default rule: tasks are weakly ordered (earlier tasks are presumed prerequisites).
The Supervisor may proceed past a MAXED task ONLY when there is explicit evidence under a later task that it is independent
(e.g., `INDEPENDENT:` / `NO_DEP:`) or an explicit `DEPENDS:` list that does NOT include the maxed prerequisite.
- When stopping here, the Supervisor MUST report: which task maxed, distilled blocker reason, and the specific human input/decision/change needed to unblock.
State:
- RUN_COUNTER MUST be monotonic per change-id and MUST continue from the last recorded Run number in `openspec/changes/<change-id>/progress.txt` (do not reset to 1 across sessions).
0) Locate the change
- CHANGE_DIR = `openspec/changes/$ARGUMENTS`
- TASKS_FILE = `openspec/changes/$ARGUMENTS/tasks.md`
- FEATURE_FILE = `openspec/changes/$ARGUMENTS/feature_list.json`
- PROGRESS_FILE = `openspec/changes/$ARGUMENTS/progress.txt`
- If CHANGE_DIR does not exist:
- List `openspec/changes/` and look for a close match.
- If ambiguous, STOP and ask the user for the exact change-id.
- If TASKS_FILE does not exist:
- STOP and ask the user to scaffold it.
- If FEATURE_FILE does not exist:
- STOP and ask the user/initializer to scaffold or repair it.
- NOTE: Worker/Codex is NOT allowed to create or rewrite feature_list.json.
- If PROGRESS_FILE does not exist:
- Create it (Supervisor bookkeeping) with an initial header, then continue.
- NOTE: Only do this when the file is missing (first run). Never overwrite or reset an existing progress.txt.
0.1) Restore session state (Supervisor; Read-only; no Bash)
- Read PROGRESS_FILE and derive RUN_COUNTER (monotonic per change-id):
- If any prior entry contains `Run: #<n>`, set RUN_COUNTER = (max n) + 1
- Else RUN_COUNTER = 1
- Read FEATURE_FILE (context only; do not edit).
- Proceed to task selection.
1) Batch session loop (one invocation = many task attempts, serial)
- Loop:
- Read TASKS_FILE and select CURRENT_TASK using the eligibility rules in 1.1 (top-to-bottom).
- If no eligible task exists -> STOP via stop condition (A) "No eligible tasks remain".
- For CURRENT_TASK, run a per-task retry loop up to MAX_ATTEMPTS:
- Let MAX_ATTEMPTS = 2 (or the configured constant in this command).
- Let ATTEMPT be derived from PROGRESS_FILE (resumable across sessions; see 1.1).
- While ATTEMPT <= MAX_ATTEMPTS:
- Spawn EXACTLY ONE new subagent for this ONE task attempt (never bundle).
- Supervisor verifies + books (explicit control flow; keep auto-retries):
- Determine post-subagent status UNDER THIS task only:
- READY_TO_VALIDATE if:
- tasks.md contains exactly ONE `BUNDLE (RUN #<RUN_COUNTER>): ...` line for this attempt, and
- the referenced run-folder exists and contains the required bundle assets (task.md + run.sh + run.bat + logs/; and if GUI/MIXED, tests/ with MCP runbook).
- BLOCKED if tasks.md contains `BLOCKED:` + `NEEDS:` under this task.
- ROLE_VIOLATION if the Worker wrote any `EVIDENCE (RUN #...)` / PASS/FAIL/RESULT/validated= conclusion, toggled any checkbox, or modified feature_list.json.
- NO_PROGRESS otherwise.
- If READY_TO_VALIDATE:
- Execute validation as Supervisor:
- CLI scope: run `auto_test_openspec/**/run.sh|run.bat` and capture logs/outputs (append-only in the run-folder).
- GUI/MIXED scope:
- run.* is start-server only (start the service and print URL/port),
- execute `tests/gui_runbook_*.md` via MCP service `playwright-mcp` (no manual browser; no scripts),
- capture evidence (at minimum screenshots + screenshots index under logs/; trace/video/console index optional).
- Record result under THIS task (Supervisor-only):
- Write ONE `EVIDENCE (RUN #<RUN_COUNTER>): ... | RESULT: PASS|FAIL | ...` line with evidence pointers.
- If RESULT is PASS:
- Toggle checkbox to `- [x]` (Supervisor only).
- Append progress.txt entry (Status=DONE, Attempt=<k>, bundle + evidence pointers).
- Continue the outer batch loop (pick next eligible task). # explicit continue
- If RESULT is FAIL:
- Append progress.txt entry (Status=FAIL, Attempt=<k>, distilled blocker + evidence pointers).
- If ATTEMPT < MAX_ATTEMPTS:
- Add/refresh `UNBLOCK GUIDANCE (RUN #<RUN_COUNTER>): ...` under the SAME task in tasks.md (Supervisor only).
- ATTEMPT += 1 and retry the SAME task with a fresh subagent. # explicit retry
- Else (ATTEMPT == MAX_ATTEMPTS):
- Mark the task as MAXED (Supervisor note under task; do NOT check it):
- `MAXED (RUN #<RUN_COUNTER>): <short reason>`
- Enforce dependency-blocking stop logic:
- If the Supervisor cannot safely proceed to any later unchecked task:
- STOP via stop condition (B) and report the required human unblock input. # explicit stop
- Else:
- Continue the outer batch loop. # explicit continue
- If BLOCKED / ROLE_VIOLATION / NO_PROGRESS:
- Append progress.txt entry (Status=BLOCKED/ROLE_VIOLATION/NO_PROGRESS, Attempt=<k>, distilled blocker + next-step suggestion).
- If ATTEMPT < MAX_ATTEMPTS:
- Add/refresh `UNBLOCK GUIDANCE (RUN #<RUN_COUNTER>): ...` under the SAME task in tasks.md (Supervisor only).
- ATTEMPT += 1 and retry the SAME task with a fresh subagent. # explicit retry
- Else (ATTEMPT == MAX_ATTEMPTS):
- Mark the task as MAXED (Supervisor note under task; do NOT check it):
- `MAXED (RUN #<RUN_COUNTER>): <short reason>`
- Enforce dependency-blocking stop logic:
- If the Supervisor cannot safely proceed to any later unchecked task:
- STOP via stop condition (B) and report the required human unblock input. # explicit stop
- Else:
- Continue the outer batch loop. # explicit continue
- Terminate ONLY via stop conditions (A) or (B) (and "All tasks done" as a subset of A).
- Do NOT stop after a single task by default.
1.1) Determine CURRENT_TASK (eligible + resumable attempts)
- Read TASKS_FILE.
- Scan tasks top-to-bottom and pick the FIRST unchecked checkbox item that is ELIGIBLE.
- ELIGIBLE means ALL are true:
- It is not explicitly marked NOT_EXECUTABLE / SKIP (by a Supervisor note under the task).
- It is not already MAXED (i.e., previously reached MAX_ATTEMPTS without success).
- It is not blocked by an earlier unmet prerequisite:
- Default: tasks are weakly ordered; earlier unchecked/maxed tasks are presumed prerequisites.
- Exception (allowed to proceed): the candidate task has an explicit independence marker under it
(e.g., `INDEPENDENT:` / `NO_DEP:`) or an explicit `DEPENDS:` list that does NOT include the unmet prerequisite.
- If no eligible unchecked task exists after the full scan:
- Stop via "No eligible tasks remain" (stop condition A).
- Capture:
- TASK_LINE = the full checkbox line
- TASK_NUM = e.g., `1.1` if present, else `?`
- REF_TAG = e.g., `[#R1]` if present, else `[]`
- Derive ATTEMPT counter for this task (resumable across sessions):
- Read PROGRESS_FILE and find prior RUN entries where `Task: <task-num>` matches TASK_NUM.
- Let ATTEMPT = (max recorded Attempt for this TASK_NUM) + 1, else 1 if none exist.
- Note: Attempt is per-task (not per-session). RUN_COUNTER remains global monotonic.
- Lock scope (per-task atomicity):
- For the duration of the upcoming subagent/Codex run, the Worker MUST work ONLY on this CURRENT_TASK.
- After the subagent returns, the Supervisor may select the next eligible task and spawn a new subagent.
1.2) Print RUN banner (START)
Output exactly:
`[MONITOR] RUN #<RUN_COUNTER> START | change=$ARGUMENTS | task=<TASK_NUM> | ref=<REF_TAG> | text="<TASK_LINE>"`
1.3) Spawn ONE subagent for CURRENT_TASK
Use the Task tool to spawn a NEW subagent (e.g., name it "codex-worker").
- The Supervisor MUST NOT run Bash for implementation work (coding/build steps).
- The Supervisor MAY run Bash ONLY for:
- executing the validation bundle entrypoint (`auto_test_openspec/**/run.sh|run.bat`) to capture auditable outputs/logs
- minimal Git bookkeeping after PASS (commit + show/diffstat), as explicitly allowed in `allowed-tools`
- any GUI steps MUST be executed ONLY via MCP service `playwright-mcp` (no manual browser; no Python/Node/Playwright scripts).
IMPORTANT: Explicitly instruct the subagent that manual file editing is banned.
Tell the subagent: "I will reject any work that does not produce a `codex exec` execution log. Do not try to edit files directly."
Subagent instructions (copy verbatim):
---
You are the CODEX CLI OPERATOR. Your ONLY job is to run Codex CLI exactly once and report results. You are NOT a software engineer.
MISSION: You must force the `codex` CLI tool to perform the work.
NON-NEGOTIABLE RULE: You are FORBIDDEN from using `Write`, `Edit`, or `Replace` tools on project files. You have NO permission to edit code manually.
TOOLS:
- You MAY use the Read tool to inspect files (tasks.md / progress.txt / feature_list.json).
- You MUST invoke the Bash tool exactly once, and that single invocation MUST be CODEX_CMD.
- You are FORBIDDEN from using Write/Edit/Replace on project files.
Execution Steps (Do exactly this):
1. Read (Read tool, not Bash):
- `openspec/changes/$ARGUMENTS/tasks.md`
- `openspec/changes/$ARGUMENTS/progress.txt`
- `openspec/changes/$ARGUMENTS/feature_list.json`
2. Construct a prompt for the CLI using the template below.
3. Run exactly ONE Bash command:
codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium "$(cat <<'PROMPT'
<INLINE_PROMPT>
PROMPT
)"
4. Verify the CLI updated `tasks.md` under THIS task ONLY (no checkbox toggles).
Verify the Worker output is BUNDLE-ready (and ONLY bundle-ready):
- Under THIS task, there is EXACTLY ONE single-line `BUNDLE (RUN #<RUN_COUNTER>): ...` pointer that targets a concrete run-folder:
- includes `CODEX_CMD=codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium`
- includes `SCOPE: <CLI|GUI|MIXED>`
- includes `VALIDATION_BUNDLE: auto_test_openspec/$ARGUMENTS/run-.../`
- includes `HOW_TO_RUN: run.sh/run.bat`
- if SCOPE includes GUI: includes `RUNBOOK: tests/gui_runbook_*.md`
- The referenced run-folder exists and contains at minimum:
- `task.md`, `run.sh`, `run.bat`,
- `logs/worker_startup.txt` (mandatory startup snapshot),
- and (when GUI/MIXED) `tests/` containing an MCP-only runbook (no scripts).
- The Worker did NOT:
- write any `EVIDENCE (RUN #...)` line
- write PASS/FAIL/RESULT/validated= conclusions
- toggle any checkbox
Also verify governance constraints:
- `feature_list.json` MUST NOT be modified by the Worker (neither entries nor pass-state).
- No git commit is expected/allowed from the Worker.
- If the CLI violated any of the above, report failure.
<INLINE_PROMPT> Template (fill variables):
(Shared setup)
- change-id: $ARGUMENTS
- include the exact TASK_LINE text (verbatim)
- state explicitly: "Implement ONLY this task (no other tasks, no refactors outside scope)."
- require full validation per the task’s `TEST:` and the canonical spec:
- Follow `openspec/project.md` → `## tasks.md Checklist Format` → `### Validation bundle requirements (mandatory)`
- Produce a human-reproducible validation bundle under:
`auto_test_openspec/$ARGUMENTS/<run-folder>/`
- Worker MAY run quick local checks to ensure the bundle is runnable,
but MUST NOT claim PASS/FAIL/validated (Supervisor is the final verifier).
A) Worker deliverables (validation bundle assets)
- Create a NEW run-folder (append-only; never overwrite prior runs):
`auto_test_openspec/$ARGUMENTS/run-<RUN4>__task-<TASK_NUM>__ref-<REF>__<YYYYMMDDThhmmssZ>/`
- Minimum required files inside the run-folder:
- `task.md` (self-sufficient README; includes How-to-run + machine-decidable pass/fail criteria)
- `run.sh` and `run.bat`
- `logs/worker_startup.txt` (MANDATORY; see Startup ritual below)
- `logs/` (for provenance + transcripts; keep append-only within this run folder)
- If SCOPE includes GUI/MIXED: `tests/gui_runbook_*.md` (MCP-only runbook; no executable browser scripts)
- If the task needs inputs/expected: include `inputs/`, `expected/`, and write outputs into `outputs/` (never temp dirs)
- GUI/MIXED server-start contract (MANDATORY):
- `task.md` MUST include a dedicated section with EXACT, copy/paste-able commands:
- `SERVER_START:` <exact command to start the server>
- `SERVER_URL:` <exact URL Supervisor should navigate to, including host + port>
- `READY_CHECK:` <a concrete readiness check (endpoint or observable signal)>
- For GUI/MIXED, `run.sh` / `run.bat` MUST implement `SERVER_START`:
- MUST start the local server and print the `SERVER_URL` to stdout.
- MUST NOT perform validation (no PASS/FAIL claims); start-server only.
- Environment isolation (mandatory ONLY if env problems occur):
- DO NOT install Python deps globally.
- If missing deps / conflicts prevent execution, create an isolated venv via `uv` inside THIS run folder
(e.g., `<run-folder>/.venv/`) and ensure `run.sh`/`run.bat` uses it.
- Log provenance into `logs/` (always): python path+version, uv version, dependency source, exact install commands.
A) Startup ritual (MANDATORY, before any edits)
- REQUIRE Codex STARTUP RITUAL:
- read `openspec/changes/$ARGUMENTS/progress.txt`
- read `openspec/changes/$ARGUMENTS/feature_list.json`
- run `git log --oneline -20`
- capture `GIT_BASE` via `git rev-parse --short HEAD`
- write a Startup snapshot to the validation bundle (NOT tasks.md), at:
- `auto_test_openspec/$ARGUMENTS/<run-folder>/logs/worker_startup.txt`
- The snapshot MUST include: UTC timestamp, CODEX_CMD, GIT_BASE, the git-log excerpt, and a short “what I observed” summary.
B) tasks.md bookkeeping (Worker-owned; single-line; NO conclusions)
- require Codex to update `openspec/changes/$ARGUMENTS/tasks.md` under THIS task with exactly ONE Worker bookkeeping line (NOT EVIDENCE):
- starting with: `BUNDLE (RUN #<RUN_COUNTER>): ...`
- MUST be a SINGLE LINE
- MUST NOT write any `EVIDENCE (RUN #...)` line
- MUST NOT write any PASS/FAIL/RESULT/validated= conclusions
- The single BUNDLE line MUST include ONLY:
- `CODEX_CMD=codex exec --full-auto --skip-git-repo-check --model gpt-5.2 -c model_reasoning_effort=medium`
- `SCOPE: <CLI|GUI|MIXED>`
- `VALIDATION_BUNDLE: auto_test_openspec/$ARGUMENTS/run-<RUN4>__task-<TASK_NUM>__ref-<REF>__<YYYYMMDDThhmmssZ>`
- `HOW_TO_RUN: run.sh/run.bat`
- (if SCOPE=GUI or MIXED) `RUNBOOK: tests/gui_runbook_*.md`
- (if SCOPE=GUI or MIXED) `SERVER_URL: <exact url including host+port>`
- forbid Codex from toggling ANY checkbox in tasks.md.
C) GUI hard rules (only if SCOPE includes GUI/MIXED)
- GUI verification is Supervisor-only via MCP service `playwright-mcp`.
- Worker deliverable for GUI is ONLY the MCP runbook file:
- `tests/gui_runbook_*.md` MUST be MCP-only steps + selectors + assertion points + evidence capture points.
- ABSOLUTELY NO executable browser automation scripts (no Playwright test runner; no Python/Node scripts).
- ABSOLUTELY NO manual browser steps anywhere (no “open Chrome/click …” prose, anywhere in the bundle).
- For GUI/MIXED bundles, `run.sh` / `run.bat` MUST be start-server only:
- MUST start the local server and print URL/port.
- MUST NOT perform state seeding/copying/exporting/testing/validation/probing/installs.
D) Governance boundaries (Worker forbidden; Supervisor-only)
- feature_list governance (MANDATORY; strict):
- The Worker/Codex is FORBIDDEN to edit `openspec/changes/$ARGUMENTS/feature_list.json` (no entry edits, no pass-state edits, no formatting churn).
- If `openspec/changes/$ARGUMENTS/feature_list.json` is missing OR the matching ref entry is missing:
- Under THIS task write:
BLOCKED: Missing feature_list.json (or missing ref entry for <REF_TAG>)
NEEDS: Supervisor/initializer must create/repair feature_list.json (structure + ref mapping). Then re-run this task.
- Then END THIS WORKER RUN immediately (do not proceed with implementation in this run).
- Pass-state updates (e.g., `passes=true/false`) are Supervisor-only and may occur ONLY after Supervisor validation PASS + EVIDENCE is recorded.
- forbid touching any other tasks (no evidence elsewhere; no changes to other items)
- governance boundary (Worker/Codex; mandatory):
- The Worker/Codex is FORBIDDEN to create git commits (no checkpoint commits).
- The Worker/Codex is FORBIDDEN to edit or append `git_openspec_history/<change-id>/runs.log`.
- The Worker/Codex MUST NOT attempt to produce DIFFSTAT/FILES “final” summaries as evidence.
- All commit/runs.log bookkeeping (and DIFFSTAT capture) is Supervisor-only and may occur ONLY after Supervisor validation PASS.
3) After Codex finishes, confirm that `openspec/changes/$ARGUMENTS/tasks.md` has either:
BUNDLE-READY (Worker output, under THIS task):
- EXACTLY ONE `BUNDLE (RUN #<RUN_COUNTER>): ...` line that points to a concrete run-folder:
- includes `VALIDATION_BUNDLE: auto_test_openspec/$ARGUMENTS/run-.../`
- includes `HOW_TO_RUN: run.sh/run.bat`
- if SCOPE includes GUI/MIXED: includes `RUNBOOK: tests/gui_runbook_*.md`
- if SCOPE includes GUI/MIXED: includes `SERVER_URL: ...`
- The referenced run-folder exists and contains at minimum:
- `task.md`, `run.sh`, `run.bat`, `logs/worker_startup.txt`,
- and (when GUI/MIXED) `tests/` with an MCP-only runbook
- For GUI/MIXED, `task.md` MUST include `SERVER_START:` + `SERVER_URL:` + `READY_CHECK:` (as defined above).
- Worker MUST NOT have written any `EVIDENCE (RUN #...)` line.
- Worker MUST NOT have toggled any checkbox.
- Worker MUST NOT have edited feature_list.json.
- Worker MUST NOT have created any git commit.
- Worker MUST NOT have edited `git_openspec_history/<change-id>/runs.log`.
OR BLOCKED (Worker output, under THIS task):
- `BLOCKED: ...` (1–5 line error excerpt)
- `NEEDS: ...` (next concrete unblock step)
OR ROLE_VIOLATION (Worker output, under THIS task):
- Any `EVIDENCE (RUN #...)` / PASS/FAIL/RESULT/validated= conclusion, checkbox toggle, feature_list.json edit, git commit,
or any edit/append to `git_openspec_history/<change-id>/runs.log`.
Otherwise treat as NO_PROGRESS (missing BUNDLE line and/or missing run-folder).
1.4) Supervisor verification after subagent returns
- Re-read TASKS_FILE.
- Determine status (under THIS task only):
- READY_TO_VALIDATE if a compliant BUNDLE (RUN #<RUN_COUNTER>) line exists and the referenced run-folder is present and well-formed.
- BLOCKED if BLOCKED+NEEDS exists.
- ROLE_VIOLATION if Worker wrote any EVIDENCE/PASS/FAIL/RESULT/validated= conclusion, toggled checkboxes, edited feature_list.json, created commits,
or edited/appended `git_openspec_history/<change-id>/runs.log`.
- NO_PROGRESS otherwise.
- If READY_TO_VALIDATE:
- Supervisor MUST execute validation.
- CLI: via `run.sh`/`run.bat` as specified in the bundle.
- GUI/MIXED:
1) MUST start the server first by running `run.sh`/`run.bat` (start-server only).
2) MUST navigate using the `SERVER_URL` provided in the BUNDLE line / task.md.
3) Then execute the MCP `playwright-mcp` runbook.
4) If the server cannot be started or `SERVER_URL` is missing/invalid, treat as bundle not ready for validation (NO_PROGRESS or BLOCKED with NEEDS), not as a feature FAIL.
- Supervisor writes the single EVIDENCE (RUN #<RUN_COUNTER>) line (PASS/FAIL + evidence pointers).
- Supervisor updates feature_list.json pass-state ONLY after PASS.
- Supervisor creates ONE checkpoint commit ONLY after PASS.
- Supervisor appends runs.log ONLY after PASS.
- Supervisor may then toggle the checkbox to - [x] ONLY after PASS.
- DONE is reachable only after Supervisor validation PASS + compliant EVIDENCE exists under THIS task.
If DONE:
- Toggle the checkbox to `- [x]` (Supervisor only).
- Append a FULL RUN ENTRY to PROGRESS_FILE (Supervisor only; verified facts only) including:
- RUN SUMMARY (timestamp, run #, change-id, task/ref, status)
- Evidence pointers (tasks.md evidence line pointer + feature_list passes change + GIT_BASE/GIT_COMMIT/COMMIT_MSG)
- Validation commands/steps + 3–15 lines output excerpt (from Supervisor validation output and/or bundle logs)
- Changes verified: FILES/DIFFSTAT + key edits summary
- [DIALOGUE + TOOL TRACE] with bracket markers, including:
- [Supervisor → Subagent] instruction
- [Tool Use] <task - spawn subagent>
- [Tool Use] <bash - CODEX_CMD "..."> (from subagent trace)
- [Subagent] reported outputs + the exact BUNDLE line + bundle folder pointer(s)
- [Supervisor] the exact EVIDENCE line + acceptance decision + rationale
- Print RUN banner (END) as before.
- RUN_COUNTER += 1 and continue/stop per your session policy.
If BLOCKED:
- Ensure actionable NEEDS exists (next concrete unblock step).
- Call skill `openspec-unblock-research` (Supervisor-only). Do NOT call MCP tools directly here.
- Provide the skill the BLOCKED context (task line + ref, error excerpt, NEEDS, what was tried, env/versions if known).
- Instruct the skill to write its portable research capsule into BOTH bookkeeping artifacts:
(a) Under THIS task in tasks.md:
Add `UNBLOCK GUIDANCE (RUN #<RUN_COUNTER>):` containing:
- Query terms
- Key conclusions
- Evidence pointers (source links/locators)
- Executable next steps + how to verify
(b) Into progress.txt (inside the current RUN entry):
Append a short “Unblock Research Capsule” containing:
- Query terms
- Key conclusions
- Evidence pointers
- Pointer back to the tasks.md UNBLOCK GUIDANCE location
- Append a FULL RUN ENTRY to PROGRESS_FILE capturing blocker + the skill’s capsule + retry decision (verified facts only).
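- Retry per the per-task retry policy (MAX_ATTEMPTS): if Attempt < MAX_ATTEMPTS, retry the SAME task with a fresh subagent; if the task reaches MAX_ATTEMPTS, mark it MAXED and apply the dependency-blocking stop logic (stop and require user/initializer intervention only if it blocks safe forward progress).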
If NO_PROGRESS:
- Treat as a FAILED ATTEMPT (not an immediate session stop by default).
- Under THIS task, append/refresh a single diagnostic note:
`BLOCKED: Missing a compliant BUNDLE pointer and/or the referenced validation bundle folder is missing/incomplete for this RUN (workflow non-compliance).`
`NEEDS: Re-run SAME task; Worker/Codex must (1) create a fresh run-folder under auto_test_openspec/<change-id>/... containing task.md + run.sh + run.bat + logs/worker_startup.txt (+ tests/runbook if GUI), and (2) append EXACTLY ONE single-line BUNDLE (RUN #<RUN_COUNTER>) pointer under THIS task (CODEX_CMD + SCOPE + VALIDATION_BUNDLE + HOW_TO_RUN [+ RUNBOOK]).`
- Append a FULL RUN ENTRY to PROGRESS_FILE (status=NO_PROGRESS) including:
- the missing-gate diagnosis,
- the subagent trace,
- Attempt #k and the retry/maxed decision.
- Flow control MUST follow the per-task retry policy:
- If Attempt #k < MAX_ATTEMPTS: continue the retry loop for the SAME task (fresh subagent).
- Else (Attempt #k == MAX_ATTEMPTS): mark the task MAXED and apply dependency-blocking stop logic (stop only if it blocks safe forward progress).
2) Completion (only at start-of-session, or if CURRENT_TASK selection finds none)
- If no unchecked tasks remain:
`[MONITOR] DONE | change=$ARGUMENTS | all tasks checked`
then STOP.
Workflow
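Initial setup
First, install Claude Code and Codex; I won't walk through that here. OpenSpec deserves a note: version 0.19.0 is the best choice for now, because newer versions reworked the OpenSpec workflow to support natural-language invocation triggered through skills [3]. I plan to adapt this setup to the latest OpenSpec later.
`npm install -g @fission-ai/openspec@0.19.0`
Then initialize the project with OpenSpec:
`openspec init`
Next steps - copy these prompts to Codex:
────────────────────────────────────────────────────────────
1. Populate your project context:
"Please read openspec/project.md and help me fill it out with details about my project, tech stack, and conventions"
2. Create your first change proposal:
"I want to add [YOUR FEATURE HERE]. Please create an OpenSpec change proposal for this feature"
3. Learn the OpenSpec workflow:
"Please explain the OpenSpec workflow from openspec/AGENTS.md and how I should work with you on this project"
Repeated workflow
First open Codex and describe a change proposal in natural language, for example: add a feature to this project that supports automatic night-mode switching.
Then run the skill `$openspec-change-interviewer <id>` so the model interviews you to pin down and align on the requirements.
Next, run `$openspec-feature-list <id>` so the model produces a `feature_list.json`.
Finally, open Claude Code and enter `/monitor-openspec-codex <id>`.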
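Before handing off to the monitor, it can help to confirm that the files the command expects are in place, since step 0 of the command stops when `tasks.md` or `feature_list.json` is missing. A minimal sketch, assuming it is run from the project root; the function name `preflight` is hypothetical and not something OpenSpec or Claude Code provides.

```python
from pathlib import Path
from typing import List


def preflight(change_id: str, root: str = ".") -> List[str]:
    """Return the problems that would make the monitor stop at step 0."""
    change_dir = Path(root) / "openspec" / "changes" / change_id
    if not change_dir.is_dir():
        return [f"missing change directory: {change_dir.as_posix()}"]

    problems = []
    # Both files must exist before the batch loop can start.
    for name in ("tasks.md", "feature_list.json"):
        if not (change_dir / name).is_file():
            problems.append(f"missing {name}")
    # progress.txt is not checked: the Supervisor creates it on the first run.
    return problems
```

An empty list means the three paths named in step 0 resolve; it says nothing about whether the contents are well formed.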
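Actual usage flow
1. Install `openspec` (I suggest pinning 0.19.0)
I strongly recommend 0.19.0 here. Later versions reworked the workflow; they also support natural-language invocation, but go through skills, and I will try to adapt to the latest version later [3].
`npm install -g @fission-ai/openspec@0.19.0`
2. Initialize the project
`openspec init`
Once initialization finishes, it prints the suggested next steps. We can fill in the project context first, then create the first change proposal.
3. Propose a change with `codex` (plain natural language is fine)
For example: add a feature to this project that supports automatic night-mode switching.
4. Use a skill to "interview" the requirements until they are clear
Aligning on requirements is well worth the time. Letting the model ask its questions before it starts building saves a lot of rework later.
- Run `openspec-change-interviewer`: `$openspec-change-interviewer <id>`
- `<id>` is the folder name of the current proposal under the `openspec` folder
5. Generate `feature_list.json`
- Run: `$openspec-feature-list <id>`
- Once this is done, we can use it later to catch "looks finished but never actually passed" cases
6. Start supervised execution: hand it to `Claude Code`
Finally, open `Claude Code` and enter: `/monitor-openspec-codex <id>`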
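After a batch ends, `progress.txt` is the quickest place to see where things stand. The sketch below pulls the run number, attempt, task, and status out of each `[RUN SUMMARY]` block and derives the next `RUN_COUNTER` and per-task attempt the way the command describes (max recorded value + 1). It assumes every summary keeps `Run:`, `Attempt:`, `Task:` and `Status:` in the documented layout; logs written by a model may drift from it, and the function names are hypothetical.

```python
import re
from typing import Dict, List

RUN_RE = re.compile(r"Run:\s*#(\d+)\s+Attempt:\s*(\d+)")
TASK_RE = re.compile(r"Task:\s*(\S+)")
STATUS_RE = re.compile(r"^\s*Status:\s*(\w+)", re.MULTILINE)


def parse_progress(text: str) -> List[Dict]:
    """Extract one record per [RUN SUMMARY] block; skip malformed blocks."""
    entries = []
    for chunk in text.split("[RUN SUMMARY]")[1:]:
        run = RUN_RE.search(chunk)
        task = TASK_RE.search(chunk)
        status = STATUS_RE.search(chunk)
        if run and task and status:
            entries.append({
                "run": int(run.group(1)),
                "attempt": int(run.group(2)),
                "task": task.group(1),
                "status": status.group(1),
            })
    return entries


def next_run_counter(entries: List[Dict]) -> int:
    """RUN_COUNTER continues from the highest recorded run (1 if none)."""
    return max((e["run"] for e in entries), default=0) + 1


def next_attempt(entries: List[Dict], task_num: str) -> int:
    """Attempts are counted per task, across sessions (1 if none)."""
    return max((e["attempt"] for e in entries if e["task"] == task_num),
               default=0) + 1
```

Because the log is append-only, re-running the parser after every session gives the same counters the Supervisor should arrive at when it restores state.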