feat(video-from-script): restructure the workflow into step-by-step sub-agent execution and add a prompt template system

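Split the video production workflow into independent sub-steps: storyboard → image prompts → image generation → video prompts → video generation → final cut, each executed independently by a sub-agent. Introduce a prompts/ directory that manages the prompt templates in one place (分镜.md for the storyboard, 图片提示词.md for image prompts, 视频提示词.md for video prompts), referenced through the storyboardPrompt/imageStylePrompt/videoStylePrompt fields in account.json.

Changes:
- Add a confirmed mechanism and a pipeline.js confirm command; after image generation, manual confirmation is required before continuing
- The manifest schema now uses shotDesc/narration/duration/directorRef in place of the old fields
- File naming switched from keyword to slug (derived from shotDesc/narration)
- Remove the old storyboard-rules.md and prompt-rules.md
- Split the pipeline.js script into independent modules under lib/ (cmd-init/cmd-confirm/cmd-validate/phase-*)
- Add cmd-create-account for one-step creation of an account with a prompts directory
- capcut_assemble accepts the narration field in place of text as the subtitle source
- Add .gitclaude/settings.json permission config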
2026-04-30 21:18:31 +08:00
parent 7f955647fe
commit 86b9b7948d
32 changed files with 2826 additions and 1292 deletions
--- a/.claude/skills/video-from-script/SKILL.md
+++ b/.claude/skills/video-from-script/SKILL.md
@@ -7,12 +7,13 @@ description: Asset production router. Dispatches to the matching sub-skill based on user intent: i

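 ## Mandatory rules

-1. **No skipping workflow steps**: storyboard (pure narrative) → prompt generation (storyboard + style) → pipeline execution. Results must be reviewed between every stage
+1. **No skipping workflow steps**: storyboard → image prompts → image generation → video prompts → video generation → TTS + final cut. Results must be reviewed between every stage
 2. **manifest.json is the single source of truth**: whenever an operation (image generation, upload, asset replacement) completes, write the result back to the manifest immediately
 3. **Do not call the image/video generation APIs with curl**: always go through `pipeline.js` or the corresponding generator script
 4. **Parallelism first**: independent subtasks must run in parallel via sub-agents, not serially in the main conversation
+5. **prompts/*.md is read only by sub-agents**: the main agent reads account.json + styles/*.md for style information and does not read the sub-agent prompt templates

-**Forbidden**: skipping the storyboard / reading styles during the storyboard stage / continuing without updating the manifest / running the whole pipeline in one go without review
+**Forbidden**: skipping the storyboard / continuing without updating the manifest / running the whole pipeline in one go without review / the main agent generating prompts in place of a sub-agent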

 ---
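Rule 1 forbids skipping steps and requires a review between stages. A minimal sketch of how an agent could enforce that ordering, assuming a hypothetical `canEnterStep` helper and step names chosen here for illustration (neither comes from pipeline.js):

```javascript
// Hypothetical step names mirroring rule 1; pipeline.js does not necessarily use them.
const WORKFLOW_STEPS = [
  "storyboard",
  "imagePrompts",
  "images",
  "videoPrompts",
  "videos",
  "ttsAssemble",
];

// A step may start only when every earlier step has been completed AND reviewed.
function canEnterStep(step, reviewedSteps) {
  const index = WORKFLOW_STEPS.indexOf(step);
  if (index === -1) {
    throw new Error(`Unknown step: ${step}`);
  }
  return WORKFLOW_STEPS.slice(0, index).every((s) => reviewedSteps.has(s));
}

// Usage: images cannot start until both the storyboard and the image prompts are reviewed.
console.log(canEnterStep("images", new Set(["storyboard"]))); // → false
```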

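@@ -56,7 +57,7 @@ After the agent creates manifest.json, use `pipeline.js` to execute it in stages. **Do not

 | Role | Responsibilities |
 |------|------|
-| **Agent** (you) | Read account.json + style.md → **storyboard planning** → generate imagePrompt/videoPrompt from the storyboard → write manifest.json → review the results of every stage |
+| **Agent** (you) | Read account.json + styles/*.md → **storyboard planning** → image prompt generation → video prompt generation → review the results of every stage |
 | **Pipeline** | Mechanical execution: image generation → upload → video generation → TTS → final cut. Writes to disk after each item completes, so interrupted runs can resume |

 ### Execution steps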
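The Step 0 pre-flight check introduced in this commit (templates referenced from account.json, plus at least one style file) could be sketched as a pure function. This assumes the storyboardPrompt/imageStylePrompt/videoStylePrompt values are paths relative to the account directory; `checkAccountReady` is a hypothetical name, and `pipeline.js validate-account` remains the real check.

```javascript
// Field names come from this commit; treating their values as paths relative
// to accounts/{account}/ is an assumption.
const TEMPLATE_FIELDS = ["storyboardPrompt", "imageStylePrompt", "videoStylePrompt"];

// account: parsed account.json; files: relative paths that exist in the account directory.
function checkAccountReady(account, files) {
  const fileSet = new Set(files);
  const problems = [];
  for (const field of TEMPLATE_FIELDS) {
    const templatePath = account[field];
    if (!templatePath) {
      problems.push(`account.json is missing ${field}`);
    } else if (!fileSet.has(templatePath)) {
      problems.push(`${field} points to missing file ${templatePath}`);
    }
  }
  // At least one markdown style file must exist under styles/.
  const hasStyle = files.some((f) => f.startsWith("styles/") && f.endsWith(".md"));
  if (!hasStyle) {
    problems.push("styles/ contains no .md style file");
  }
  return { ready: problems.length === 0, problems };
}
```

In practice the `files` list could come from a recursive listing of `accounts/{account}/`; keeping the check free of I/O makes it easy to test in isolation.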
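@@ -90,65 +91,81 @@ Step -1: Intent confirmation (must be completed before entering any step; confirm item by item; missing
 → Once all 5 items above are confirmed, the agent writes out a complete execution plan for the user's final confirmation:

  Example execution plan (adjust to the actual task):
-  1. Read the {account} account config (id = directory name) + style file (style.md)
-  2. Generate a storyboard (N shots) from the user's script
-  3. Storyboard + style → generate English prompts (imagePrompt + videoPrompt)
+  1. Read the {account} account config (id = directory name) + styles/*.md
+  2. A sub-agent reads prompts/分镜.md → generates a storyboard (N shots) from the user's script
+  3. A sub-agent reads prompts/图片提示词.md → generates an imagePrompt for each shot
  4. pipeline.js init → create manifest.json + the output directory
-  5. pipeline.js run --phase images → image generation → manual review
-  6. pipeline.js run --phase upload,videos → upload + video generation
-  7. pipeline.js run --phase tts,assemble → TTS + final cut
+  5. pipeline.js run --phase images → image generation → manual review and confirmation (optional)
+  6. A sub-agent reads prompts/视频提示词.md → generates a videoPrompt for each shot
+  7. pipeline.js run --phase upload,videos → upload + video generation
+  8. pipeline.js run --phase tts,assemble → TTS + final cut

  User confirms "start" → go to Step 0
  User requests changes → adjust the plan and output it again
  → Do not enter Step 0 until the user has confirmed the execution plan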

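-Step 0: Pre-flight check (account + style validation)
-  - Read accounts/{account}/account.json under the root directory and check whether its styles field configures a style file
-  - If the account does not exist or has no style:
+Step 0: Pre-flight check (account + style + prompt template validation)
+  - Read accounts/{account}/account.json under the root directory
+  - Check that the prompt templates exist under prompts/ (分镜.md, 图片提示词.md, 视频提示词.md)
+  - Check that styles/ contains style files
+  - If the account does not exist or templates/styles are missing:
    → Pause the workflow and create it via the CLI: `pipeline.js create-account --id <id> --name <name> --references ./ref.png`
-    → Then edit `styles/*.md` to refine the prompt strategy
+    → Then edit prompts/*.md and styles/*.md
  - Validate account integrity: `pipeline.js validate-account --account <id>`
-  - If a style exists, continue to Step 1
+  - Once everything is in place, continue to Step 1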

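-Step 1: Storyboard planning (executed by a sub-agent)
-  - The main agent hands the user's script + constraints to a sub-agent
-  - The sub-agent reads references/storyboard-rules.md and outputs a storyboard as required
-  - The main agent reviews the storyboard (alternating shot sizes, hook placement, reasonable durations)
-  - Show it to the user for confirmation; once confirmed, go to Step 2
+Step 1: Storyboard script generation (executed by a sub-agent)
+  - Read the storyboardPrompt field in account.json to locate the storyboard template file (e.g. prompts/分镜.md)
+  - The main agent hands the user's script + the template to a sub-agent
+  - The sub-agent outputs the storyboard as JSON, following the template:
+    ```json
+    [{"id":1,"shotDesc":"English visual description, 40-80 words","narration":"Chinese voice-over narration, ≤22 characters","duration":5,"directorRef":"tarantino"}]
+    ```
+  - The main agent reviews the storyboard (reasonable durations, implied motion fully present, directorRef filled in)
+  - Show it to the user for confirmation; once confirmed, go to Step 2-A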

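-Step 2: Prompt generation + manifest initialization (storyboard + style → English prompts → pipeline.js init)
-  - Input: storyboard + style.md + account.json
-  - The sub-agent combines each shot's Chinese visual description with the style file to generate:
-    · imagePrompt (English visual description, for Gemini/MJ)
-    · videoPrompt (English motion description, for Grok/VEO/Kling)
-    · keyword, keywordColor
-  - **The AI must not hand-write manifest.json**; it must be initialized through the script:
+Step 2-A: Generate image prompts (executed by a sub-agent)
+  - Read the imageStylePrompt field in account.json to locate the image prompt template (e.g. prompts/图片提示词.md)
+  - The sub-agent generates an imagePrompt for each shot:
+    - Input: shotDesc + narration (mood reference) + directorRef (lighting strategy) + target model
+    - Output: imagePrompt (an English prompt that can be sent to the image model as-is)
+  - The main agent reviews imagePrompt quality (shotDesc content fully preserved, lighting vocabulary matches the directorRef)
+
+Step 2-B: Generate static storyboard images + initialize the manifest
+  - Assemble the items and initialize the manifest (**without videoPrompt**):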
    ```bash
    node pipeline.js init --account <id> --mode <single|framePair> \
-      --items '[{"text":"script line","imagePrompt":"...","videoPrompt":"...","keyword":"keyword","keywordColor":"#FF6B35"}]'
+      --items '[{"shotDesc":"...","narration":"...","duration":5,"imagePrompt":"...","directorRef":"tarantino"}]'
    ```
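  - The script automatically inherits imageModel, videoModel, format, and references from account.json
-  - The script automatically creates directories, validates required fields, and sets status=pending
-  - The AI is responsible only for creative content (text, imagePrompt, videoPrompt, keyword) and does not touch structural fields
-  - Extra requirement in first/last-frame mode: every item must have `lastFramePrompt` (`imagePrompt` serves as the first frame, so no separate `firstFramePrompt` is needed)
-  - init returns the manifest path, which the subsequent commands use
+  - Every item starts with item.confirmed = false
+  - Generate the storyboard images: `pipeline.js run --manifest <path> --phase images`
+    - Reference images come into play at this stage (Gemini image-to-image / MJ --sref)
+  - Extra requirement in first/last-frame mode: every item must have `lastFramePrompt`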

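-Step 3: Image generation → review
-  Run the images phase. When it finishes:
-  - If the user said they will pick the images themselves → the agent only checks that the image count matches the number of script lines, then continues
-  - Otherwise → pause and wait for the user's review. If an image fails review, delete it / adjust the prompt and rerun; do not move on to the next step
+Step 2-C: Manual confirmation (optional checkpoint)
+  - Show all storyboard images to the user
+  - The user can: confirm all / swap in an MJ candidate image (set item.file = item.candidates[N]) / delete failing items / skip confirmation and continue directly
+  - After the user confirms: `node pipeline.js confirm --manifest <path> --all`
+  - If confirmation is skipped: set `confirmed = true` in bulk and go straight to Step 3

-  Image generation models
-  - Supported models: gemini / mj / kling
-  - Fallback chain: gemini → mj → kling → gemini (cyclic)
-  - Trigger: repeated failures → the agent switches to the next model and reruns the failed items
-  - Command: `pipeline.js run --manifest <path> --phase images --retry-failed --image-model <new-model>`
+Step 3-A: Generate video prompts (executed by a sub-agent)
+  - Read the videoStylePrompt field in account.json to locate the video prompt template (e.g. prompts/视频提示词.md)
+  - The sub-agent generates a videoPrompt for each shot:
+    - Input: shotDesc + directorRef (motion strategy) + the confirmed storyboard image + target model
+    - Output: videoPrompt (an English prompt describing the camera motion)
+  - The agent writes each videoPrompt back into the manifest items (editing each item in manifest.json directly)
+  - The main agent reviews videoPrompt quality (describes motion rather than content, ≤50 words)

-Step 4: Upload + video generation (optional; skip this step for image-and-text videos)
-  Run the upload + videos phases. In first/last-frame mode, check transition continuity.
+Step 3-B: Generate video clips
+  - Upload + generate videos: `pipeline.js run --manifest <path> --phase upload,videos`
+  - If confirmation was skipped, Step 2-C has already set confirmed=true in bulk
+  - In first/last-frame mode, check transition continuity

-Step 5: TTS + final cut
-  Run the tts + assemble phases. Check that subtitles are accurate and the BGM does not drown out the voice-over.
+Step 4: TTS + final cut
+  - Run the tts + assemble phases: `pipeline.js run --manifest <path> --phase tts,assemble`
+  - TTS uses the narration field (the voice-over narration)
+  - Check that subtitles are accurate and the BGM does not drown out the voice-over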
 ```

 > For command syntax, see "CLI reference" below; it is not repeated here.
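The Step 1 review includes checks with concrete limits (shotDesc 40-80 English words, narration at most 22 characters, directorRef filled in). A hypothetical `validateShot` helper that mechanizes those checks might look like this; requiring a positive numeric `duration` is an added assumption, and the subjective checks (implied motion, pacing) still need the main agent or the user.

```javascript
// Hypothetical validator for the Step 1 storyboard JSON. Limits follow the
// example shot in Step 1; the duration rule is an assumption.
function validateShot(shot) {
  const errors = [];
  const words = (shot.shotDesc || "").trim().split(/\s+/).filter(Boolean);
  if (words.length < 40 || words.length > 80) {
    errors.push(`shot ${shot.id}: shotDesc has ${words.length} words (expected 40-80)`);
  }
  // Array.from splits by code point, so each Chinese character counts once.
  const narrationLength = Array.from(shot.narration || "").length;
  if (narrationLength === 0 || narrationLength > 22) {
    errors.push(`shot ${shot.id}: narration has ${narrationLength} characters (expected 1-22)`);
  }
  if (typeof shot.duration !== "number" || !(shot.duration > 0)) {
    errors.push(`shot ${shot.id}: duration must be a positive number`);
  }
  if (!shot.directorRef) {
    errors.push(`shot ${shot.id}: directorRef is empty`);
  }
  return errors;
}

// Collects the errors of every shot; an empty array means the mechanical checks passed.
function validateStoryboard(shots) {
  return shots.flatMap(validateShot);
}
```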
@@ -163,15 +180,18 @@ node pipeline.js create-account --id <id> --name <name> \
 # Validate account integrity
 node pipeline.js validate-account --account <id>

-# Initialize the manifest (used in Step 2; the AI supplies only creative content)
+# Initialize the manifest (used in Step 2-B; the AI supplies only creative content, without videoPrompt)
 node pipeline.js init --account <id> --mode <single|framePair> \
-  --items '[{"text":"...","imagePrompt":"...","videoPrompt":"...","keyword":"...","keywordColor":"..."}]'
+  --items '[{"shotDesc":"...","narration":"...","duration":5,"imagePrompt":"...","directorRef":"tarantino"}]'
 # items can also be read from a file (suited to large item lists)
 node pipeline.js init --account <id> --mode single --items-file ./items.json

 # Validate manifest integrity
 node pipeline.js validate --manifest <path>

+# Manually confirm storyboard images (Step 2-C, optional: if skipped, the agent sets confirmed=true in bulk)
+node pipeline.js confirm --manifest <path> --all
+
 # Run specific phases
 node pipeline.js run --manifest <path> --phase images
 node pipeline.js run --manifest <path> --phase upload,videos
@@ -259,72 +279,9 @@ digraph frame_pair {

 ---

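-## Multi-stage execution strategy
+## Video models and execution strategy

-Use the Agent tool to run sub-skills serially or in parallel; **every transition between stages must pass a quality checkpoint**: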
-
-**Image generation + final cut (serial + manual checkpoint)**:
-```dot
-digraph image_then_assemble {
-  rankdir=LR
-  node [shape=box, style=filled, fillcolor="#f5f5f5", fontsize=11]
-
-  agent1 [label="Agent 1\nimage-generator\ngenerate images into output/"]
-  gate1 [label="Manual checkpoint\nuser picks images\ndeletes failing ones", shape=diamond, fillcolor="#fff9c4"]
-  agent2 [label="Agent 2\ncapcut\nread selected assets → assemble"]
-
-  agent1 -> gate1 -> agent2
-}
-```
-
-**Voice-over + image generation (parallel + automatic validation)**:
-```dot
-digraph parallel_image_tts {
-  rankdir=LR
-  node [shape=box, style=filled, fillcolor="#f5f5f5", fontsize=11]
-
-  agent1 [label="Agent 1\nimage-generator\nimage generation", fillcolor="#e8f5e9"]
-  agent2 [label="Agent 2\ncapcut\nTTS voice-over", fillcolor="#e8f5e9"]
-  validate [label="Automatic validation\nresolution >= 1024\naspect ratio matches\naudio duration matches", shape=diamond, fillcolor="#fff9c4"]
-  agent3 [label="Agent 3\ncapcut\nassemble all assets → final cut"]
-
-  agent1 -> validate
-  agent2 -> validate
-  validate -> agent3
-}
-```
-
-**Image-to-video: single-image mode**:
-```dot
-digraph single_image_video {
-  rankdir=LR
-  node [shape=box, style=filled, fillcolor="#f5f5f5", fontsize=11]
-
-  agent1 [label="Agent 1\nimage-generator\nimage generation + videoPrompt"]
-  gate1 [label="Manual checkpoint\nuser picks images", shape=diamond, fillcolor="#fff9c4"]
-  agent2 [label="Agent 2\nGrok / VEO / Kling\nsingle-image input, videos generated in parallel"]
-  agent3 [label="Agent 3\ncapcut\nvideo clips + subtitles → final cut"]
-
-  agent1 -> gate1 -> agent2 -> agent3
-}
-```
-
-**Image-to-video: first/last-frame mode**:
-```dot
-digraph frame_pair_video {
-  rankdir=LR
-  node [shape=box, style=filled, fillcolor="#f5f5f5", fontsize=11]
-
-  agent1 [label="Agent 1\nimage-generator\nimages generated in pairs\n(firstFrame + lastFrame)\ncan run in parallel"]
-  gate1 [label="Manual checkpoint\ncheck first/last-frame continuity\nsame scene / similar viewpoint", shape=diamond, fillcolor="#fff9c4"]
-  agent2 [label="Agent 2\nVEO / Kling\ntwo-image input\nimages:[first, last]"]
-  agent3 [label="Agent 3\ncapcut\nvideo clips + subtitles → final cut"]
-
-  agent1 -> gate1 -> agent2 -> agent3
-}
-```
-
-**Video model selection**:
+### Video model selection

 | Model | Duration | Aspect ratio | Single image | First/last frame | Notes | API |
 |------|------|------|------|--------|------|-----|
@@ -333,11 +290,11 @@ digraph frame_pair_video {
 | Veo3-fast-frames | ~8s | 16:9, 9:16 | ✅ | ✅ | Multi-frame, highest quality | jimmyai.cn |
 | Kling | 6s | Any | ✅ | ✅ | Fast, first/last-frame support | yunwu.ai |

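-Image-to-video notes:
+### Video generation notes
+
 - **Parallel execution**: submit all tasks first (concurrency 3), then poll for the results in parallel
 - A single video takes 60-300 seconds to generate
 - The script retries up to 3 times, automatically simplifying the prompt on each retry
-- **videoPrompt is generated together with the images in the image generation phase**
 - VEO only: `enhance_prompt=true` enables prompt enhancement for Chinese, `enable_upsample=true` enables upscaling
 - Configured in `config.json`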
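The notes above describe submitting every task first (concurrency 3), then polling all of them in parallel, with up to 3 retries that simplify the prompt. A self-contained sketch of that pattern follows; `submit` and `poll` are injected stand-ins for a real video API client, and dropping the last comma-separated clause is only an assumed simplification strategy, not necessarily what the script does.

```javascript
// Run async `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Assumed simplification: drop the last comma-separated clause on each retry.
function simplifyPrompt(prompt) {
  const parts = prompt.split(",");
  return parts.length > 1 ? parts.slice(0, -1).join(",").trim() : prompt;
}

async function generateVideos(shots, { submit, poll, concurrency = 3, maxRetries = 3 }) {
  // Phase 1: submit every task, at most `concurrency` submissions at a time.
  const tasks = await mapWithConcurrency(shots, concurrency, async (shot) => ({
    shot,
    prompt: shot.videoPrompt,
    taskId: await submit(shot.videoPrompt),
  }));
  // Phase 2: poll all tasks in parallel; on failure, resubmit with a simpler prompt.
  return Promise.all(
    tasks.map(async (task) => {
      let { prompt, taskId } = task;
      for (let attempt = 0; ; attempt++) {
        try {
          return { shot: task.shot, url: await poll(taskId), prompt };
        } catch (err) {
          if (attempt >= maxRetries) throw err;
          prompt = simplifyPrompt(prompt);
          taskId = await submit(prompt);
        }
      }
    }),
  );
}
```

Injecting `submit`/`poll` keeps the concurrency and retry logic independent of Grok, VEO, or Kling specifics. In this sketch, resubmissions made during retries are not throttled by the concurrency limit.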

@@ -377,8 +334,8 @@ node kling-video-generator.js --image <url> --prompt <prompt> -o ./videos
 output/{name}_{YYYYMMDD}_{NNN}/
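 ├── manifest.json                # Main manifest (used throughout the workflow)
 ├── prompts.txt                  # Archive of the original prompts
-├── images/                      # scene_{NN}_{keyword}.jpeg (first/last-frame mode adds a _last suffix)
-├── videos/                      # scene_{NN}_{keyword}.mp4 (matches the images)
+├── images/                      # scene_{NN}_{slug}.jpeg (slug derived from narration/shotDesc; first/last-frame mode adds a _last suffix)
+├── videos/                      # scene_{NN}_{slug}.mp4 (matches the images)
 └── urls.json                    # OSS public URL mapping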
 ```
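File names now use a slug derived from shotDesc/narration instead of keyword. The exact derivation rule is not shown in this diff, so the sketch below assumes one plausible rule (the first four words of shotDesc, falling back to narration characters); `deriveSlug` and `sceneFileName` are hypothetical helper names.

```javascript
// Assumed rule: first `maxWords` words of shotDesc, lowercased, ASCII letters and
// digits only, joined by "-"; falls back to the first 8 letters/digits of narration
// (CJK kept), then to "shot". The real pipeline may derive slugs differently.
function deriveSlug({ shotDesc = "", narration = "" }, maxWords = 4) {
  const words = shotDesc
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean)
    .slice(0, maxWords);
  if (words.length > 0) return words.join("-");
  const fallback = Array.from(narration.replace(/[^\p{L}\p{N}]/gu, "")).slice(0, 8).join("");
  return fallback || "shot";
}

// scene_{NN}_{slug}.{ext}; in first/last-frame mode the last frame gets a _last suffix.
function sceneFileName(index, item, ext, { lastFrame = false } = {}) {
  const nn = String(index).padStart(2, "0");
  return `scene_${nn}_${deriveSlug(item)}${lastFrame ? "_last" : ""}.${ext}`;
}
```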

@@ -397,16 +354,6 @@ output/{name}_{YYYYMMDD}_{NNN}/

 ---

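-## Storyboard planning rules
-
-See [storyboard-rules.md](references/storyboard-rules.md) for the full rules. A sub-agent reads and applies them; the main agent only reviews the output.
-
----
-
-## Prompt generation rules
-
-See [prompt-rules.md](references/prompt-rules.md) for the full rules. A sub-agent reads and applies them; the main agent reviews prompt quality and sends substandard prompts back for rework.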
-
 ---

 ## Quality checkpoints (agent-executable)
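One agent-executable checkpoint this commit introduces is the confirmed gate between image generation and the upload/videos phases. A sketch of how `confirm --all` and that gate could act on `manifest.items`; `confirmItems`, `unconfirmedItems`, and `assertReadyForVideos` are hypothetical names, items are assumed to carry an `id`, and the real `lib/cmd-confirm` module may behave differently.

```javascript
// Mark items as confirmed: all of them with { all: true }, or only the listed ids.
function confirmItems(manifest, { all = false, ids = [] } = {}) {
  const wanted = new Set(ids);
  for (const item of manifest.items) {
    if (all || wanted.has(item.id)) {
      item.confirmed = true;
    }
  }
  return manifest;
}

// Items the upload/videos phases would refuse to process.
function unconfirmedItems(manifest) {
  return manifest.items.filter((item) => item.confirmed !== true);
}

// Gate to run before `--phase upload,videos`.
function assertReadyForVideos(manifest) {
  const pending = unconfirmedItems(manifest);
  if (pending.length > 0) {
    const ids = pending.map((item) => item.id).join(", ");
    throw new Error(`Unconfirmed items: ${ids}; run pipeline.js confirm first`);
  }
}
```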