feat(video-pipeline): refactor the video pipeline; refine final-cut timeline rules and state management

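- Introduce manifest.json as the single source of state; every sub-agent operation writes back to the manifest
- Refactor the timebuilder logic to support four video fitting strategies (speed up / trim / slow down / freeze frame)
- Unify the TTS stage output structure: single-sentence and multi-sentence results are both written to segments[]
- Rewrite subtitle and voice-over generation to sync audio and video using the exact durations in segments
- Add confirmation by id range to the confirm command; separate images and videos in the upload stage
- Add the constraint that intermediate artifacts are written to the output/ directory; remove deprecated config parameters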
2026-05-02 00:14:40 +08:00
parent b4b92854db
commit 0998fd6ae1
14 changed files with 457 additions and 205 deletions
--- a/.claude/skills/video-from-script/SKILL.md
+++ b/.claude/skills/video-from-script/SKILL.md
@@ -35,14 +35,15 @@ Mode B has two variants: **single-image mode** (1 image → 1 video clip) / **first/last-frame
 ### Core constraints

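 1. **No skipping steps**:
-   - A (slideshow): storyboard → image prompts → image generation → TTS + final cut. No video stage
-   - B (AI video): storyboard → image prompts → image generation → video prompts → video generation → TTS + final cut
+   - A (slideshow): storyboard → manifest init → image prompts → image generation → TTS + final cut. No video stage
+   - B (AI video): storyboard → manifest init → image prompts → image generation → video prompts → video generation → TTS + final cut
   - A review is required between stages
-2. **manifest.json is the single source of state**: write back immediately after any operation completes
+2. **manifest.json is the single source of state**: `pipeline.js init` runs immediately after the storyboard is confirmed, creating the `output/{name}/` directory and the initial manifest. All subsequent sub-agent output is written back to this manifest; bare JSON is no longer passed around
 3. **No calling APIs with curl**: image and video generation must go through `pipeline.js` or the corresponding generator script
 4. **Parallel first**: run independent subtasks in parallel using sub-agents
 5. **The storyboard is the spine contract**: once the user confirms the storyboard, downstream sub-agents may only add fields; changing the shot count, order, or field values is forbidden. Whenever the main agent receives sub-agent output, its first step is to check that the shot count matches
 6. **prompts/*.md are read only by sub-agents**: the main agent reads account.json and does not read the sub-agent prompt templates
+7. **Intermediate artifacts go in output**: all intermediate files (items JSON, urls cache, sub-agent output) must be written to the `output/{name}/` directory; scattering them in the project root is forbidden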

 ### Step -1: Intent confirmation (confirm every item; none may be skipped)

@@ -79,26 +80,27 @@ Mode B has two variants: **single-image mode** (1 image → 1 video clip) / **first/last-frame

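 → Show it to the user for confirmation. Once confirmed, **the storyboard is locked as the spine contract**, and downstream steps must not add or remove shots.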

+### Step 2-0: Manifest initialization
+
+```bash
+node scripts/pipeline.js init --account <id> --mode <single|framePair> \
+  --items '[{"id":1,"shotDesc":"...","script":"...","duration":5,"directorRef":"tarantino","keyword":"权力"}]'
+```
+
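+- Run immediately after the storyboard is confirmed; creates the `output/{name}/` directory and the initial `manifest.json`
+- The script inherits from account.json: imageModel, videoModel, format, references
+- `imagePrompt` is empty for now and is filled in by Step 2-A; `videoPrompt` is empty for now and is filled in by Step 3-A
+- The output path is printed to the console; all subsequent operations use it as the working directory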
+
 ### Step 2-A: Image prompts (run by a sub-agent)

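- The main agent passes the **complete storyboard JSON** (not the original script text) + the image prompt template path to the sub-agent
- The sub-agent appends an `imagePrompt` field to each shot:
-  - Input (from the storyboard): shotDesc + script + directorRef + keyword
-  - Output: storyboard JSON + imagePrompt
+- The main agent passes the **manifest path + image prompt template path** to the sub-agent
+- The sub-agent reads manifest.items, appends an `imagePrompt` field to each shot, and writes the result back to the manifest
 - **Hard constraint: output shot count == input shot count**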

 **Main agent review**: ① Do the counts match? ② Is the shotDesc content preserved in full? ③ Does the lighting strategy match directorRef?

-### Step 2-B: Image generation + manifest initialization
-
-```bash
-node scripts/pipeline.js init --account <id> --mode <single|framePair> \
-  --items '[{"shotDesc":"...","script":"...","duration":5,"imagePrompt":"...","directorRef":"tarantino","keyword":"权力"}]'
-```
-
- items do not include videoPrompt; Step 3-A adds it later
- The script inherits from account.json: imageModel, videoModel, format, references
- First/last-frame mode: every item must have a `lastFramePrompt`
+### Step 2-B: Image generation

 ```bash
 node scripts/pipeline.js run --manifest <path> --phase images
@@ -111,12 +113,9 @@ node scripts/pipeline.js run --manifest <path> --phase images

 ### Step 3-A: Video prompts (mode B only, run by a sub-agent)

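- The main agent passes the storyboard JSON (including the confirmed storyboard image paths) + the video prompt template path to the sub-agent
- The sub-agent generates a `videoPrompt` for each shot:
-  - Input: shotDesc + directorRef + confirmed storyboard image + target model
-  - Output: videoPrompt (describes camera movement, not frame content)
+- The main agent passes the **manifest path + video prompt template path** to the sub-agent
+- The sub-agent reads manifest.items (including the confirmed storyboard image paths), generates a `videoPrompt` for each shot, and writes it back to the manifest
 - **Hard constraint: output count == storyboard shot count**
- The agent writes back to manifest.json, aligned by id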

 **Main agent review**: ① Do the counts match? ② Does it describe motion rather than content? ③ Length ≤ 50 characters?
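
The review rules in this SKILL.md (sub-agents may only add fields; shot count, order, and existing field values stay locked) could be checked mechanically before the main agent accepts sub-agent output. The following is a minimal sketch, not part of this commit: `checkSpineContract` is a hypothetical helper, and it assumes the locked storyboard and the sub-agent result are both available as arrays of plain objects shaped like the `--items` entries (for example, the `items` array read from two copies of the manifest).

```javascript
// Hypothetical helper, not part of pipeline.js: compares the locked storyboard
// items with the items a sub-agent wrote back and returns a list of violations.
// Assumption: both arguments are arrays of plain, JSON-serializable objects.
function checkSpineContract(lockedItems, returnedItems) {
  const errors = [];
  if (lockedItems.length !== returnedItems.length) {
    errors.push(`shot count changed: ${lockedItems.length} -> ${returnedItems.length}`);
    return errors; // per-shot checks are meaningless once the counts differ
  }
  lockedItems.forEach((locked, i) => {
    const returned = returnedItems[i];
    for (const [key, value] of Object.entries(locked)) {
      // Fields that were still empty at lock time (e.g. imagePrompt) may be filled in.
      if (value === "" || value === null || value === undefined) continue;
      // Comparing serialized values also catches reordering, because the id
      // at position i no longer matches.
      if (JSON.stringify(returned[key]) !== JSON.stringify(value)) {
        errors.push(`shot ${locked.id}: field "${key}" was changed or removed`);
      }
    }
  });
  return errors;
}

// Usage: a sub-agent that only fills in imagePrompt passes the check.
const locked = [
  { id: 1, shotDesc: "a", duration: 5, imagePrompt: "" },
  { id: 2, shotDesc: "b", duration: 5, imagePrompt: "" },
];
const filled = locked.map((item) => ({ ...item, imagePrompt: "prompt " + item.id }));
console.log(checkSpineContract(locked, filled)); // []
```

Returning a list of violations rather than throwing lets the main agent report every problem in one review pass; a stricter variant could reject the sub-agent output on the first mismatch.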