feat(video-pipeline): refactor the video pipeline; improve assembly timeline rules and state management

- Introduce manifest.json as the single source of truth; every sub-agent operation writes back to the manifest
- Rework the timebuilder logic to support four video-fit strategies (speed up / trim / slow down / freeze frame)
- Unify the TTS phase output structure: both single- and multi-sentence scripts now write segments[]
- Rewrite subtitle and voiceover generation to use precise per-segment durations for audio/visual sync
- Add id-range confirmation to the confirm command; split image and video handling in the upload phase
- Constrain intermediate artifacts to the output/ directory; remove deprecated config parameters
2026-05-02 00:14:40 +08:00
parent b4b92854db
commit 0998fd6ae1
14 changed files with 457 additions and 205 deletions


@@ -1,6 +1,6 @@
 {
-  "jianyingDraftPath": "/Users/lc/Movies/JianyingPro/User Data/Projects/com.lveditor.draft",
+  "jianyingDraftPath": "C:/Users/45070/AppData/Local/JianyingPro/User Data/Projects/com.lveditor.draft",
-  "capcutMateDir": "/Users/lc/capcut-mate",
+  "capcutMateDir": "C:/Users/45070/capcut-mate",
   "capcutMateApiBase": "http://capcut.muyetools.cn/openapi/capcut-mate/v1",
   "imgbbApiKey": "deprecated",
   "geminiApiBaseUrl": "https://yunwu.ai",


@@ -35,14 +35,15 @@ B 模式又分两种:**单图模式**1 图 → 1 段视频)/ **首尾帧
 ### 核心约束
 1. **不可跳步**
-   - A幻灯片分镜 → 图片提示词 → 生图 → TTS+成片。无视频阶段
+   - A幻灯片分镜 → manifest init → 图片提示词 → 生图 → TTS+成片。无视频阶段
-   - BAI视频分镜 → 图片提示词 → 生图 → 视频提示词 → 生视频 → TTS+成片
+   - BAI视频分镜 → manifest init → 图片提示词 → 生图 → 视频提示词 → 生视频 → TTS+成片
   - 阶段之间必须审查
-2. **manifest.json 是唯一状态源**任何操作完成后立即回写
+2. **manifest.json 是唯一状态源**`pipeline.js init` 在分镜确认后立即执行,创建 `output/{name}/` 目录和初始 manifest。后续所有子 Agent 输出回写此 manifest不再传裸 JSON
 3. **禁止 curl 调 API**:生图/生视频必须通过 `pipeline.js` 或对应 generator 脚本
 4. **并行优先**:独立子任务用子 Agent 并行
 5. **分镜表是脊骨契约**:用户确认分镜表后,下游子 Agent 只能加字段,禁止改 shot 数量/顺序/字段值。主 Agent 每次接收子 Agent 输出,第一件事数数量是否对得上
 6. **prompts/*.md 只被子 Agent 读**:主 Agent 读 account.json不读子 Agent 提示词模板
+7. **中间产物落 output**所有中间文件items JSON、urls 缓存、子 Agent 输出)必须写入 `output/{name}/` 目录,禁止散落在项目根目录
 ### Step -1: 意图确认(逐项确认,缺一不可)
@@ -79,26 +80,27 @@ B 模式又分两种:**单图模式**1 图 → 1 段视频)/ **首尾帧
 → 展示给用户确认。确认后**分镜表锁定为脊骨契约**,下游禁止增减 shot。
### Step 2-0: Manifest 初始化
```bash
node scripts/pipeline.js init --account <id> --mode <single|framePair> \
--items '[{"id":1,"shotDesc":"...","script":"...","duration":5,"directorRef":"tarantino","keyword":"权力"}]'
```
- 分镜确认后立即执行,创建 `output/{name}/` 目录和初始 `manifest.json`
- 脚本从 account.json 继承imageModel、videoModel、format、references
- `imagePrompt` 暂为空Step 2-A 补充;`videoPrompt` 暂为空Step 3-A 补充
- 输出路径打印到控制台,后续所有操作以此为工作目录
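For orientation, the initial manifest written by `init` looks roughly like the sketch below. This is an illustration assembled from the fields documented in this skill file, not verbatim tool output; exact field names and defaults follow `pipeline-init.js`.

```json
{
  "account": "军事账号",
  "mode": "single",
  "format": "9:16",
  "items": [
    {
      "id": 1,
      "status": "pending",
      "file": "images/scene_01_xxx.jpeg",
      "shotDesc": "...",
      "script": "...",
      "duration": 5,
      "directorRef": "tarantino",
      "keyword": "权力"
    }
  ]
}
```

`imagePrompt` and `videoPrompt` are absent at this point and are filled in by Step 2-A and Step 3-A respectively.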
 ### Step 2-A: 图片提示词(子 Agent 执行)
-- 主 Agent 传**完整分镜表 JSON**(不传原始文案)+ 图片提示词模板路径给子 Agent
+- 主 Agent 传**manifest 路径 + 图片提示词模板路径**给子 Agent
-- 子 Agent 为每个 shot 追加 `imagePrompt` 字段
+- 子 Agent 读 manifest.items为每个 shot 追加 `imagePrompt` 字段后回写 manifest
-- 入参来自分镜表shotDesc + script + directorRef + keyword
-- 出参:分镜表 JSON + imagePrompt
 - **硬约束:输出 shot 数量 == 输入 shot 数量**
 **主 Agent 审查**:① 数量对得上?② shotDesc 内容完整保留?③ 光影策略对应 directorRef
-### Step 2-B: 生图 + Manifest 初始化
+### Step 2-B: 生图
-```bash
-node scripts/pipeline.js init --account <id> --mode <single|framePair> \
-  --items '[{"shotDesc":"...","script":"...","duration":5,"imagePrompt":"...","directorRef":"tarantino","keyword":"权力"}]'
-```
-- items 不含 videoPrompt后续 Step 3-A 补充
-- 脚本从 account.json 继承imageModel、videoModel、format、references
-- 首尾帧模式:每个 item 必须有 `lastFramePrompt`
 ```bash
 node scripts/pipeline.js run --manifest <path> --phase images
@@ -111,12 +113,9 @@ node scripts/pipeline.js run --manifest <path> --phase images
 ### Step 3-A: 视频提示词B 模式专属,子 Agent 执行)
-- 主 Agent 传分镜表 JSON含已确认分镜图路径+ 视频提示词模板路径给子 Agent
+- 主 Agent 传**manifest 路径 + 视频提示词模板路径**给子 Agent
-- 子 Agent 为每个 shot 生成 `videoPrompt`
+- 子 Agent 读 manifest.items含已确认分镜图路径为每个 shot 生成 `videoPrompt` 后回写 manifest
-- 入参shotDesc + directorRef + 已确认分镜图 + 目标模型
-- 出参videoPrompt描述镜头运动非画面内容
 - **硬约束:输出数量 == 分镜表 shot 数量**
-- Agent 按 id 对齐回写 manifest.json
 **主 Agent 审查**:① 数量对得上?② 描述运动而非内容?③ 字数 ≤ 50


@@ -9,9 +9,9 @@
 ## 创建方式
 ```bash
-# Step 2-A 生成 imagePrompt 后,通过脚本初始化(不含 videoPrompt
+# Step 2-0分镜确认后立即初始化imagePrompt/videoPrompt 后续补充
 node scripts/pipeline.js init --account 军事账号 --mode single \
-  --items '[{"shotDesc":"英文画面描述","script":"中文口播文案","duration":5,"imagePrompt":"English prompt","directorRef":"tarantino","keyword":"权力"}]'
+  --items '[{"shotDesc":"英文画面描述","script":"中文口播文案","duration":5,"directorRef":"tarantino","keyword":"权力"}]'

 # 或从文件读取
 node scripts/pipeline.js init --account 军事账号 --mode single --items-file ./items.json
@@ -193,7 +193,7 @@ node scripts/pipeline.js run --manifest <path> --retry-failed
 ## 目录结构
 ```
-output/{account}_{YYYYMMDD}_{NNN}/
+output/{name}_{YYYYMMDD}_{NNN}/
 ├── manifest.json      # 主清单
 ├── images/            # scene_{NN}_{slug}.jpeg首尾帧加 _lastMJ 候选加 _cand{1-4}
 ├── videos/            # scene_{NN}_{slug}.mp4
@@ -206,7 +206,7 @@ slug 从 `shotDesc` 派生slugify: 保留中文和字母数字,最多 20
 ## segments[] 字段TTS 分句)
-TTS 阶段自动生成。仅当 `script` 被切分为 2 句及以上时才写入。单句时不写 segments
+TTS 阶段统一生成,单句时数组仅 1 个元素,多句时 N 个元素。assemble 阶段直接使用各 segment 的实际音频时长对齐字幕
 | 字段 | 说明 |
 |------|------|
@@ -214,4 +214,26 @@ TTS 阶段自动生成。仅当 `script` 被切分为 2 句及以上时才写入
 | `audio` | 该句音频路径(相对 manifest |
 | `duration` | 该句音频时长(秒) |
-`item.audio` 指向所有分段合并后的完整音频`item.audioDuration` 为各段累计时长。assemble 阶段优先用 `segments` 的精确时长对齐字幕,无 segments 时回退到字数权重估算。
+`item.audio` 指向 `segments[0].audio``item.audioDuration` 为各段累计时长。assemble 阶段遍历 segments 逐一添加音频和字幕,使用实际文件时长(非比例分配),确保音频与字幕精确同步,消除留白。
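The alignment rule above can be sketched as a small standalone function (a hypothetical helper for illustration; the real logic lives inside `addVoiceover`/`addSubtitles`): segments are laid back-to-back using their measured durations, and the last segment snaps to the slot end so floating-point drift never leaves a gap.

```javascript
const US = 1e6 // microseconds per second

// Place segments sequentially inside a timeline slot [start, end].
// The last segment is pinned to the slot end to absorb rounding drift.
function layoutSegments(slot, segments) {
  let t = slot.start
  return segments.map((seg, i) => {
    const isLast = i === segments.length - 1
    const end = isLast ? slot.end : t + (seg.duration || 0) * US
    const entry = { start: t, end, text: seg.text }
    t = end
    return entry
  })
}

// Two segments totalling 4.9s inside a 5.0s slot: the last one
// stretches to the slot end instead of leaving a 0.1s gap.
const out = layoutSegments(
  { start: 0, end: 5000000 },
  [{ text: 'a', duration: 2.0 }, { text: 'b', duration: 2.9 }]
)
// out[1].end === 5000000
```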
---
## 成片时间线规则
### 图片模式images
图片没有独立时长。TTS 音频时长 = 画面时长。无 TTS 音频的 item 时长为 0跳过不显示
### 视频模式videos
TTS 音频为主轴,视频通过以下策略适配音频时长:
| ratio = videoDur/audioDur | 策略 | 说明 |
|---------------------------|------|------|
| 0.9 ~ 1.1 | none | 接近匹配,无需调整 |
| > 1.1, ≤ 2 | speed_up | 加速setpts 压缩时间) |
| > 2 | trim | 裁剪(截断到音频时长) |
| < 0.9, ≥ 0.5 | slow_down | 放缓setpts 拉长时间) |
| < 0.5 | freeze | 画面停顿(视频原速 + 最后一帧冻结补时长) |
所有策略失败后兜底:截断到目标时长。
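The strategy table above maps directly onto a small selection function. This is an illustrative sketch of the rules only, not the actual `buildTimeline` implementation:

```javascript
// Pick a video-fit strategy from ratio = videoDur / audioDur (seconds).
// Mirrors the table: 0.9~1.1 none; >1.1..2 speed_up; >2 trim;
// 0.5..0.9 slow_down; <0.5 freeze.
function pickStrategy(videoDur, audioDur) {
  if (audioDur <= 0) return { strategy: 'none', speed: 1 }
  const ratio = videoDur / audioDur
  if (ratio > 2) return { strategy: 'trim', speed: 1 }
  if (ratio > 1.1) return { strategy: 'speed_up', speed: ratio }
  if (ratio < 0.5) return { strategy: 'freeze', speed: 1 }
  if (ratio < 0.9) return { strategy: 'slow_down', speed: ratio }
  return { strategy: 'none', speed: 1 }
}

pickStrategy(12, 5) // ratio 2.4 → trim
pickStrategy(8, 5)  // ratio 1.6 → speed_up at 1.6x
pickStrategy(3, 5)  // ratio 0.6 → slow_down
pickStrategy(2, 5)  // ratio 0.4 → freeze
```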


@@ -215,28 +215,89 @@ function getAudioDurationSec(filePath) {
 // 主流程
 // ============================================================================
-function buildTimeline(items, defaultDurationUs) {
+function buildTimeline(items) {
-  // 音频为主轴视频调速适配≤2x 加速,>2x 截断)
+  // 核心规则:
// 图片模式图片没有独立时长TTS 音频时长 = 画面时长。无音频 = 0 时长(跳过)
// 视频模式TTS 为主轴,视频通过 裁剪/加速/放缓/停顿 适配
// 视频比音频长ratio > 1.1:
// ≤ 2x → 加速setpts 压缩时间)
// > 2x → 裁剪(截断到音频时长)
// 视频比音频短ratio < 0.9:
// ≥ 0.5x → 放缓setpts 拉长时间≤2x慢速
// < 0.5x → 画面停顿(视频正常播放+最后一帧冻结补时长)
   let offset = 0
   return items.map(item => {
-    const audioDur = (item.audioDuration != null) ? item.audioDuration * US : 0
+    // 有 segments 时用各段实际时长之和(精确对齐音频文件)
+    let audioDur
+    if (item.segments && item.segments.length > 0) {
+      audioDur = item.segments.reduce((sum, s) => sum + (s.duration || 0), 0) * US
+    } else {
+      audioDur = (item.audioDuration != null) ? item.audioDuration * US : 0
+    }
     const videoDur = (item.videoDuration != null) ? item.videoDuration * US : 0
-    // 无 TTS用视频时长或固定时长
+    const hasVideo = !!(item.video || item.videoUrl || item.url)
+    // 无 TTS 音频
     if (audioDur <= 0) {
-      const dur = videoDur || defaultDurationUs
-      const entry = { start: offset, end: offset + dur, duration: dur, speed: 1 }
+      if (hasVideo && videoDur > 0) {
+        // 视频模式无音频:用视频原始时长
const entry = { start: offset, end: offset + videoDur, duration: videoDur, speed: 1, strategy: 'none' }
offset += videoDur
return entry
}
// 图片模式无音频0 时长,标记跳过
const entry = { start: offset, end: offset, duration: 0, speed: 1, strategy: 'none', skip: true }
return entry
}
// 有 TTS音频时长为主轴
const dur = audioDur
if (!hasVideo || videoDur <= 0) {
// 图片模式:直接用音频时长
const entry = { start: offset, end: offset + dur, duration: dur, speed: 1, strategy: 'none' }
offset += dur
return entry
}
// 视频模式:视频 vs 音频时长匹配
const ratio = videoDur / audioDur
if (ratio > 1.1) {
// 视频比音频长
if (ratio <= 2) {
// 加速策略
const entry = { start: offset, end: offset + dur, duration: dur, speed: ratio, strategy: 'speed_up' }
offset += dur
return entry
} else {
// 裁剪策略
const entry = { start: offset, end: offset + dur, duration: dur, speed: 1, strategy: 'trim' }
offset += dur
return entry
}
} else if (ratio < 0.9) {
// 视频比音频短
if (ratio >= 0.5) {
// 放缓策略(慢放 ≤2x
const entry = { start: offset, end: offset + dur, duration: dur, speed: ratio, strategy: 'slow_down' }
offset += dur
return entry
} else {
// 画面停顿策略(视频原速播放 + 最后一帧冻结补时长)
const entry = {
start: offset, end: offset + dur, duration: dur, speed: 1,
strategy: 'freeze', freezeExtra: dur - videoDur,
}
offset += dur
return entry
}
} else {
// 接近匹配0.9 ~ 1.1),无需调整
const entry = { start: offset, end: offset + dur, duration: dur, speed: 1, strategy: 'none' }
       offset += dur
       return entry
     }
-    // 有 TTS音频时长为主轴
-    const dur = audioDur
-    const ratio = videoDur > 0 ? videoDur / audioDur : 1
-    // ≤2x: 加速到音频时长;>2x: 截断(视频只取前 audioDur 部分)
-    const speed = ratio <= 2 ? ratio : 1
-    const needAdjust = videoDur > audioDur + 100000 // 视频比音频长 0.1s 以上才需要调整
-    const entry = { start: offset, end: offset + dur, duration: dur, speed, needAdjust }
-    offset += dur
-    return entry
   })
 }
@@ -253,7 +314,6 @@ async function assemble(args) {
     filter: filterStr,
     format = '9:16',
     apiKey = '',
-    duration = '4',
     animation = '轻微放大',
   } = args
@@ -284,22 +344,44 @@ async function assemble(args) {
   }
   const { width, height } = getResolution(format)
-  const defaultDurationUs = parseFloat(duration) * US
   // 过滤出实际存在的文件
+  const missingFileItems = []
   const items = manifest.items.filter(item => {
     if (item.url) return true // 视频模式可能用 URL
if (item.video) return true // 视频模式本地文件
if (!item.file) {
missingFileItems.push(item.id || '?')
return false
}
     const filePath = path.join(inputDir, item.file)
     return fs.existsSync(filePath)
   })
if (items.length === 0) {
if (missingFileItems.length > 0) {
throw new Error(`没有可用的素材文件 — ${missingFileItems.length} 个 item 缺少 file 字段id: ${missingFileItems.join(', ')}),请先运行 images 阶段`)
}
throw new Error('没有可用的素材文件')
}
   if (items.length === 0) throw new Error('没有可用的素材文件')
   // 用 ffprobe 测量实际音频/视频时长,替代 manifest 中的估计值
   let audioMeasured = 0, videoMeasured = 0
   for (const item of items) {
-    // 测量 TTS 音频实际时长(有 segments 时跳过audioDuration 已是精确累计值)
+    // 测量各 segment 音频文件实际时长
-    if (item.audio && !item.audio.startsWith('http') && !item.segments) {
+    if (item.segments && item.segments.length > 0) {
for (const seg of item.segments) {
if (!seg.audio || seg.audio.startsWith('http')) continue
const audioPath = path.isAbsolute(seg.audio)
? seg.audio
: path.resolve(inputDir, seg.audio)
if (!fs.existsSync(audioPath)) continue
const actualDur = await getAudioDurationSec(audioPath)
if (actualDur != null) { seg.duration = actualDur; audioMeasured++ }
}
} else if (item.audio && !item.audio.startsWith('http')) {
       const audioPath = path.isAbsolute(item.audio)
         ? item.audio
         : path.resolve(inputDir, item.audio)
@@ -323,16 +405,32 @@ async function assemble(args) {
     console.log(`  实际时长测量: 音频 ${audioMeasured} 个, 视频 ${videoMeasured}`)
   }
-  const timeline = buildTimeline(items, defaultDurationUs)
+  const timeline = buildTimeline(items)
   const totalDurationUs = timeline.length > 0 ? timeline[timeline.length - 1].end : 0
   const hasTTS = items.some(item => item.audio && item.audioDuration != null)
// 时间轴诊断
for (let i = 0; i < items.length; i++) {
const item = items[i]
const tl = timeline[i]
if (tl.skip) { console.log(` [${i + 1}] 跳过(无音频)`); continue }
const audioDur = item.segments
? item.segments.reduce((s, seg) => s + (seg.duration || 0), 0)
: (item.audioDuration || 0)
const slotDur = tl.duration / US
const diff = slotDur - audioDur
const videoDur = (item.videoDuration || 0)
const stratInfo = tl.strategy && tl.strategy !== 'none' ? ` 策略=${tl.strategy}` : ''
const marker = Math.abs(diff) > 0.05 ? ' ⚠️ 不对齐' : ''
console.log(` [${i + 1}] 画面=${slotDur.toFixed(2)}s 音频=${audioDur.toFixed(2)}s 视频=${videoDur.toFixed(2)}s${stratInfo}${marker}`)
}
   // -- 读取转场策略(在 addImages/addVideos 之前) --
   const transitionConfig = loadTransitions(manifest)
   console.log(`\nCapCut 成片组装`)
   console.log(`  模式: ${mode}  画幅: ${format} (${width}x${height})`)
-  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : `固定${duration}s/段`}  总时长: ${(totalDurationUs / US).toFixed(1)}s`)
+  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : '视频原始时长'}  总时长: ${(totalDurationUs / US).toFixed(1)}s`)
   console.log(`  字幕: ${subtitles}  配音: ${voiceover}  动画: ${animation}`)
   if (finalEffects) console.log(`  特效: ${finalEffects}`)
   if (finalFilter) console.log(`  滤镜: ${finalFilter}`)
@@ -386,10 +484,10 @@ async function assemble(args) {
   for (let i = 0; i < items.length; i++) {
     const item = items[i]
     const tl = timeline[i]
-    if (tl.needAdjust && item.video) {
+    if (tl.strategy && tl.strategy !== 'none' && item.video) {
       const videoPath = path.resolve(inputDir, item.video)
       const audioDur = tl.duration / US
-      const adjustedPath = await adjustVideoSpeed(videoPath, audioDur)
+      const adjustedPath = await adjustVideoSpeed(videoPath, audioDur, tl.strategy, tl.speed, tl.freezeExtra || 0)
       if (adjustedPath !== videoPath) {
         item.video = path.relative(inputDir, adjustedPath)
         item.videoDuration = audioDur
@@ -398,7 +496,7 @@ async function assemble(args) {
     }
   }
   if (adjustedCount > 0) {
-    console.log(`  视频调速: ${adjustedCount}/${items.length}`)
+    console.log(`  视频调整: ${adjustedCount}/${items.length}`)
   }
   // Step 2: 上传(已调速的)视频到 OSS
@@ -547,7 +645,7 @@ async function assemble(args) {
   console.log(`  草稿ID: ${draftId}`)
   console.log(`  总时长: ${(totalDurationUs / US).toFixed(1)}s`)
   console.log(`  素材数: ${items.length}`)
-  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : '固定时长'}`)
+  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : '视频原始时长'}`)
   if (mode === 'videos' && subtitles === 'false') {
     console.log(`\n  >> 视频模式未加字幕,请在剪映中打开草稿 → 识别字幕 → 语音识别生成\n`)
   }
@@ -713,54 +811,142 @@ async function addKenBurns(draftUrl, segmentIds, items, timeline, manifest) {
 // ============================================================================
 /**
- * ffmpeg 调速:将视频调整为指定时长
+ * ffmpeg 视频调整:根据策略适配音频时长
- * ratio <= 2x: 加速ratio > 2x: 截断
+ *
- * 返回调整后的文件路径(调整失败则返回原路径)
+ * 策略(按 ratio = videoDur / audioDur 选择):
* speed_up (ratio > 1.1, ≤2x) → setpts 压缩时间(加速)
* trim (ratio > 2x) → 截断到目标时长
* slow_down (ratio < 0.9, ≥0.5x) → setpts 拉长时间(慢放)
* freeze (ratio < 0.5x) → 视频原速 + 最后一帧冻结补时长
* none (0.9~1.1) → 无需调整
*
* 所有策略失败后兜底:截断到目标时长
*
* 返回调整后的文件路径(失败则返回原路径)
*/ */
-async function adjustVideoSpeed(videoPath, targetDurationSec) {
+async function adjustVideoSpeed(videoPath, targetDurationSec, strategy = 'none', speed = 1, freezeExtraUs = 0) {
   if (!fs.existsSync(videoPath)) return videoPath
if (strategy === 'none') return videoPath
// 兜底截断:所有策略失败后的最终回退
function fallbackTrim(cb) {
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-t', String(targetDurationSec),
'-c', 'copy',
videoPath.replace(/(\.\w+)$/, '_adj$1')
], { timeout: 30000 }, (err) => {
if (err) { cb(videoPath); return }
cb(videoPath.replace(/(\.\w+)$/, '_adj$1'))
})
}
   return new Promise((resolve) => {
-    // 先获取视频时长
     execFile('ffprobe', [
       '-v', 'quiet', '-show_entries', 'format=duration',
       '-of', 'csv=p=0', videoPath
     ], (err, stdout) => {
-      if (err) { resolve(videoPath); return }
+      if (err) { fallbackTrim(resolve); return }
       const videoDur = parseFloat(stdout.trim())
-      if (!videoDur || videoDur <= 0 || videoDur <= targetDurationSec + 0.1) {
-        resolve(videoPath); return
-      }
-      const ratio = videoDur / targetDurationSec
+      if (!videoDur || videoDur <= 0) { fallbackTrim(resolve); return }
       const outPath = videoPath.replace(/(\.\w+)$/, '_adj$1')
-      if (ratio <= 2) {
-        // 加速setpts=PTS/speed, atempo=speed (音频变速)
-        const speed = ratio.toFixed(3)
-        const atempo = Math.min(speed, 2.0) // atempo 单次上限 2.0
-        execFile('ffmpeg', [
-          '-y', '-i', videoPath,
-          '-filter_complex', `setpts=PTS/${speed}`,
-          '-an',
-          outPath
-        ], { timeout: 30000 }, (err) => {
-          if (err) { console.log(`  调速失败,使用原始视频: ${err.message}`); resolve(videoPath); return }
-          console.log(`  调速: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speed}x)`)
-          resolve(outPath)
-        })
-      } else {
-        // 截断:取前 targetDuration 秒
+      if (strategy === 'trim') {
         execFile('ffmpeg', [
           '-y', '-i', videoPath,
           '-t', String(targetDurationSec),
           '-c', 'copy',
           outPath
         ], { timeout: 30000 }, (err) => {
-          if (err) { console.log(`  截断失败,使用原始视频: ${err.message}`); resolve(videoPath); return }
+          if (err) { console.log(`  截断失败: ${err.message}`); resolve(videoPath); return }
           console.log(`  截断: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s`)
           resolve(outPath)
         })
} else if (strategy === 'speed_up') {
const speedVal = speed.toFixed(3)
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `setpts=PTS/${speedVal}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
console.log(` 加速失败,兜底截断: ${err.message}`)
fallbackTrim(resolve)
return
}
console.log(` 加速: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speedVal}x)`)
resolve(outPath)
})
} else if (strategy === 'slow_down') {
const factor = (1 / speed).toFixed(3)
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `setpts=PTS*${factor}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
console.log(` 放缓失败,兜底截断: ${err.message}`)
fallbackTrim(resolve)
return
}
console.log(` 放缓: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speed.toFixed(2)}x speed)`)
resolve(outPath)
})
} else if (strategy === 'freeze') {
// 画面停顿:原速播放 + 最后一帧冻结补时长
const freezeSec = freezeExtraUs / US
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `tpad=stop=-1:stop_duration=${freezeSec.toFixed(3)}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
// 回退方案:截取最后一帧 → 生成冻结帧视频 → concat 拼接
console.log(` tpad freeze 失败,尝试 concat 方案: ${err.message}`)
const lastFrame = videoPath.replace(/(\.\w+)$/, '_lastframe.png')
const frozenVideo = videoPath.replace(/(\.\w+)$/, '_frozen.mp4')
execFile('ffmpeg', [
'-y', '-sseof', '-0.1', '-i', videoPath,
'-frames:v', '1', lastFrame
], { timeout: 10000 }, (err2) => {
if (err2) { console.log(` concat 方案也失败,兜底截断`); fallbackTrim(resolve); return }
execFile('ffmpeg', [
'-y', '-loop', '1', '-i', lastFrame,
'-t', String(freezeSec.toFixed(3)),
'-pix_fmt', 'yuv420p',
'-vf', 'scale=trunc(iw/2)*2:trunc(ih/2)*2',
frozenVideo
], { timeout: 15000 }, (err3) => {
if (err3) {
try { fs.unlinkSync(lastFrame) } catch (_) {}
console.log(` 冻结帧视频生成失败,兜底截断`)
fallbackTrim(resolve)
return
}
const concatList = path.join(path.dirname(videoPath), '_freeze_concat.txt')
fs.writeFileSync(concatList, `file '${videoPath}'\nfile '${frozenVideo}'\n`)
execFile('ffmpeg', [
'-y', '-f', 'concat', '-safe', '0', '-i', concatList,
'-c', 'copy', outPath
], { timeout: 30000 }, (err4) => {
try { fs.unlinkSync(lastFrame); fs.unlinkSync(frozenVideo); fs.unlinkSync(concatList) } catch (_) {}
if (err4) { console.log(` 拼接失败,兜底截断`); fallbackTrim(resolve); return }
console.log(` 画面停顿: ${videoDur.toFixed(1)}s + 冻结 ${freezeSec.toFixed(1)}s = ${targetDurationSec.toFixed(1)}s`)
resolve(outPath)
})
})
})
return
}
console.log(` 画面停顿: ${videoDur.toFixed(1)}s + 冻结 ${freezeSec.toFixed(1)}s = ${targetDurationSec.toFixed(1)}s`)
resolve(outPath)
})
} else {
resolve(videoPath)
       }
     })
   })
@@ -829,8 +1015,8 @@ async function addVideos(draftUrl, inputDir, items, timeline, width, height, tra
 async function batchUploadAudio(inputDir, items) {
   const urls = {}
   for (const item of items) {
-    // 上传 segments 中的每段音频
+    // 上传所有 segment 音频文件
-    if (item.segments && item.segments.length > 1) {
+    if (item.segments && item.segments.length > 0) {
       for (const seg of item.segments) {
         if (!seg.audio || seg.audio.startsWith('http') || urls[seg.audio]) continue
         const filePath = path.isAbsolute(seg.audio)
@@ -848,7 +1034,7 @@ async function batchUploadAudio(inputDir, items) {
         }
       }
     }
-    // 上传 item.audio单段或 segments 的第一段
+    // 上传 item.audio向后兼容,segments[0].audio 通常等于此值
     if (!item.audio || item.audio.startsWith('http')) {
       if (item.audio) urls[item.audio] = item.audio
       continue
@@ -893,24 +1079,29 @@ async function addVoiceover(draftUrl, inputDir, items, timeline, audioUrls = {})
   for (let i = 0; i < items.length; i++) {
     const item = items[i]
     const tl = timeline[i]
-    const segments = item.segments && item.segments.length > 1 ? item.segments : null
-    if (segments) {
+    if (item.segments && item.segments.length > 0) {
-      // 多段音频:按 segment 逐段添加,使用精确时长
+      // 逐段添加,每段使用实际音频文件时长(不做比例分配,消除留白)
-      const slots = distributeSegments(tl, segments)
+      let currentTime = tl.start
+      for (let si = 0; si < item.segments.length; si++) {
-      for (const slot of slots) {
+        const seg = item.segments[si]
-        const audioUrl = resolveAudio(slot.audio)
+        const audioUrl = resolveAudio(seg.audio)
+        const segDurUs = (seg.duration || 0) * US
+        if (segDurUs <= 0) continue
+        // 最后一段对齐 timeline 末尾,吃掉浮点误差
+        const isLast = si === item.segments.length - 1
+        const endTime = isLast ? tl.end : currentTime + segDurUs
         audioInfos.push({
           audio_url: audioUrl,
-          start: slot.start,
+          start: currentTime,
-          end: slot.end,
+          end: endTime,
-          duration: slot.duration,
+          duration: endTime - currentTime,
           volume: 1.0,
         })
+        currentTime = endTime
       }
     } else if (item.audio) {
-      // 单段音频:用实际音频时长,不超过 timeline 时长
+      // 无 segments用实际音频时长
       const audioUrl = resolveAudio(item.audio)
       const audioDurUs = item.audioDuration ? item.audioDuration * US : tl.duration
@@ -981,23 +1172,6 @@ function applyAnimationProps(cap, style = {}) {
if (style.outAnimDuration) cap.out_animation_duration = style.outAnimDuration if (style.outAnimDuration) cap.out_animation_duration = style.outAnimDuration
} }
-// segments 按比例分配到时间线DRY helper
-function distributeSegments(tl, segments) {
-  const totalSegDur = segments.reduce((sum, s) => sum + (s.duration || 0) * US, 0)
-  if (totalSegDur <= 0) return []
-  const tlDuration = tl.end - tl.start
-  let currentTime = tl.start
-  return segments.map((seg, idx) => {
-    const segDurUs = Math.round((seg.duration || 0) * US)
-    let duration = Math.round(tlDuration * (segDurUs / totalSegDur))
-    if (idx === segments.length - 1) duration = tl.end - currentTime
-    duration = Math.max(duration, 100000)
-    const entry = { start: currentTime, end: currentTime + duration, duration, text: seg.text, audio: seg.audio }
-    currentTime += duration
-    return entry
-  })
-}
 function loadAccountConfig(manifest) {
   const account = manifest.account
   if (!account) return {}
@@ -1093,17 +1267,19 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
     const tl = timeline[i]
     if (split) {
-      // 分句模式:优先用 segmentsTTS 逐句生成的精确时长),回退到字数估算
+      // 分句模式:优先用 segments 精确时长(与 addVoiceover 同步),回退到字数估算
-      const segments = item.segments && item.segments.length > 1 ? item.segments : null
-      if (segments) {
+      if (item.segments && item.segments.length > 0) {
-        // 精确模式:用 segments 的实际音频时长
-        const slots = distributeSegments(tl, segments)
-        for (const slot of slots) {
-          const cap = { start: slot.start, end: slot.end, text: slot.text }
+        let currentTime = tl.start
+        for (let si = 0; si < item.segments.length; si++) {
+          const seg = item.segments[si]
+          const segDurUs = (seg.duration || 0) * US
+          if (segDurUs <= 0) continue
+          const isLast = si === item.segments.length - 1
+          const endTime = isLast ? tl.end : currentTime + segDurUs
+          const cap = { start: currentTime, end: endTime, text: seg.text }
           applyAnimationProps(cap, animStyle)
           captions.push(cap)
+          currentTime = endTime
         }
       } else {
         // 回退:字数权重估算
@@ -1246,7 +1422,6 @@ async function main() {
     console.log('选项:')
     console.log('  --mode images|videos     素材类型(默认 images')
     console.log('  --format 9:16            画幅比例')
-    console.log('  --duration 4             默认每段时长/秒无TTS时的fallback默认 4')
     console.log('  --voiceover true|false   是否添加TTS配音轨道默认 true')
     console.log('  --subtitles true|false   是否添加字幕(默认 true')
     console.log('  --split-captions true|false 分句字幕模式(默认 true按标点切分')
@@ -1256,12 +1431,12 @@ async function main() {
     console.log('  --apiKey <key>           云渲染 API Key可选')
     console.log('  --manifest <path>        manifest.json 路径')
     console.log('')
-    console.log('时间线模式:')
+    console.log('时间线规则:')
-    console.log('  manifest.json 中每段包含 audio + duration → TTS音频驱动时间线')
+    console.log('  图片模式: TTS 音频时长 = 画面时长,无音频则跳过')
-    console.log('  无 audio/duration → 按 --duration 固定时长')
+    console.log('  视频模式: TTS 为主轴,视频通过以下策略适配:')
-    console.log('')
+    console.log('    视频比音频长 → 加速(≤2x) 或 裁剪(>2x)')
-    console.log('manifest.json 示例TTS驱动:')
+    console.log('    视频比音频短 → 放缓(≥0.5x) 或 画面停顿(<0.5x)')
-    console.log('  {"items":[{"file":"1.png","text":"文案","audio":"seg_1.mp3","duration":3.5}]}')
+    console.log('    所有策略失败 → 兜底截断')
     console.log('')
     console.log('配置:')
     console.log('  请运行 node setup.js 生成配置')


@@ -5,21 +5,26 @@
 const { loadManifest, saveManifest } = require('./pipeline-utils')
 function confirmManifest(options) {
-  const { manifest: manifestPath, all } = options
+  const { manifest: manifestPath, all, items: itemsStr } = options
   if (!manifestPath) {
     console.error('用法: pipeline.js confirm --manifest <path> --all')
+    console.error('     pipeline.js confirm --manifest <path> --items 1,3,5')
     process.exit(1)
   }
-  if (!all) {
+  if (!all && !itemsStr) {
-    console.error('错误: 必须指定 --all')
+    console.error('错误: 必须指定 --all 或 --items <id列表>')
     process.exit(1)
   }
   const manifest = loadManifest(manifestPath)
const targetIds = itemsStr
? new Set(itemsStr.split(',').map(s => parseInt(s.trim(), 10)).filter(n => !isNaN(n)))
: null
   let count = 0
   for (const item of manifest.items) {
+    if (targetIds && !targetIds.has(item.id)) continue
     if (item.file && item.status === 'done' && !item.confirmed) {
       item.confirmed = true
       count++
@@ -30,7 +35,8 @@ function confirmManifest(options) {
   const total = manifest.items.length
   const confirmed = manifest.items.filter(it => it.confirmed).length
-  console.log(`已确认: ${count} items${confirmed}/${total} 已确认)`)
+  const scope = targetIds ? `${Array.from(targetIds).join(',')}` : '全部'
+  console.log(`已确认: ${count} items范围: ${scope},共 ${confirmed}/${total} 已确认)`)
 }
 module.exports = { confirmManifest }


@@ -6,7 +6,7 @@
 const fs = require('fs')
 const path = require('path')
-const { loadAccountConfig, saveManifest, ensureDir, ACCOUNTS_DIR, SKILLS_DIR } = require('./pipeline-utils')
+const { loadAccountConfig, saveManifest, ensureDir, slugify, ACCOUNTS_DIR, SKILLS_DIR } = require('./pipeline-utils')
 function initManifest(options) {
   const { account: accountId, mode, items: itemsJson, itemsFile } = options
@@ -40,7 +40,8 @@ function initManifest(options) {
   }
   // 校验必填字段
-  const requiredFields = ['shotDesc', 'script', 'imagePrompt']
+  const requiredFields = ['shotDesc', 'script']
+  const optionalFields = ['imagePrompt', 'videoPrompt', 'lastFramePrompt']
   const resolvedMode = mode || 'single'
   for (let i = 0; i < rawItems.length; i++) {
@@ -52,8 +53,7 @@ function initManifest(options) {
       }
     }
     if (resolvedMode === 'framePair' && !item.lastFramePrompt) {
-      console.error(`错误: 首尾帧模式 items[${i}] 缺少 "lastFramePrompt"imagePrompt 作为第一帧)`)
-      process.exit(1)
+      delete item.lastFramePrompt // 首尾帧模式 Step 2-A 补充
     }
   }
@@ -68,9 +68,11 @@ function initManifest(options) {
   // 构建 items
   const items = rawItems.map((raw, i) => {
+    const slug = slugify(raw.shotDesc || raw.script || `scene_${i + 1}`)
     const item = {
       id: i + 1,
       status: 'pending',
+      file: `images/scene_${String(i + 1).padStart(2, '0')}_${slug}.jpeg`,
       shotDesc: raw.shotDesc || '',
       script: raw.script || '',
       duration: raw.duration || 5,
@@ -129,7 +131,13 @@ function initManifest(options) {
console.log(` 画幅: ${manifest.format}, 模式: ${manifest.mode}`) console.log(` 画幅: ${manifest.format}, 模式: ${manifest.mode}`)
console.log(` Items: ${items.length}`) console.log(` Items: ${items.length}`)
console.log(` 参考图: ${references.length}`) console.log(` 参考图: ${references.length}`)
if (items.some(it => !it.videoPrompt)) { if (items.some(it => !it.imagePrompt)) {
console.log(`${items.filter(it => !it.imagePrompt).length} 个 item 缺少 imagePrompt请运行 Step 2-A图片提示词补充`)
}
if (resolvedMode === 'framePair' && items.some(it => !it.lastFramePrompt)) {
console.log(`${items.filter(it => !it.lastFramePrompt).length} 个 item 缺少 lastFramePrompt请运行 Step 2-A 补充`)
}
if (items.some(it => !it.videoPrompt && resolvedMode !== 'framePair')) {
console.log(`${items.filter(it => !it.videoPrompt).length} 个 item 缺少 videoPrompt生视频阶段将跳过`) console.log(`${items.filter(it => !it.videoPrompt).length} 个 item 缺少 videoPrompt生视频阶段将跳过`)
} }
console.log() console.log()

View File

@@ -41,6 +41,9 @@ function validateManifest(manifestPath) {
       if (item.status && !['pending', 'generating', 'done', 'failed'].includes(item.status)) {
         issues.push(`${prefix} status 无效: ${item.status}`)
       }
+      if (item.status === 'done' && !item.file && !item.video && !item.url) {
+        issues.push(`${prefix} status=done 但缺少 file/video/url(素材路径)`)
+      }
     })
   }

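上面新增的校验规则可以抽成一个纯函数来理解(以下为示意代码,非仓库实现):status 为 done 的 item 必须携带至少一种素材路径(file / video / url),否则状态与素材不一致。

```javascript
// 示意:判断 item 的 done 状态是否有素材支撑。
// 非 done 状态不受此规则约束,直接视为通过。
function doneItemHasAsset(item) {
  if (item.status !== 'done') return true
  return Boolean(item.file || item.video || item.url)
}
```

validate 阶段对每个 item 套用该判断,不通过的写入 issues 列表而非直接中断,便于一次性报告所有问题。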
@@ -15,6 +15,14 @@ async function phaseAssemble(manifest, manifestPath, options) {
   const hasVideos = videoItems.length > 0
   const mode = hasVideos ? 'videos' : 'images'
 
+  // 前置校验:图片模式下检查 file 字段
+  if (mode === 'images') {
+    const missingFile = manifest.items.filter(it => !it.file)
+    if (missingFile.length > 0) {
+      throw new Error(`${missingFile.length} 个 item 缺少 file 字段(id: ${missingFile.map(it => it.id).join(', ')}),请先运行 images 阶段生成图片`)
+    }
+  }
+
   const assembleArgs = {
     input: dir,
     manifest: manifestPath,
@@ -22,7 +30,6 @@ async function phaseAssemble(manifest, manifestPath, options) {
     format: manifest.format || accountConfig.defaultFormat || '9:16',
     subtitles: mode === 'images' ? 'true' : 'false',
     voiceover: manifest.items.some(it => it.audio) ? 'true' : 'false',
-    duration: '4',
     animation: capcutConfig.animation || '渐显+放大',
   }

@@ -17,7 +17,8 @@ async function phaseImages(manifest, manifestPath, options) {
   ensureDir(imagesDir)
 
   const items = manifest.items.filter(it =>
-    (!it.status || it.status === 'pending' || it.status === 'generating') && it.imagePrompt
+    ((!it.status || it.status === 'pending' || it.status === 'generating') && it.imagePrompt) ||
+    (it.status === 'done' && manifest.mode === 'framePair' && it.file && it.lastFramePrompt && !it.lastFrame)
   )
   if (items.length === 0) { log('images', '无待处理 item,跳过'); return }
@@ -45,6 +46,14 @@ async function phaseImages(manifest, manifestPath, options) {
     item.status = 'generating'
     saveManifest(manifestPath, manifest)
 
+    // 仅补 lastFrame(首帧已存在,跳过首帧生成)
+    if (item.file && manifest.mode === 'framePair' && item.lastFramePrompt && !item.lastFrame) {
+      log('images', `[${idx}] 补生成 lastFrame(首帧已有: ${item.file})`)
+      await generateLastFrame(item, idx, manifest, dir, imagesDir, model, ratio, manifestPath)
+      saveManifest(manifestPath, manifest)
+      return { ok: true }
+    }
+
     let result
     if (model === 'gemini') {
       result = await generateGemini(item, idx, dir, imagesDir, ratio, refs)

@@ -2,7 +2,8 @@
 /**
  * Phase: tts — 语音合成(逐句分句生成)
  *
  * 将每个 item 的 script 按标点切分为短句,每句单独生成 TTS 音频。
- * 结果写入 item.segments[],实现字幕与语音精确对齐。
+ * 统一写入 item.segments[](单句时数组仅 1 个元素),
+ * item.audio 指向第一段,item.audioDuration 为累计时长。
  */
 
 const path = require('path')
@@ -29,47 +30,32 @@ async function phaseTts(manifest, manifestPath, options = {}) {
     try {
       const sentences = splitTextIntoSentences(fullText)
+      const segments = []
+      let totalDuration = 0
 
-      if (sentences.length <= 1) {
-        // 单句:不需要 segments,走原逻辑
-        const { filePath, duration } = await synthesize(fullText, {
+      for (let j = 0; j < sentences.length; j++) {
+        const sentence = sentences[j]
+        const segId = `${item.id || idx}_${j + 1}`
+        const { filePath, duration } = await synthesize(sentence, {
           outputDir: audioDir,
-          id: item.id || idx,
+          id: segId,
           voice: manifest.ttsVoice || undefined,
           instruction: manifest.ttsInstruction || undefined,
           rate: manifest.ttsRate || undefined,
         })
-        item.audio = path.relative(dir, filePath).replace(/\\/g, '/')
-        item.audioDuration = Math.round(duration * 1000) / 1000
-        log('tts', `[${idx}/${items.length}] ${duration.toFixed(1)}s: ${fullText.substring(0, 30)}...`)
-      } else {
-        // 多句:逐句生成,写入 segments
-        const segments = []
-        let totalDuration = 0
-        for (let j = 0; j < sentences.length; j++) {
-          const sentence = sentences[j]
-          const segId = `${item.id || idx}_${j + 1}`
-          const { filePath, duration } = await synthesize(sentence, {
-            outputDir: audioDir,
-            id: segId,
-            voice: manifest.ttsVoice || undefined,
-            instruction: manifest.ttsInstruction || undefined,
-            rate: manifest.ttsRate || undefined,
-          })
-          segments.push({
-            text: sentence,
-            audio: path.relative(dir, filePath).replace(/\\/g, '/'),
-            duration: Math.round(duration * 1000) / 1000,
-          })
-          totalDuration += duration
-        }
-        item.segments = segments
-        item.audio = segments[0].audio
-        item.audioDuration = Math.round(totalDuration * 1000) / 1000
-        log('tts', `[${idx}/${items.length}] ${totalDuration.toFixed(1)}s (${segments.length}句): ${fullText.substring(0, 30)}...`)
+        segments.push({
+          text: sentence,
+          audio: path.relative(dir, filePath).replace(/\\/g, '/'),
+          duration: Math.round(duration * 1000) / 1000,
+        })
+        totalDuration += duration
       }
+
+      // 统一使用 segments 数组(单句 = 1 元素,多句 = N 元素)
+      item.segments = segments
+      item.audio = segments[0].audio
+      item.audioDuration = Math.round(totalDuration * 1000) / 1000
+      log('tts', `[${idx}/${items.length}] ${totalDuration.toFixed(1)}s (${segments.length}句): ${fullText.substring(0, 30)}...`)
     } catch (err) {
       item.status = 'failed'
       item.error = `TTS失败: ${err.message}`

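`splitTextIntoSentences` 的实现没有出现在本次 diff 中。下面是一个按中英文句末标点断句的可能写法(仅为示意,实际仓库中的实现细节可能不同):

```javascript
// 假设实现:在句末标点(。!?!?;;)之后断句,标点保留在句尾,
// 空白与空串被过滤掉;单句文本原样返回为长度 1 的数组。
function splitTextIntoSentences(text) {
  return String(text)
    .split(/(?<=[。!?!?;;])/) // 零宽 lookbehind:在标点后切分且不吃掉标点
    .map(s => s.trim())
    .filter(Boolean)
}
```

这种切分保证每个 segment 的 text 与其 TTS 音频一一对应,字幕时间轴可以直接按 segment.duration 累加。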
@@ -1,7 +1,7 @@
 /**
  * Phase: upload — OSS 上传
  *
- * 将生成的图片(含首尾帧)上传到 OSS,回写 url
+ * 将图片(含首尾帧)和视频上传到 OSS,回写 url / videoUrl
  */
 
 const path = require('path')
@@ -11,35 +11,64 @@ async function phaseUpload(manifest, manifestPath) {
   const dir = getManifestDir(manifestPath)
   const { uploadFile } = require('../oss-upload')
 
-  const items = manifest.items.filter(it =>
+  // 图片(含首尾帧 first frame)
+  const imageItems = manifest.items.filter(it =>
     it.status === 'done' && it.file && !it.url
   )
-  if (items.length === 0) { log('upload', '无待上传 item,跳过'); return }
+  // 视频
+  const videoItems = manifest.items.filter(it =>
+    it.status === 'done' && it.video && !it.videoUrl
+  )
 
-  log('upload', `共 ${items.length} 个文件`)
+  if (imageItems.length === 0 && videoItems.length === 0) {
+    log('upload', '无待上传文件,跳过')
+    return
+  }
 
-  for (let i = 0; i < items.length; i++) {
-    const item = items[i]
-    const filePath = path.resolve(dir, item.file)
-    try {
-      const { url } = await uploadFile(filePath)
-      item.url = url
-      log('upload', `[${i + 1}/${items.length}] ${item.file} → ${url.substring(0, 60)}...`)
-    } catch (err) {
-      item.error = `上传失败: ${err.message}`
-      log('upload', `[${i + 1}/${items.length}] 失败: ${err.message}`)
-    }
-    if (item.url && item.lastFrame && !item.lastFrameUrl) {
-      const lastPath = path.resolve(dir, item.lastFrame)
-      try {
-        const { url } = await uploadFile(lastPath)
-        item.lastFrameUrl = url
-        log('upload', `[${i + 1}/${items.length}] lastFrame → OK`)
-      } catch (err) {
-        log('upload', `[${i + 1}/${items.length}] lastFrame 上传失败: ${err.message}`)
+  // 上传图片
+  if (imageItems.length > 0) {
+    log('upload', `图片: ${imageItems.length} 个`)
+    for (let i = 0; i < imageItems.length; i++) {
+      const item = imageItems[i]
+      const filePath = path.resolve(dir, item.file)
+      try {
+        const { url } = await uploadFile(filePath)
+        item.url = url
+        log('upload', `  [${i + 1}/${imageItems.length}] ${item.file} → OK`)
+      } catch (err) {
+        item.error = `上传失败: ${err.message}`
+        log('upload', `  [${i + 1}/${imageItems.length}] 失败: ${err.message}`)
       }
+      // 首尾帧模式:上传 lastFrame
+      if (item.url && item.lastFrame && !item.lastFrameUrl) {
+        const lastPath = path.resolve(dir, item.lastFrame)
+        try {
+          const { url } = await uploadFile(lastPath)
+          item.lastFrameUrl = url
+          log('upload', `  [${i + 1}/${imageItems.length}] lastFrame → OK`)
+        } catch (err) {
+          log('upload', `  [${i + 1}/${imageItems.length}] lastFrame 上传失败: ${err.message}`)
+        }
+      }
+      saveManifest(manifestPath, manifest)
+    }
+  }
+
+  // 上传视频
+  if (videoItems.length > 0) {
+    log('upload', `视频: ${videoItems.length} 个`)
+    for (let i = 0; i < videoItems.length; i++) {
+      const item = videoItems[i]
+      const videoPath = path.resolve(dir, item.video)
+      try {
+        const { url } = await uploadFile(videoPath)
+        item.videoUrl = url
+        log('upload', `  [${i + 1}/${videoItems.length}] ${item.video} → OK`)
+      } catch (err) {
+        log('upload', `  [${i + 1}/${videoItems.length}] 失败: ${err.message}`)
+      }
+      saveManifest(manifestPath, manifest)
     }
-    saveManifest(manifestPath, manifest)
   }
 }

@@ -112,13 +112,23 @@ function applyRetryFailed(manifest, phases) {
   for (const item of manifest.items) {
     if (item.status === 'failed' || item.status === 'partial') {
       if (item.url && item.videoPrompt && !item.video) {
+        // 图片已上传但视频未生成 → 直接重试视频阶段
         item.status = 'done'
         item.error = ''
         resetCount++
       } else if (!item.url && item.imagePrompt) {
-        item.status = 'pending'
-        item.error = ''
-        resetCount++
+        // 图片未上传 → 重试图片阶段
+        // 如果首帧已存在但 lastFrame 失败,只重置 lastFrame 相关
+        if (item.file && manifest.mode === 'framePair' && !item.lastFrame) {
+          item.status = 'done' // 保留首帧,只补 lastFrame
+          item.error = ''
+          resetCount++
+        } else {
+          item.status = 'pending'
+          item.error = ''
+          delete item.file // 清除旧文件引用,避免重复
+          resetCount++
+        }
       }
     }
   }
@@ -128,7 +138,7 @@ function applyRetryFailed(manifest, phases) {
     }
   }
   if (phases.includes('images')) {
-    if (manifest.items.some(it => !it.status || it.status === 'pending')) {
+    if (manifest.items.some(it => (!it.status || it.status === 'pending') || (it.status === 'done' && manifest.mode === 'framePair' && !it.lastFrame))) {
       manifest.pipeline.phases.images = 'pending'
     }
   }
@@ -159,7 +169,6 @@ function parseArgs(argv) {
     else if (argv[i] === '--image-model' && argv[i + 1]) args.imageModel = argv[++i]
     else if (argv[i] === '--video-model' && argv[i + 1]) args.videoModel = argv[++i]
     else if (argv[i] === '--references' && argv[i + 1]) args.references = argv[++i]
-    else if (argv[i] === '--style' && argv[i + 1]) args.style = argv[++i]
     else if (argv[i] === '--all') args.all = true
     else if (!args.command) args.command = argv[i]
   }
@@ -219,6 +228,7 @@ async function main() {
     console.log('  pipeline.js init --account <id> --mode <single|framePair> --items <JSON> [--items-file <path>] [--image-model gemini|mj] [--video-model veo3-fast|grok|kling] [--format 9:16]')
     console.log('  pipeline.js validate --manifest <path>')
     console.log('  pipeline.js confirm --manifest <path> --all')
+    console.log('  pipeline.js confirm --manifest <path> --items 1,3,5')
    console.log('  pipeline.js run --manifest <path> [--account id] [--phase p1,p2] [--resume] [--retry-failed]')
     console.log('  pipeline.js status --manifest <path>')
     console.log('')
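applyRetryFailed 的分支较多,可以用一个假设性的纯函数概括其决策(`retryAction` 为示意命名,非仓库实际 API),尤其是 framePair 模式下「保留首帧、只补 lastFrame」的策略:

```javascript
// 示意:根据 item 当前字段判断 --retry-failed 应采取的动作。
// 返回值仅用于说明分支走向,实际代码是直接原地改写 item.status。
function retryAction(item, mode) {
  if (item.status !== 'failed' && item.status !== 'partial') return 'skip'
  if (item.url && item.videoPrompt && !item.video) {
    return 'retry-video' // 图片已上传,只重试视频阶段
  }
  if (!item.url && item.imagePrompt) {
    if (item.file && mode === 'framePair' && !item.lastFrame) {
      return 'retry-lastFrame' // 首帧已有,只补 lastFrame
    }
    return 'retry-image' // 整张图重新生成
  }
  return 'skip'
}
```

把分支写成表驱动的纯函数便于单测,也让 images 阶段的 filter 条件与重试策略保持一一对应。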
.gitignore
@@ -2,7 +2,7 @@
 node_modules/
-config.json
 
 # Local settings
 .claude/settings.local.json

@@ -2,9 +2,7 @@
 ## 一、角色定义
 
-你是一位专精图片生成模型的提示词工程师,具备深厚的视觉叙事和光影设计能力。
-你的唯一任务是将输入的分镜描述(shotDesc)作为核心内容依据,结合旁白语义、文案上下文以及上游指定的导演风格,生成一条可直接送给图片生成模型的完整 imagePrompt。
+你是一位拥有 15 年经验的电影摄影指导(DP),擅长将文字分镜转化为高表现力的视觉起始帧。你不仅关注“画了什么”,更关注“空间叙述”与“光影秩序”。
 
 > **重要前提:** 你生成的图片是下游视频片段的起始帧。构图和姿态必须是「即将发生」的瞬间,而非「已完成」的状态。