refactor(video-pipeline): remove the segments mechanism in favor of whole-passage audio synthesis

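Remove per-sentence splitting in the TTS phase and the segments array logic; each item's script is now synthesized as a single audio file. CapCut subtitle splitting is handled in the assembly phase, which allocates time to each sentence in proportion to its character count. This simplifies audio upload, timeline construction, and subtitle generation, and removes redundant processing branches.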
2026-05-02 02:31:55 +08:00
parent ac753ef367
commit 6097a809bf
9 changed files with 95 additions and 244 deletions
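The character-proportional subtitle allocation described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `capcut-tracks.js`: `allocateCaptions` is a hypothetical helper, `US` is assumed to be the microseconds-per-second constant exported by `capcut-api`, and the splitter mirrors the updated `splitTextIntoSentences` from this commit, with one extra step that drops empty fragments.

```javascript
// Sketch only: allocateCaptions is a hypothetical helper for illustration.
const US = 1000000 // assumed: microseconds per second, as in capcut-api

// Mirrors the updated splitting rule: break at 。！；， and strip punctuation.
function splitTextIntoSentences(text) {
  const sentenceEnders = /[。！；，]/
  const strip = /[。！；，：？、——…]/g
  const sentences = []
  let current = ''
  for (const char of text) {
    current += char
    if (sentenceEnders.test(char)) {
      sentences.push(current.trim().replace(strip, ''))
      current = ''
    }
  }
  if (current.trim()) sentences.push(current.trim().replace(strip, ''))
  return sentences
}

// Divide the slot [startUs, endUs] among sentences by character share.
function allocateCaptions(text, startUs, endUs) {
  // Extra step not in the original splitter: skip fragments that are empty
  // after punctuation stripping.
  const sentences = splitTextIntoSentences(text).filter(s => s.length > 0)
  const totalChars = sentences.reduce((n, s) => n + s.length, 0)
  if (totalChars === 0) return []

  const slot = endUs - startUs
  const captions = []
  let cursor = startUs
  let charsSoFar = 0
  sentences.forEach((s, i) => {
    charsSoFar += s.length
    const isLast = i === sentences.length - 1
    // Use cumulative share so rounding error never accumulates; the last
    // caption is pinned to the slot end.
    const end = isLast
      ? endUs
      : startUs + Math.round((slot * charsSoFar) / totalChars)
    captions.push({ start: cursor, end, text: s })
    cursor = end
  })
  return captions
}

// Usage: one item whose whole-passage audio occupies a 4-second slot.
console.log(allocateCaptions('今天天气很好，我们出去走走。', 0, 4 * US))
```

Computing each boundary from the cumulative character count, rather than summing per-sentence durations, keeps the captions contiguous and guarantees the final caption ends exactly at the slot boundary.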
--- a/.claude/skills/video-from-script/references/account-creation.md
+++ b/.claude/skills/video-from-script/references/account-creation.md
@@ -49,18 +49,11 @@ digraph creation_flow {
 | 3 | 核心内容方向？如：历史权谋、科技解说、情感故事、美食文化 | ✅ | 分镜.md → 角色定义 + 账号内容理解.核心方向 |
 | 4 | 目标受众？如：30岁男性、18-25岁女性 | ✅ | 分镜.md → 账号内容理解.目标受众 |
 | 5 | 内容气质？用 2-3 个关键词描述，如：冷峻洞察、温暖治愈、犀利反讽 | ✅ | 分镜.md → 账号内容理解.内容气质 |
-| 5.5 | Hook 策略偏好？（选填） | ❌ | 分镜.md → 3秒钩子规则增强 |
-| | A. 结论前置（默认）：直接亮核心观点 | | |
-| | B. 认知冲突：一句话打破常识，制造"凭什么" | | |
-| | C. 身份挑衅：点中受众身份焦虑 | | |
-| | D. 数据震惊：用震撼数字开场 | | |
-| | E. 反转悬念：设一个反直觉的悬念 | | |
 | 5.6 | 目标情绪回路？（选填）如：好奇→震惊→领悟，或平静→压迫→释放 | ❌ | 分镜.md → 账号内容理解.情绪回路 |

 **注入规则**：
 - 角色定义改为"专精{Q3}类口播文案转化为{Q6}画面的分镜导演"
 - 新增「账号内容理解」节（Q3+Q4+Q5+Q5.6，仅供子 Agent 理解上下文，不输出到分镜表）
-- 如有 Q5.5，在「3秒钩子规则」中标注账号默认 Hook 策略

 ---

@@ -220,7 +213,6 @@ Agent 在汇总确认前，先做以下快速自检。任何一项为 ❌ 时建
 |--------|---------|
 | 差异化定位 | Q2 描述能让用户说清"为什么看这个号而不是别的" |
 | 情绪价值 | Q5.6 有明确的情绪回路，不是"好看"而是"看完有感觉" |
-| 前3秒策略 | Q5.5 选了明确的 Hook 模式，不是"先铺垫再讲" |
 | 视觉记忆点 | Q7+Q8 色彩/画风能在信息流中一眼认出 |

 自检结果展示给用户：全部 ✅ → 进入汇总确认；有 ❌ → 建议补充后再继续（用户可强制跳过）。
@@ -243,7 +235,6 @@ Agent 在汇总确认前，先做以下快速自检。任何一项为 ❌ 时建
 ### 维度 3：内容气质
 - 核心方向：{Q3}
 - 内容气质：{Q5}
-- Hook 策略：{Q5.5 或"未指定，使用通用钩子规则"}
 - 情绪回路：{Q5.6 或"未指定"}

 ### 维度 4-6：视觉基调 + 画风 + 色彩
@@ -353,7 +344,6 @@ digraph injection {
   - 读取 `_template/prompts/通用分镜.md`
   - 在角色定义中注入 Q3 内容方向
   - 新增「账号内容理解」节（Q3+Q4+Q5+Q5.6 情绪回路）
-   - 增强「3秒钩子规则」节：如有 Q5.5，标注账号默认 Hook 策略
   - 新增「宏观视觉风格方向」节（Q6+Q7+推导）
   - 保留通用骨架：切割规则、导演构图词库、shotDesc 写法规范、输入输出格式、质量自检

--- a/.claude/skills/video-from-script/scripts/capcut_assemble.js
+++ b/.claude/skills/video-from-script/scripts/capcut_assemble.js
@@ -65,24 +65,6 @@ async function batchUploadToOSS(inputDir, files, concurrency = 3) {
 async function batchUploadAudio(inputDir, items) {
  const urls = {}
  for (const item of items) {
-    if (item.segments && item.segments.length > 0) {
-      for (const seg of item.segments) {
-        if (!seg.audio || seg.audio.startsWith('http') || urls[seg.audio]) continue
-        const filePath = path.isAbsolute(seg.audio)
-          ? seg.audio
-          : path.resolve(inputDir, seg.audio)
-        if (!fs.existsSync(filePath)) {
-          console.error(`   音频文件不存在: ${filePath}`)
-          continue
-        }
-        try {
-          urls[seg.audio] = await uploadToOSS(filePath)
-          console.log(`   上传: ${path.basename(filePath)} -> OK`)
-        } catch (err) {
-          console.error(`   上传失败: ${path.basename(filePath)} - ${err.message}`)
-        }
-      }
-    }
    if (!item.audio || item.audio.startsWith('http')) {
      if (item.audio) urls[item.audio] = item.audio
      continue
@@ -174,17 +156,7 @@ async function assemble(args) {
  // ffprobe 测量实际时长
  let audioMeasured = 0, videoMeasured = 0
  for (const item of items) {
-    if (item.segments && item.segments.length > 0) {
-      for (const seg of item.segments) {
-        if (!seg.audio || seg.audio.startsWith('http')) continue
-        const audioPath = path.isAbsolute(seg.audio)
-          ? seg.audio
-          : path.resolve(inputDir, seg.audio)
-        if (!fs.existsSync(audioPath)) continue
-        const actualDur = await getAudioDurationSec(audioPath)
-        if (actualDur != null) { seg.duration = actualDur; audioMeasured++ }
-      }
-    } else if (item.audio && !item.audio.startsWith('http')) {
+    if (item.audio && !item.audio.startsWith('http')) {
      const audioPath = path.isAbsolute(item.audio)
        ? item.audio
        : path.resolve(inputDir, item.audio)
@@ -216,9 +188,7 @@ async function assemble(args) {
    const item = items[i]
    const tl = timeline[i]
    if (tl.skip) { console.log(`  [${i + 1}] 跳过（无音频）`); continue }
-    const audioDur = item.segments
-      ? item.segments.reduce((s, seg) => s + (seg.duration || 0), 0)
-      : (item.audioDuration || 0)
+    const audioDur = item.audioDuration || 0
    const slotDur = tl.duration / US
    const diff = slotDur - audioDur
    const videoDur = (item.videoDuration || 0)
@@ -341,14 +311,6 @@ async function assemble(args) {
            item.audio = audioUrls[item.audio]
            changed = true
          }
-          if (item.segments) {
-            for (const seg of item.segments) {
-              if (seg.audio && audioUrls[seg.audio]) {
-                seg.audio = audioUrls[seg.audio]
-                changed = true
-              }
-            }
-          }
        }
        if (changed) saveManifest(manifestFile, manifest)
      }
--- a/.claude/skills/video-from-script/scripts/lib/capcut-timeline.js
+++ b/.claude/skills/video-from-script/scripts/lib/capcut-timeline.js
@@ -23,12 +23,7 @@ const { US } = require('./capcut-api')
 function buildTimeline(items) {
  let offset = 0
  return items.map(item => {
-    let audioDur
-    if (item.segments && item.segments.length > 0) {
-      audioDur = item.segments.reduce((sum, s) => sum + (s.duration || 0), 0) * US
-    } else {
-      audioDur = (item.audioDuration != null) ? item.audioDuration * US : 0
-    }
+    const audioDur = (item.audioDuration != null) ? item.audioDuration * US : 0
    const videoDur = (item.videoDuration != null) ? item.videoDuration * US : 0
    const hasVideo = !!(item.video || item.videoUrl || item.url)

--- a/.claude/skills/video-from-script/scripts/lib/capcut-tracks.js
+++ b/.claude/skills/video-from-script/scripts/lib/capcut-tracks.js
@@ -308,7 +308,7 @@ async function addVideos(draftUrl, inputDir, items, timeline, width, height, tra
 // ============================================================================

 async function addVoiceover(draftUrl, inputDir, items, timeline, audioUrls = {}) {
-  const audioItems = items.filter(item => item.audio || (item.segments && item.segments.length > 0))
+  const audioItems = items.filter(item => item.audio)
  if (audioItems.length === 0) {
    console.log('   无 TTS 音频文件，跳过')
    return
@@ -325,25 +325,7 @@ async function addVoiceover(draftUrl, inputDir, items, timeline, audioUrls = {})
    const item = items[i]
    const tl = timeline[i]

-    if (item.segments && item.segments.length > 0) {
-      let currentTime = tl.start
-      for (let si = 0; si < item.segments.length; si++) {
-        const seg = item.segments[si]
-        const audioUrl = resolveAudio(seg.audio)
-        const segDurUs = (seg.duration || 0) * US
-        if (segDurUs <= 0) continue
-        const isLast = si === item.segments.length - 1
-        const endTime = isLast ? tl.end : currentTime + segDurUs
-        audioInfos.push({
-          audio_url: audioUrl,
-          start: currentTime,
-          end: endTime,
-          duration: endTime - currentTime,
-          volume: 1.0,
-        })
-        currentTime = endTime
-      }
-    } else if (item.audio) {
+    if (item.audio) {
      const audioUrl = resolveAudio(item.audio)
      const audioDurUs = item.audioDuration ? item.audioDuration * US : tl.duration

@@ -421,20 +403,6 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
    const tl = timeline[i]

    if (split) {
-      if (item.segments && item.segments.length > 0) {
-        let currentTime = tl.start
-        for (let si = 0; si < item.segments.length; si++) {
-          const seg = item.segments[si]
-          const segDurUs = (seg.duration || 0) * US
-          if (segDurUs <= 0) continue
-          const isLast = si === item.segments.length - 1
-          const endTime = isLast ? tl.end : currentTime + segDurUs
-          const cap = { start: currentTime, end: endTime, text: seg.text }
-          applyAnimationProps(cap, animStyle)
-          captions.push(cap)
-          currentTime = endTime
-        }
-      } else {
      const sentences = splitTextIntoSentences(text)
      if (sentences.length === 0) continue

@@ -462,7 +430,6 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
        captions.push(cap)
        currentTime += duration
      })
-      }
    } else {
      const cap = {
        start: tl.start,
--- a/.claude/skills/video-from-script/scripts/lib/phase-assemble.js
+++ b/.claude/skills/video-from-script/scripts/lib/phase-assemble.js
@@ -28,7 +28,7 @@ async function phaseAssemble(manifest, manifestPath, options) {
    manifest: manifestPath,
    mode,
    format: manifest.format || accountConfig.defaultFormat || '9:16',
-    subtitles: mode === 'images' ? 'true' : 'false',
+    subtitles: 'true',
    voiceover: manifest.items.some(it => it.audio) ? 'true' : 'false',
    animation: capcutConfig.animation || '渐显+放大',
  }
--- a/.claude/skills/video-from-script/scripts/lib/phase-tts.js
+++ b/.claude/skills/video-from-script/scripts/lib/phase-tts.js
@@ -1,13 +1,13 @@
 /**
- * Phase: tts — 语音合成（逐句分句生成）
+ * Phase: tts — 语音合成（整段合成）
 *
- * 将每个 item 的 script 按标点切分为短句，每句单独生成 TTS 音频。
- * 统一写入 item.segments[]，单句时数组仅 1 个元素。
- * item.audio 指向第一段，item.audioDuration 为累计时长。
+ * 每个 item 的 script 整段合成一个音频文件，保留自然语调。
+ * item.audio 指向完整音频，item.audioDuration 为总时长。
+ * 字幕切分由组装阶段按字符比例分配，不在 TTS 阶段处理。
 */

 const path = require('path')
-const { saveManifest, ensureDir, log, getManifestDir, splitTextIntoSentences } = require('./pipeline-utils')
+const { saveManifest, ensureDir, log, getManifestDir } = require('./pipeline-utils')

 async function phaseTts(manifest, manifestPath, options = {}) {
  const dir = getManifestDir(manifestPath)
@@ -29,33 +29,18 @@ async function phaseTts(manifest, manifestPath, options = {}) {
    const fullText = item.script || item.text

    try {
-      const sentences = splitTextIntoSentences(fullText)
-      const segments = []
-      let totalDuration = 0
-
-      for (let j = 0; j < sentences.length; j++) {
-        const sentence = sentences[j]
-        const segId = `${item.id || idx}_${j + 1}`
-        const { filePath, duration } = await synthesize(sentence, {
+      const { filePath, duration } = await synthesize(fullText, {
        outputDir: audioDir,
-          id: segId,
+        id: String(item.id || idx),
        voice: manifest.ttsVoice || undefined,
        instruction: manifest.ttsInstruction || undefined,
        rate: manifest.ttsRate || undefined,
      })
-        segments.push({
-          text: sentence,
-          audio: path.relative(dir, filePath).replace(/\\/g, '/'),
-          duration: Math.round(duration * 1000) / 1000,
-        })
-        totalDuration += duration
-      }

-      // 统一使用 segments 数组（单句 = 1 元素，多句 = N 元素）
-      item.segments = segments
-      item.audio = segments[0].audio
-      item.audioDuration = Math.round(totalDuration * 1000) / 1000
-      log('tts', `[${idx}/${items.length}] ${totalDuration.toFixed(1)}s (${segments.length}句): ${fullText.substring(0, 30)}...`)
+      const totalDuration = Math.round(duration * 1000) / 1000
+      item.audio = path.relative(dir, filePath).replace(/\\/g, '/')
+      item.audioDuration = totalDuration
+      log('tts', `[${idx}/${items.length}] ${totalDuration.toFixed(1)}s: ${fullText.substring(0, 30)}...`)
    } catch (err) {
      item.status = 'failed'
      item.error = `TTS失败: ${err.message}`
--- a/.claude/skills/video-from-script/scripts/lib/pipeline-utils.js
+++ b/.claude/skills/video-from-script/scripts/lib/pipeline-utils.js
@@ -165,8 +165,8 @@ function getManifestDir(manifestPath) {
 // ============================================================================

 function splitTextIntoSentences(text) {
-  const sentenceEnders = /[。！？；]/
-  const clauseEnders = /[，：]/
+  // 在句号、感叹号、分号、逗号处断句——它们是口播语音的天然呼吸点。
+  const sentenceEnders = /[。！；，]/

  const sentences = []
  let current = ''
@@ -175,16 +175,13 @@ function splitTextIntoSentences(text) {
    current += char

    if (sentenceEnders.test(char)) {
-      sentences.push(current.trim().replace(/[。！？；，：、]/g, ''))
-      current = ''
-    } else if (clauseEnders.test(char) && current.length > 8) {
-      sentences.push(current.trim().replace(/[。！？；，：、]/g, ''))
+      sentences.push(current.trim().replace(/[。！；，：？、——…]/g, ''))
      current = ''
    }
  }

  if (current.trim()) {
-    sentences.push(current.trim().replace(/[。！？；，：、]/g, ''))
+    sentences.push(current.trim().replace(/[。！；，：？、——…]/g, ''))
  }

  return sentences
--- a/accounts/_template/prompts/通用分镜.md
+++ b/accounts/_template/prompts/通用分镜.md
@@ -62,29 +62,11 @@ source outside the frame begins its slow rotation

 → 有明确运动趋势：头正在转向、影子正在拉长——视频模型能推断运动方向。

-## 三、3秒钩子规则（Shot 1 强制）
-
-短视频前 3 秒决定用户是否留下。**Shot 1 必须是钩子，不是铺垫。**
-
-| 策略 | 说明 |
-|------|------|
-| **结论前置** | 从文案核心金句提取最冲击的结论，直接放在开头 |
-| **认知冲突** | 一句话打破常识，制造"凭什么"的好奇心 |
-| **身份挑衅** | 直接点中受众身份焦虑 |
-
-钩子规范：
-- 画面有视觉冲击力，不用背影/空走廊等铺垫
-- 文案 ≤ 20 字，一句话讲完
-- 时长 4-5 秒，短狠快
-- 禁止设问式开头（"大多数人..."）、禁止超 20 字、禁止纯铺垫画面
-
-钩子后 Shot 2 负责引入正文，Shot 3 起按原文顺序展开。
-
-## 四、切割规则
+## 三、切割规则

 切割分两层：第一层按语义场景做宏观切分（两种模式通用），第二层按气口做微观切分（视频成片专用）。

-### 4.1 第一层：语义场景切割（两种模式通用）
+### 3.1 第一层：语义场景切割（两种模式通用）

 以语义场景转折为切割依据，不按句号机械切割。

@@ -96,7 +78,7 @@ source outside the frame begins its slow rotation
 | 节奏重音 | 强调句、停顿感强、关键意象出现 |
 | 语义完整（仅图文） | 该段表达一个完整观点或例子 |

-### 4.2 第二层：气口切割（视频成片专用）
+### 3.2 第二层：气口切割（视频成片专用）

 **视频成片在完成语义场景切割后，必须在每个语义场景内部进行第二轮气口切割。**

@@ -137,27 +119,27 @@ source outside the frame begins its slow rotation
 - ❌ 丢弃原文的论证、例子、细节来"节省字数"
 - ❌ 跨语义场景合并——气口切割只在同一个语义场景内部进行

-### 4.3 字数上限速查
+### 3.3 字数上限速查

 | 模式 | 每段字数 | 说明 |
 |------|---------|------|
 | 图文成片 | 50 字左右 | 一帧讲透一个观点 |
 | 视频成片 | 8–22 字 | 气口自然长度，长句必须拆为连续 Shot |

-### 4.4 时长控制
+### 3.4 时长控制

 - **图文成片：** 每条 Shot 4-10 秒，跟随旁白节奏，完整表达一个观点
 - **视频成片：** 每条 Shot 3-7 秒，目标 5 秒，匹配视频片段长度
 - **总时长校验：** 所有 duration 之和 = 文案朗读总时长

-## 五、导演构图语言词库（分镜层专用）
+## 四、导演构图语言词库（分镜层专用）

 > 本层只负责：构图逻辑 + 画面内容设计 + 视角选择
 > 光影渲染由图片提示词处理，运动节奏由视频提示词处理

 每个 Shot 选定一位导演作为构图参考，写入 `directorRef` 字段向下游透传。下游图片和视频提示词根据此字段执行各自层的风格，不重新选导演。

-### 5.1 昆汀·塔伦蒂诺（Tarantino）
+### 4.1 昆汀·塔伦蒂诺（Tarantino）

 **构图核心：** 身体局部主导叙事；对话即权力博弈；平静表面下的极度张力

@@ -181,7 +163,7 @@ room has not yet realized is coming

 **适合选用场景：** 微行为解码 / 潜台词型文案 / 局部细节承载叙事

-### 5.2 北野武（Kitano）
+### 4.2 北野武（Kitano）

 **构图核心：** 静止即叙事；留白承载重量；人物与空间的关系即情绪

@@ -206,7 +188,7 @@ His body has not moved. Neither has his decision.

 **适合选用场景：** 孤独/等待/沉默型文案 / 收尾 Shot / 留白叙事

-### 5.3 大卫·芬奇（Fincher）
+### 4.3 大卫·芬奇（Fincher）

 **构图核心：** 精确的控制感；对称中的破坏；冷静凝视是最深的压迫

@@ -231,13 +213,13 @@ The balance of power broke the same moment the geometry did.

 **适合选用场景：** 规律揭示型文案 / 解剖者视角 / 关系结构拆解

-## 六、shotDesc 写法规范
+## 五、shotDesc 写法规范

-### 6.1 语言
+### 5.1 语言

 统一英文输出。shotDesc 是下游图片模型的内容底稿，英文输入更稳定。视频提示词的语言由下游模块根据目标模型自动适配。

-### 6.2 必须包含的内容维度
+### 5.2 必须包含的内容维度

 **图文成片模式：**

@@ -259,7 +241,7 @@ The balance of power broke the same moment the geometry did.
 | 隐性动势 | 画面中隐含的运动趋势（**必填**） |
 | 情绪张力 | 用视觉词而非情绪词传递张力 |

-### 6.3 隐性动势（Implied Motion）
+### 5.3 隐性动势（Implied Motion）

 **视频成片模式：每条 shotDesc 必须包含至少一个隐性动势词组。**
 **图文成片模式：不强制，可选用以增加画面叙事感。**
@@ -287,7 +269,7 @@ the symmetry of the empty table stretching to both edges
 a man holding a cup and looking down
 ```

-### 6.4 隐性动势词库
+### 5.4 隐性动势词库

 **人物动势：**

@@ -315,12 +297,12 @@ breaks / silence stretching thin / the moment before something that cannot be
 undone
 ```

-### 6.5 字数控制
+### 5.5 字数控制

 - **图文成片：** 每条 shotDesc **50–80 词**——图片即成品，需要充分描述构图、氛围和视觉隐喻
 - **视频成片：** 每条 shotDesc **30–60 词**——视频模型需要精炼聚焦的运动指令，过长会稀释动势信号

-### 6.6 禁止事项
+### 5.6 禁止事项

 - 禁止写镜头运动参数（`zoom-in` / `pan`）——留给视频提示词
 - 禁止写色调参数（`cold blue` / `warm orange`）——留给图片提示词
@@ -329,7 +311,7 @@ undone
 - **图文成片：** 禁止连续两张同景别/同构图的 shot
 - **禁止剧透**：不能提前使用文案后续才出现的具体意象、物件、动作

-### 6.7 语义-画面对齐规则（剧透、铺垫与承接）
+### 5.7 语义-画面对齐规则（剧透、铺垫与承接）

 **三定律**：
 - **禁止剧透**：不能提前使用文案后续才出现的具体意象、物件、动作
@@ -366,19 +348,22 @@ between the two figures" ✅ 承接

 **检查方法**：每条 shotDesc 写完后，只看当前 script + shotDesc——画面内容是否只来自当前这段文案？如果不是，重写。

-## 七、directorRef 选择规则
+## 六、directorRef 选择规则

-| 选 Tarantino | 选 Kitano | 选 Fincher |
-|-------------|-----------|-----------|
-| 需要身体局部特写 | 需要大面积留白和静止感 | 需要精确控制感和对称破坏 |
-| 对话/博弈场景 | 孤独/等待/收尾场景 | 规律揭示/解剖者视角场景 |
-| 日常物件暗藏张力 | 空镜、余韵 | 审讯感、不可逃脱 |
+**每个分镜方案统一使用一位导演**，所有 Shot 的 directorRef 保持一致。在生成分镜前，根据文案整体气质选定一位导演，贯穿始终。

-**模式倾向：**
-- **视频成片**优先 Tarantino（微行为动势强）、Fincher（细节暗示运动）
-- **图文成片**优先 Kitano（留白冲击力强）、Fincher（构图控制精确）
+| 导演 | 适合的文案气质 |
+|------|-------------|
+| Tarantino | 微行为解码、潜台词密集、身体局部叙事、张力积压 |
+| Kitano | 孤独、等待、沉默中的对峙、留白冲击、收尾余韵 |
+| Fincher | 规律揭示、拆解者视角、对称破坏、审讯感、不可逃脱的压迫 |

-## 八、输入规范
+**选择依据：**
+- 通读全文后，判断文案整体最贴近哪种气质，选定一位导演
+- 如文案气质混合，选占比最高的那位
+- 选定后所有 Shot 统一使用，不中途切换
+
+## 七、输入规范

 ```
 【完整口播文案】
@@ -388,7 +373,7 @@ between the two figures" ✅ 承接
 图文成片 / 视频成片
 ```

-## 九、输出格式
+## 八、输出格式

 输出前附加总览行：

--- a/accounts/军事账号/prompts/分镜.md
+++ b/accounts/军事账号/prompts/分镜.md
@@ -154,41 +154,11 @@ geometry. The balance of power broke the same moment
 the geometry did.
 ```

-## 六、3秒钩子规则（Shot 1 强制）
-
-短视频前 3 秒决定用户是否留下。**Shot 1 必须是钩子，不是铺垫。**
-
-### 钩子策略
-
-| 策略 | 说明 | 示例 |
-|------|------|------|
-| **结论前置** | 从原文结尾或核心金句中提取最具冲击力的结论，直接放在开头 | "你混得不好，不是因为你太善良。" |
-| **认知冲突** | 一句话打破用户常识，制造"凭什么这么说"的好奇心 | "这个世界不奖励好人，也不惩罚坏人。" |
-| **身份挑衅** | 直接点中目标受众的身份焦虑 | "你把80%的认知带宽，花在了管理别人对你的评价上。" |
-
-### 钩子 shotDesc 规范
-
-- **画面必须有视觉冲击力**：不用背影、走廊等铺垫画面；用裂开的盾牌、燃烧的铁器、破碎的对称等"破坏感"画面
-- **构图禁止大面积留白**：留白是铺垫用的，钩子要"满"或"炸"
-- **文案 ≤ 20 字**：一句话讲完，不留悬念尾巴
-- **时长 4-5 秒**：钩子要短、狠、快
-
-### 禁止的钩子写法
-
-- "大多数人..."、"你有没有想过..." — 设问式开头太慢
-- 纯铺垫画面（空走廊、远背影）— 3 秒内没有视觉锚点
-- 超过 20 字的钩子文案 — 用户来不及看完就划走了
-
-### 钩子之后的 Shot 2
-
-钩子说完冲击性结论后，Shot 2 负责"收回来"引入正文：
-> Shot 1（钩子）："你混得不好，不是因为太善良。" → Shot 2（引入）："为什么？让我拆给你看。" → Shot 3 起按原文顺序展开
-
-## 七、切割规则
+## 六、切割规则

 切割分两层：第一层按语义场景做宏观切分（两种模式通用），第二层按气口做微观切分（视频成片专用）。

-### 7.1 第一层：语义场景切割（两种模式通用）
+### 6.1 第一层：语义场景切割（两种模式通用）

 以语义场景转折为切割依据，不按句号机械切割。

@@ -200,7 +170,7 @@ the geometry did.
 | 节奏重音 | 强调句、停顿感强、关键意象出现 |
 | 语义完整（仅图文） | 该段表达一个完整观点或例子 |

-### 7.2 第二层：气口切割（视频成片专用）
+### 6.2 第二层：气口切割（视频成片专用）

 **视频成片在完成语义场景切割后，必须在每个语义场景内部进行第二轮气口切割。**

@@ -241,14 +211,14 @@ the geometry did.
 - ❌ 丢弃原文的论证、例子、细节来"节省字数"
 - ❌ 跨语义场景合并——气口切割只在同一个语义场景内部进行

-### 7.3 字数上限速查
+### 6.3 字数上限速查

 | 模式 | 每段字数 | 说明 |
 |------|---------|------|
 | 图文成片 | 50 字左右 | 一帧讲透一个观点 |
 | 视频成片 | 8–22 字 | 气口自然长度，长句必须拆为连续 Shot |

-### 7.4 时长控制
+### 6.4 时长控制

 - **图文成片：** 每条 Shot 4-10 秒，跟随旁白节奏，完整表达一个观点
 - **视频成片：** 每条 Shot 3-7 秒，目标 5 秒，匹配视频片段长度
@@ -402,18 +372,18 @@ an unseen object — darkness conceals what passes between the two figures"

 ## 八、directorRef 选择规则

-每个 Shot 根据旁白语义和画面特征选定一位导演：
+**每个分镜方案统一使用一位导演**，所有 Shot 的 directorRef 保持一致。在生成分镜前，根据文案整体气质选定一位导演，贯穿始终。

-| 选 Tarantino | 选 Kitano | 选 Fincher |
-|-------------|-----------|------------|
-| 需要身体局部特写 | 需要大面积留白和静止感 | 需要精确控制感和对称破坏 |
-| 对话权力博弈场景 | 孤独、边缘化、等待场景 | 规律揭示、人性解剖视角 |
-| 日常物件暗藏张力 | 空镜、余韵、收尾 | 审讯感、不可逃脱的压迫 |
-| 旁白有「潜台词解码」结构 | 旁白有「沉默」「位置」「等待」 | 旁白有「逐帧拆」「拆解者视角」 |
+| 导演 | 适合的文案气质 |
+|------|-------------|
+| Tarantino | 微行为解码、潜台词密集、身体局部叙事、张力积压 |
+| Kitano | 孤独、等待、沉默中的对峙、留白冲击、收尾余韵 |
+| Fincher | 规律揭示、拆解者视角、对称破坏、审讯感、不可逃脱的压迫 |

-**模式倾向：**
-- **视频成片**优先 Tarantino（微行为动势强）、Fincher（细节暗示运动）
-- **图文成片**优先 Kitano（留白冲击力强）、Fincher（构图控制精确）
+**选择依据：**
+- 通读全文后，判断文案整体最贴近哪种气质，选定一位导演
+- 如文案气质混合，选占比最高的那位
+- 选定后所有 Shot 统一使用，不中途切换

 ## 九、输入规范