feat(capcut-pipeline): switch TTS voiceover to CosyVoice and refactor the animation system

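- Switch the TTS engine from Qwen-TTS to Alibaba Cloud CosyVoice (DashScope WebSocket)
- Change the output format from WAV (24kHz) to MP3
- Refactor the image animation splitting logic to support combined animations (e.g. "渐显+放大", fade-in + zoom-in)
- Remove the subtitle keyword-highlight fields
- Remove the `uploadAudioToOSS` function and use `uploadToOSS` everywhere
- Update the docs and config defaults to match the new engine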
2026-05-01 14:50:50 +08:00
parent 9d19437a29
commit 3a641244a5
5 changed files with 46 additions and 82 deletions
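
The animation split named in the commit message can be read in isolation. The following is a minimal standalone sketch of the logic this commit adds to `addImages` in `capcut_assemble.js`. The names `splitAnimation` and `GROUP_ANIMATIONS` are hypothetical (the diff inlines the logic and calls the array `groupNames`), and whether the CapCut Mate API accepts `|`-joined animation names is an assumption taken from the diff, not verified here.

```javascript
// Names treated as whole-clip ("loop") animations; mirrors `groupNames` in the diff.
const GROUP_ANIMATIONS = ['缩放', '缩放 II']

// Hypothetical helper: split a "+"-separated animation string into
// loop animations and entrance animations, as the refactored code does.
function splitAnimation(animation) {
  const info = {}
  if (!animation) return info
  const parts = animation.split('+').map(p => p.trim()).filter(Boolean)
  const groupAnims = parts.filter(p => GROUP_ANIMATIONS.includes(p))
  const inAnims = parts.filter(p => !GROUP_ANIMATIONS.includes(p))
  if (groupAnims.length > 0) info.loop_animation = groupAnims.join('|')
  if (inAnims.length > 0) info.in_animation = inAnims.join('|')
  return info
}

console.log(splitAnimation('渐显+放大'))   // → { in_animation: '渐显|放大' }
console.log(splitAnimation('渐显 + 缩放')) // → { loop_animation: '缩放', in_animation: '渐显' }
console.log(splitAnimation('缩放'))        // → { loop_animation: '缩放' }
```

Note that with the new default `'渐显+放大'`, neither part is in the group list, so both names end up in `in_animation`. The previous loop would have kept only the last entrance name it saw.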
--- a/.claude/skills/capcut/SKILL.md
+++ b/.claude/skills/capcut/SKILL.md
@@ -19,7 +19,7 @@ description: 剪映/CapCut automation. Implements draft creation via the CapCut Mate API

 ```
 1. npm dependencies   → cd .claude/skills/video-from-script/scripts && npm install
-2. TTS voiceover      → Alibaba Cloud Qwen-TTS (set ttsApiKey in config.json)
+2. TTS voiceover      → Alibaba Cloud CosyVoice TTS (set ttsApiKey in config.json)
 ```

 ---
@@ -81,7 +81,7 @@ digraph capcut_assembly {

  input [label="Assets + manifest.json", shape=folder, fillcolor="#e3f2fd"]

-  step1 [label="1. TTS voiceover (optional)\nnode qwen-tts.js\n→ WAV + duration"]
+  step1 [label="1. TTS voiceover (optional)\ncosyvoice → MP3 + duration"]
  step2 [label="2. Upload images to OSS\nlocal images → public URLs"]
  step3 [label="3. Create draft\ncreate_draft\n→ draft_url"]
  step4 [label="4. Import assets + audio + subtitles + effects\nadd_images / add_videos\nadd_audios / add_captions\nadd_effects"]
@@ -146,15 +146,12 @@ digraph capcut_assembly {

 ## TTS voiceover (for finished-video mode)

-Uses Alibaba Cloud Qwen-TTS (Node.js), replacing the original Edge-TTS.
+Uses Alibaba Cloud CosyVoice TTS (via DashScope WebSocket); the pipeline calls it automatically.

-```bash
-# Prepare the input
-echo '{"segments":[{"id":1,"text":"文案"}],"voice":"Cherry","output_dir":"./audio"}' > input.json
-
-# Generate
-node .claude/skills/video-from-script/scripts/qwen-tts.js input.json
-# → stdout: {"segments":[{"id":1,"audio":"./audio/seg_001.wav","duration":3.456}]}
+```js
+// Call as a module
+const { synthesize } = require('./qwen-tts')
+const { filePath, duration } = await synthesize('你好世界', { voice: 'Cherry' })
 ```

 Configured in `skills/config.json`: `ttsApiKey` (required), `ttsModel`, `ttsVoice`, `ttsLanguage`.
@@ -195,19 +192,19 @@ node .claude/skills/video-from-script/scripts/qwen-tts.js input.json

 ## Image animation presets

-| Animation | Description | Use case |
+| Animation name | Description | Use case |
 |------|------|------|
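-| Ken Burns (zoom-in) | Slow zoom-in, 1.0→1.1 | Default |
-| Ken Burns (pan-left) | Pan right→left | Landscape |
-| Ken Burns (pan-right) | Pan left→right | Landscape |
-| 缩放弹出 | 0.8→1.0 | Emphasis |
+| 缩放 | Slow zoom-in (default) | General |
+| 渐显+放大 | Fade-in + zoom-in combined | Emphasis |
+| 左平移 | Pan right→left | Landscape |
+| 右平移 | Pan left→right | Landscape |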

 ---

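 ## Quality requirements

-- Subtitles match the script; keyword highlights stand out
-- Image animation is smooth (Ken Burns range 1.0→1.1)
+- Subtitles match the script
+- Image animation is smooth
 - BGM volume does not drown out the voiceover (voiceover leads, BGM supports)
 - Transitions are natural (no black frames, no skipped frames)
 - The bottom subtitle area is not obscured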
--- a/.claude/skills/capcut/references/assembly-guide.md
+++ b/.claude/skills/capcut/references/assembly-guide.md
@@ -12,7 +12,7 @@
 1. CapCut Mate API reachable → curl {config.capcutMateApiBase}/../docs
    - Deployed at capcut.muyetools.cn (configured in skills/config.json)
 2. npm dependencies          → cd scripts && npm install
-3. TTS voiceover             → Alibaba Cloud Qwen-TTS (ttsApiKey in config.json)
+3. TTS voiceover             → Alibaba Cloud CosyVoice TTS (ttsApiKey in config.json)
 4. Sync to local 剪映        → pure Node.js (sync-to-jianying.js), no Python/uv required
 ```

@@ -57,7 +57,7 @@ digraph assembly_flow {
  node [shape=box, style=filled, fillcolor="#f5f5f5", fontsize=11]

  input [label="Assets + manifest.json", shape=folder, fillcolor="#e3f2fd"]
-  step1 [label="1. TTS voiceover (optional)\nnode qwen-tts.js\n→ WAV + duration"]
+  step1 [label="1. TTS voiceover (optional)\ncosyvoice → MP3 + duration"]
  step2 [label="2. Upload images to OSS\nlocal images → public URLs"]
  step3 [label="3. Create draft\ncreate_draft → draft_url"]
  step4 [label="4. Import assets + audio + subtitles + effects"]
@@ -75,15 +75,12 @@ digraph assembly_flow {

 ### 1. TTS voiceover (optional)

-Uses Alibaba Cloud Qwen-TTS for speech synthesis (Node.js, no Python required).
+Uses Alibaba Cloud CosyVoice TTS for speech synthesis (via DashScope WebSocket, Node.js).

-```bash
-# Prepare the input JSON
-echo '{"segments":[{"id":1,"text":"第一段文案"},{"id":2,"text":"第二段文案"}],"voice":"Cherry","output_dir":"./audio"}' > input.json
-
-# Batch generate
-node scripts/qwen-tts.js input.json
-# → stdout: {"segments":[{"id":1,"text":"...","audio":"./audio/seg_001.wav","duration":3.456}]}
+```js
+const { synthesize } = require('./qwen-tts')
+const { filePath, duration } = await synthesize('文案', { voice: 'Neil', outputDir: './audio' })
+// → ./audio/seg_001.mp3, duration: 3.456
 ```

 Configured in `skills/config.json`:
@@ -91,8 +88,8 @@ node scripts/qwen-tts.js input.json
 | Field | Description | Default |
 |------|------|--------|
 | `ttsApiKey` | Alibaba Cloud Bailian API key | (required) |
-| `ttsModel` | Model name | `qwen-tts` |
-| `ttsVoice` | Voice name | `Cherry` |
+| `ttsModel` | Model name | `cosyvoice-v3.5-plus` |
+| `ttsVoice` | Voice name | From the account config |
 | `ttsLanguage` | Language | `Chinese` |

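 Recommended voices:
@@ -108,7 +105,7 @@ node scripts/qwen-tts.js input.json
 | `Neil` | 阿闻 | News anchor | News, finance |
 | `Bellona` | 燕铮莺 | Strong, resonant female voice | Passionate, wuxia |

-All voices support Chinese and English; output is WAV (24kHz), and URLs are valid for 24 hours.
+All voices support Chinese and English; output is MP3 (24kHz).

 **Calling as a module**: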

@@ -151,7 +148,7 @@ POST /create_draft  { width: 1080, height: 1920 }

 ```
 POST /add_images
-Each image lasts 3-5 seconds, with a Ken Burns animation (zoom 1.0→1.1)
+Each image lasts 3-5 seconds, with an animation (zoom by default)
 ```

 **Video mode** (`--mode videos`):
@@ -173,8 +170,7 @@ POST /add_audios

 ```
 POST /add_captions
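-- Text comes from manifest.json
-- Keyword highlighting (subtitleStyle.highlightColor in account.json)
+- Text comes from manifest.json (aligned precisely per segment when TTS splits sentences)
 - Font size and color are read from the account config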
 ```

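@@ -212,19 +208,18 @@ add_videos may trigger a gateway 504 when submitting 9+ videos. The script degrades automatically:

 ## Image animation presets

-| Animation type | Description | Use case |
+| Animation name | Description | Use case |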
 |---------|------|---------|
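-| Ken Burns (zoom-in) | Slow zoom-in from 1.0 to 1.1 | Default; suits most scenes |
-| Ken Burns (pan-left) | Frame pans from right to left | Landscape, panorama |
-| Ken Burns (pan-right) | Frame pans from left to right | Landscape, panorama |
-| 缩放弹出 | Pops from 0.8 to 1.0 | Emphasis, impact |
+| 缩放 | Slow zoom-in | Default; suits most scenes |
+| 渐显+放大 | Fade-in + zoom-in combined | Emphasis, impact |
+| 左平移 | Pan right→left | Landscape, panorama |
+| 右平移 | Pan left→right | Landscape, panorama |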

 ---

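 ## Quality checks

 - [ ] Subtitles match the script
-- [ ] Keyword highlight color stands out
 - [ ] Image animation is smooth (no stutter)
 - [ ] BGM volume is balanced with the voiceover
 - [ ] Transitions are natural (no black frames)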
--- a/.claude/skills/video-from-script/SKILL.md
+++ b/.claude/skills/video-from-script/SKILL.md
@@ -334,10 +334,9 @@ node kling-video-generator.js --image <url> --prompt <prompt> -o ./videos
 ```
 output/{name}_{YYYYMMDD}_{NNN}/
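 ├── manifest.json                # Main manifest (carried through the whole pipeline)
-├── prompts.txt                  # Archive of the original prompts
 ├── images/                      # scene_{NN}_{slug}.jpeg (slug derived from script/shotDesc; the last frame of a first/last-frame pair takes a _last suffix)
 ├── videos/                      # scene_{NN}_{slug}.mp4 (matches the images)
-└── urls.json                    # OSS public URL mapping
+└── audio/                       # seg_001.mp3 (per-sentence TTS audio; seg_{id}_{j}.mp3 when there are multiple sentences)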
 ```

 **Naming correspondence**: image `scene_01_觉醒.jpeg` → video `scene_01_觉醒.mp4`; last frame of a first/last-frame pair `scene_01_觉醒_last.jpeg`; MJ candidate `scene_01_觉醒_cand1.jpeg`
@@ -396,7 +395,7 @@ output/{name}_{YYYYMMDD}_{NNN}/

 All sub-skills share the following resources (located in this directory):

-- `scripts/`: shared scripts (gemini-image-generator.js, mj-image-generator.js, grok-video-generator.js, veo-video-generator.js, capcut_assemble.js, sync-to-jianying.js, oss-upload.js)
+- `scripts/`: shared scripts (gemini-image-generator.js, mj-image-generator.js, grok-video-generator.js, veo-video-generator.js, kling-video-generator.js, qwen-tts.js, capcut_assemble.js, sync-to-jianying.js, oss-upload.js)
 - `accounts/`: account configs (in the project root; see [account-system.md](references/account-system.md))
 - `references/account-system.md`: account system overview

--- a/.claude/skills/video-from-script/scripts/capcut_assemble.js
+++ b/.claude/skills/video-from-script/scripts/capcut_assemble.js
@@ -218,8 +218,7 @@ async function assemble(args) {
    format = '9:16',
    apiKey = '',
    duration = '4',
-    animation = '缩放',
-    localAudio = 'true',
+    animation = '渐显+放大',
  } = args

  if (!input) throw new Error('缺少 --input 参数')
@@ -352,12 +351,11 @@ async function assemble(args) {
    // Step 2: upload the (speed-adjusted) videos to OSS
    const missingUrl = items.filter(it => it.video && !it.videoUrl)
    if (missingUrl.length > 0) {
-      const { uploadFile } = require('./oss-upload')
      console.log(`  上传 ${missingUrl.length} 个视频到 OSS...`)
      for (const item of missingUrl) {
        const videoPath = path.resolve(inputDir, item.video)
        try {
-          const { url } = await uploadFile(videoPath)
+          const url = await uploadToOSS(videoPath)
          item.videoUrl = url
          // Write the URL back to the manifest
          if (manifestFile) {
@@ -492,17 +490,12 @@ async function addImages(draftUrl, items, imgUrls, timeline, width, height, anim
    }

    if (animation) {
-      const parts = animation.split('+')
-      for (const part of parts) {
-        const name = part.trim()
-        // Combo animations (run for the whole clip): 缩放, 三分割, etc.
-        if (name === '缩放' || name === '缩放 II') {
-          info.loop_animation = name
-        } else {
-          // Treat as an entrance animation by default
-          info.in_animation = name
-        }
-      }
+      const parts = animation.split('+').map(p => p.trim()).filter(Boolean)
+      const groupNames = ['缩放', '缩放 II']
+      const groupAnims = parts.filter(p => groupNames.includes(p))
+      const inAnims = parts.filter(p => !groupNames.includes(p))
+      if (groupAnims.length > 0) info.loop_animation = groupAnims.join('|')
+      if (inAnims.length > 0) info.in_animation = inAnims.join('|')
    }

    return info
@@ -637,19 +630,9 @@ async function addVideos(draftUrl, inputDir, items, timeline, width, height, tra
 }

 // ============================================================================
-// Audio upload (local file → OSS public URL)
+// Batch audio upload (local files → OSS public URLs)
 // ============================================================================

-async function uploadAudioToOSS(filePath) {
-  try {
-    const oss = require(path.join(__dirname, 'oss-upload'))
-    const { url } = await oss.uploadFile(filePath)
-    return url
-  } catch (err) {
-    throw new Error(`音频上传 OSS 失败: ${err.message}`)
-  }
-}
-
 async function batchUploadAudio(inputDir, items) {
  const urls = {}
  for (const item of items) {
@@ -665,7 +648,7 @@ async function batchUploadAudio(inputDir, items) {
          continue
        }
        try {
-          urls[seg.audio] = await uploadAudioToOSS(filePath)
+          urls[seg.audio] = await uploadToOSS(filePath)
          console.log(`   上传: ${path.basename(filePath)} -> OK`)
        } catch (err) {
          console.error(`   上传失败: ${path.basename(filePath)} - ${err.message}`)
@@ -686,7 +669,7 @@ async function batchUploadAudio(inputDir, items) {
      continue
    }
    try {
-      urls[item.audio] = await uploadAudioToOSS(filePath)
+      urls[item.audio] = await uploadToOSS(filePath)
      console.log(`   上传: ${path.basename(filePath)} -> OK`)
    } catch (err) {
      console.error(`   上传失败: ${path.basename(filePath)} - ${err.message}`)
@@ -868,8 +851,6 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
            start: currentTime,
            end: currentTime + duration,
            text: seg.text,
-            keyword: '',
-            keyword_color: '',
          }

          if (inAnimation) cap.in_animation = inAnimation
@@ -903,8 +884,6 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
            start: currentTime,
            end: currentTime + duration,
            text: sentence,
-            keyword: '',
-            keyword_color: '',
          }

          if (inAnimation) cap.in_animation = inAnimation
@@ -918,16 +897,10 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
      }
    } else {
      // Original mode: one caption per item
-      const keyword = ''
-      const keywordColor = style.highlightColor || style.color || '#FFFFFF'
-
      const cap = {
        start: tl.start,
        end: tl.end,
        text,
-        keyword,
-        keyword_color: keyword ? keywordColor : '',
-        keyword_font_size: 18,
      }

      if (inAnimation) cap.in_animation = inAnimation
@@ -1040,7 +1013,7 @@ async function main() {
    console.log('  --duration 4             默认每段时长/秒（无TTS时的fallback，默认 4）')
    console.log('  --voiceover true|false   是否添加TTS配音轨道（默认 true）')
    console.log('  --subtitles true|false   是否添加字幕（默认 true）')
-    console.log('  --split-captions true|false  分句字幕模式（默认 false，长句按标点切分）')
+    console.log('  --split-captions true|false  分句字幕模式（默认 true，按标点切分）')
    console.log('  --bgm <url>              背景音乐 URL')
    console.log('  --effects "名称1,名称2"  特效名称（逗号分隔）')
    console.log('  --filter "名称:强度"     滤镜（强度 0-100）')
--- a/.claude/skills/video-from-script/scripts/lib/phase-assemble.js
+++ b/.claude/skills/video-from-script/scripts/lib/phase-assemble.js
@@ -23,7 +23,7 @@ async function phaseAssemble(manifest, manifestPath, options) {
    subtitles: mode === 'images' ? 'true' : 'false',
    voiceover: manifest.items.some(it => it.audio) ? 'true' : 'false',
    duration: '4',
-    animation: capcutConfig.animation || '缩放',
+    animation: capcutConfig.animation || '渐显+放大',
  }

  if (capcutConfig.defaultBGM) assembleArgs.bgm = capcutConfig.defaultBGM
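
After this commit the caption objects built in `addSubtitles` carry only timing and text (plus an optional entrance animation), with no `keyword` fields. As a rough illustration of that shape, here is a minimal sketch. `buildCaptions` is a hypothetical helper, not a function in the repository; it assumes each segment has a `text` and a `duration`, that the time cursor advances by each segment's duration, and that times are in whatever unit the caller already uses.

```javascript
// Hypothetical helper: lay segments end to end and emit caption objects
// in the post-commit shape { start, end, text [, in_animation] }.
function buildCaptions(segments, startTime = 0, inAnimation = '') {
  const captions = []
  let currentTime = startTime
  for (const seg of segments) {
    const cap = {
      start: currentTime,
      end: currentTime + seg.duration,
      text: seg.text,
    }
    // Only attach the entrance animation when one is configured.
    if (inAnimation) cap.in_animation = inAnimation
    captions.push(cap)
    currentTime += seg.duration
  }
  return captions
}

const caps = buildCaptions(
  [{ text: 'first', duration: 2 }, { text: 'second', duration: 3 }],
  1,
  '渐显'
)
console.log(caps[1]) // → { start: 3, end: 6, text: 'second', in_animation: '渐显' }
```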