feat(skills): refine the video-production pipeline and add a fitness follow-along account

- SKILL.md: add workflow phase definitions, quality gates, and storyboard rules
- manifest-schema.md: complete the field spec and type definitions
- phase-tts.js: optimize TTS synthesis logic; add progress tracking
- capcut-tracks.js: extend track building; support more element types
- capcut-timeline.js: improve timeline generation; support fade-in/fade-out
- capcut_assemble.js: add a complete implementation of the assemble phase
- cmd-init.js: refine the init command logic
- qwen-tts.js: adjust timeout configuration
- accounts/禁忌帝王学: update the splitting/image/voiceover prompts
- accounts/健身跟练: new account with account.json and a full set of prompt templates
- add the workflow-issues-20260501.md reference doc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
lc
2026-05-06 22:53:37 +08:00
parent e6daf7a8d8
commit 6eec0e8889
28 changed files with 2199 additions and 253 deletions

View File

@@ -1,5 +1,5 @@
{
"jianyingDraftPath": "C:/Users/45070/AppData/Local/JianyingPro/User Data/Projects/com.lveditor.draft",
"jianyingDraftPath": "/Users/lc/Movies/JianyingPro/User Data/Projects/com.lveditor.draft",
"capcutMateDir": "C:/Users/45070/capcut-mate",
"capcutMateApiBase": "http://capcut.muyetools.cn/openapi/capcut-mate/v1",
"imgbbApiKey": "deprecated",

View File

@@ -78,6 +78,14 @@ node .claude/skills/video-from-script/scripts/get-template-path.js --account <
```
Example output: `accounts\军事账号\prompts\分镜.md`
**Timeline core rules (hard-coded iron laws; must be passed verbatim to the storyboard sub-agent):**
- The script is the timeline's sole anchor: total audio duration = total character count ÷ 5 (rate: 5 chars/sec)
- Kling video clips are fixed at 6 seconds
- **Each shot's TTS estimate (= script chars ÷ 5) must be ≤ 6 seconds**
- TTS > 6s → force a split at a semantic break; each resulting script = a semantic sub-clause cut from the original sentence; **never pad multiple shots by repeating the full sentence**
- Concatenating the split scripts = the original sentence, character for character
- Shots with audioDur > videoDur × 2 (12s) must not be merged; they must be split
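The rules above boil down to a few arithmetic checks. A minimal sketch (the function names are illustrative, not part of the skill's scripts; the 5 chars/sec rate and the 6s Kling clip length come from the rules above):

```javascript
// Sketch of the timeline iron laws: TTS estimate, split check, merge check.
const CHARS_PER_SEC = 5   // speaking rate from the rules above
const KLING_CLIP_SEC = 6  // fixed Kling clip length

// TTS estimate for one shot's script (rule: chars ÷ 5)
function estimateTts(script) {
  return script.length / CHARS_PER_SEC
}

// A shot must be split when its TTS estimate exceeds the clip length
function mustSplit(script) {
  return estimateTts(script) > KLING_CLIP_SEC
}

// Merge check: concatenated sub-clauses must equal the original, character for character
function mergeCheckPasses(original, subClauses) {
  return subClauses.join('') === original
}
```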
**The sub-agent prompt must include:**
1. `Template file absolute path: {output of get-template-path.js, converted to an absolute path}`, with the instruction to "Read this file in full first, then follow the template rules strictly"
2. The user's complete voiceover script
@@ -87,10 +95,28 @@ node .claude/skills/video-from-script/scripts/get-template-path.js --account <
**Forbidden**: the main agent must not pass a summary of the template to the sub-agent; the sub-agent must read the file directly.
```json
[{"id":1,"shotDesc":"英文画面描述","script":"中文口播文案","duration":5,"directorRef":"tarantino","keyword":"权力"}]
[{"id":1,"shotDesc":"英文画面描述","script":"中文口播文案","duration":"TTS估算=字数÷5","directorRef":"fincher"}]
```
**Main agent review:** duration reasonable? Implicit motion complete? Video mode: directorRef filled in? Video mode: script concatenation check passed?
**Main agent review (timeline compliance first):**
1. Is every shot's TTS estimate ≤ 6s? → anything over must be sent back
2. Have shots with TTS > 6s been split correctly? Is each script a semantic sub-clause?
3. Duration reasonable? Implicit motion complete? Video mode: directorRef filled in?
4. Video mode: does the script concatenation check pass (character for character)?
**Storyboard plan display (output this table every time):**
```
Script: [original text]
Total chars: N | Total audio estimate: X.Xs | Video clips: N | Fixed video-model duration: Kling=6s
TTS rate: 1.15x (hard-coded in qwen-tts.js) | Audio re-speeding: forbidden | Video adaptation: speed up / trim
| # | TTS est. | script (chars) | ratio est. | Strategy | Split note | directorRef |
|---|---------|------------|---------|------|---------|------------|
| 1 | 4.8s | 你只有看清了... (24 chars) | 6/17.5=0.34 | ⚠ forbidden | TTS>6s → split | fincher |
| 2 | 5.2s | 其实这是... (26 chars) | 6/27=0.22 | ⚠ forbidden | TTS>6s → split | fincher |
...
```
ratio = estimatedVideoDuration (6s) / estimatedAudioDuration (chars ÷ 5)
→ Present to the user for confirmation. Once confirmed, **the storyboard table is locked as the spine contract**; downstream stages must not add or remove shots.
@@ -197,7 +223,9 @@ node scripts/pipeline.js confirm --manifest <path> --all
node scripts/pipeline.js status --manifest <path>
```
**Phase order**: `images` → `upload` → `videos` → `tts` → `assemble`
**Phase order**: `tts` → `images` → `upload` → `videos` → `assemble`
> **Flow change**: TTS now runs before image generation, so the audio duration is known before images are made and the ratio stays controllable.
**Item states**: `pending` → `generating` → `done` / `failed`
@@ -207,6 +235,36 @@ node scripts/pipeline.js status --manifest <path>
After each phase completes, the main agent validates automatically; on failure it fixes the issue and continues.
### Storyboard plan display (output every time)
```
Script: [original text]
Total chars: N | Total audio estimate: X.Xs | Video clips: N | Video model: Kling=6s
| # | TTS est. | script content (chars) | Split note | audioDur est.→actual | ratio |
|---|---------|----------------|---------|------------------|------|
| 1 | 4.8s | 你只有看清了... (24 chars) | ✅ ≤6s | 4.8→17.5s | 0.34⚠ |
| 2 | 5.2s | 其实这是... (26 chars) | ✅ ≤6s | 5.2→27.0s | 0.22⚠ |
...
```
ratio = estimatedVideoDuration / realAudioDuration (⚠ = needs splitting)
### Storyboard quality gates (hard-coded)
| Check | Standard | On failure |
|--------|------|---------|
| Per-shot TTS estimate | duration ≤ estimatedVideoDuration (Kling=6s) | **Force back to the storyboard stage for splitting** |
| Long sentences | TTS>6s → split into semantic sub-clauses; scripts never repeat the full sentence | Send back for rewrite |
| script content | Each shot's script is an independent semantic sub-clause | Send back for rewrite |
| Merge check | Concatenating all scripts = the original text, character for character | Send back for rewrite |
| ratio pre-check | estimatedVideoDuration / estimatedAudioDuration < 0.9 → forbidden | Send back to re-cut the storyboard |
| Video-model duration | estimatedVideoDuration written into the manifest | Check pipeline init |
**Assemble-phase iron rules:**
- Audio is imported into CapCut at its native 1.15x rate (no speed field)
- Video may only be sped up (speed_up) or trimmed (trim)
- **No slow motion (slow_down), freeze frames (freeze), or audio re-speeding**
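The allowed adaptation strategies reduce to a single ratio lookup. A sketch under the thresholds above (illustrative function, not the pipeline's actual code):

```javascript
// Pick the video adaptation strategy from ratio = videoDur / audioDur.
// Only speed_up and trim are allowed; ratio < 0.9 means the shot must be
// sent back to the storyboard stage for splitting.
function pickStrategy(videoDur, audioDur) {
  const ratio = videoDur / audioDur
  if (ratio >= 0.9 && ratio <= 1.1) return 'none'   // close match, no adjustment
  if (ratio > 1.1 && ratio <= 2) return 'speed_up'  // speed video up to the audio length
  if (ratio > 2) return 'trim'                      // cut the video to the audio length
  return 'FORBIDDEN'                                // audio outruns video: split the shot
}
```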
### Image generation
| Check | Standard | On failure |

View File

@@ -0,0 +1,12 @@
[
{"id": 1, "shotDesc": "A man stands at a crossroad frozen in place, four massive mirrors orbiting him at different angles simultaneously, each reflecting a distorted opposite version of himself — one reading while thinking, one reaching for freedom while chained, one pursuing love without resources, one chasing wealth without action. Fincher cold blue directional light cuts through the scene with architectural shadow lines on the ground. Urban modern fashion.", "script": "99%的人都没意识到的四个致命现象", "duration": 3.8, "directorRef": "fincher"},
{"id": 2, "shotDesc": "A stylish urban figure is split in half by a hard vertical shadow line — left side leans in with an open book, right side rests a hand on a thinking forehead. Twin halves pull in opposite directions, fabric and posture straining against each other. Fincher cold blue practical light, high contrast chiaroscuro.", "script": "不读书却爱思考,不独立却想要自由,没物质却想谈真爱,没执行力却想要发财", "duration": 4.0, "directorRef": "fincher"},
{"id": 3, "shotDesc": "A suited man kicks an old broken clock lying on the ground, its pendulum snapped, yet the clock face still shows hands spinning uncontrollably. The scene freezes mid-kick. Fincher sharp cold practical light, strong shadow, dramatic tension.", "script": "很多事做错了能重来", "duration": 3.0, "directorRef": "fincher"},
{"id": 4, "shotDesc": "A man stands with a black sandbag crushing his shoulders into the ground, four thick chains extending from the bag — each chain wrapped tight around a different part of his body. He strains but cannot break free. Fincher cold blue light, urban background, precise shadow architecture.", "script": "但这四个坑只要踩中一个", "duration": 3.0, "directorRef": "fincher"},
{"id": 5, "shotDesc": "A man running on a treadmill that moves faster and faster beneath him, but the surrounding walls close in with each stride — the space shrinking relentlessly. His expression shifts from determination to desperation. Fincher cold blue overhead light, architectural shadow lines on walls, urban setting.", "script": "你这辈子注定越折腾越穷,越懂事越惨", "duration": 3.4, "directorRef": "fincher"},
{"id": 6, "shotDesc": "A man freezes mid-step on a bridge over dark water, looking down at the vast emptiness below, hands open in the realization. Fincher cold blue natural light, negative space composition, silent tension.", "script": "这不是吓唬你", "duration": 1.0, "directorRef": "fincher"},
{"id": 7, "shotDesc": "A person of any age walks through a dark room full of four suspended iron traps above, each glowing faintly with red warning light overhead. The person looks up, scanning every trap with alert focused eyes. Fincher calculated cold blue overhead practical light, architectural shadows, urban modern fashion.", "script": "一个人无论身处任何年龄阶段都必须提防这四大陷阱", "duration": 4.0, "directorRef": "fincher"},
{"id": 8, "shotDesc": "A person stops at the entrance of four diverging corridors, each corridor is a different trap — fire, ice, void, thorns. The person observes and calculates, then reaches for one. Fincher sharp cold blue light, precise shadow edges, urban modern fashion, strong visual anchor.", "script": "今天这条视频就带你拆解清楚", "duration": 1.2, "directorRef": "fincher"},
{"id": 9, "shotDesc": "A spider diagram fills the frame — four different trap icons on four sides of the screen all connect to a single glowing core node at the center. Lightning bolts from center to each icon, showing they are secretly linked. Fincher cold blue analytical lighting, precise architectural lines, the core node pulses.", "script": "这四个看似互不关联,底层逻辑却互通的陷阱", "duration": 4.8, "directorRef": "fincher"},
{"id": 10, "shotDesc": "A stylish urban person walks through an open door into white light, the four iron traps behind them dissolve into smoke and shatter. Fincher cold blue backlight silhouette, the door is an analytical white bright light with negative space composition.", "script": "越早警惕,才能越早破解", "duration": 4.8, "directorRef": "fincher"}
]

View File

@@ -34,6 +34,7 @@ node scripts/pipeline.js validate --manifest <path>
| `imageModel` | `gemini` / `mj` | account.json | **auto at init** |
| `videoModel` | `veo3-fast-frames` / `grok-video-3` / `kling`, etc. | account.json | **auto at init** |
| `format` | Aspect ratio: `9:16` / `16:9` | account.json | **auto at init** |
| `estimatedVideoDuration` | Fixed video-model duration (seconds); redundant top-level field | videoModel lookup | **auto at init** (assemble reads it directly) |
| `mode` | `single` single image / `framePair` first+last frame | CLI argument | **auto at init** |
| `references` | Reference-image array, moved in from account.json styles.*.references | account.json | **auto at init** |
| `items` | Asset array (the AI supplies the creative content) | CLI --items | **AI → init** |
@@ -58,18 +59,29 @@ node scripts/pipeline.js validate --manifest <path>
|------|------|
| `status` | 固定写 `"pending"` |
| `shotDesc` | 英文分镜描述含隐性动势40-80词 |
| `script` | **该段的完整原文**不提炼,保留论证、例子、细节|
| `duration` | 计划视频时长(秒),来自分镜阶段 |
| `script` | **该 shot 的语义子句原文**完整句拆分后的子段,一字不差|
| `duration` | **TTS 估算秒数(= script字数÷5**,必须 ≤ 6s |
| `estimatedAudioDuration` | 同 duration备选别名 |
| `estimatedVideoDuration` | 视频模型固定时长Kling=6s, VEO=8s, Grok=6spipeline init 时自动填入 |
| `imagePrompt` | 英文画面描述(给 Gemini/MJStep 2-A 生成 |
| `directorRef` | 导演构图参考tarantino / kitano / fincher三层透传 |
| `keyword` | 关键字氛围词2-6 字assemble 时以花字效果叠加在画面中央。可选 |
| `confirmed` | 人工确认状态,默认 `false` |
**Hard constraints:**
- **Each shot's `duration` (TTS estimate) must be ≤ 6s**, otherwise the pipeline refuses to run
- `script` must be a semantic sub-clause; **dropping the full sentence into multiple shots is a serious error**
- `estimatedVideoDuration` is derived from videoModel by `pipeline.js init` when the manifest is initialized:
- `kling``6`
- `veo3-fast` / `veo3-fast-frames``8`
- `grok-video-3``6`
- The assemble phase picks an adaptation strategy via `ratio = estimatedVideoDuration / realAudioDuration`
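The init-time derivation above can be sketched as a lookup (the table mirrors the models named in this commit; the function name is illustrative, not the pipeline's actual API):

```javascript
// Fixed clip duration (seconds) per video model, as described above.
const FIXED_DURATIONS = {
  'kling': 6,
  'veo3-fast': 8,
  'veo3-fast-frames': 8,
  'grok-video-3': 6,
}

// Resolve the estimated video duration; 6s is an assumed default for unknown models.
function estimatedVideoDuration(videoModel) {
  return FIXED_DURATIONS[videoModel] ?? 6
}
```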
### Agent write-back (Step 3-A video prompts)
| Field | Description | When written |
|------|------|---------|
| `videoPrompt` | English motion description (for Grok/VEO); describes camera motion, not content | Written back by the Agent in Step 3-A |
| `videoPrompt` | English motion description (for Grok/VEO/Kling); describes camera motion, not content | Written back by the Agent in Step 3-A |
### Pipeline write-back (after execution)
@@ -81,10 +93,10 @@ node scripts/pipeline.js validate --manifest <path>
| `url` | Public OSS URL of the image | upload |
| `confirmed` | Set to `true` after manual confirmation | confirm |
| `video` | Path of the generated video | videos |
| `videoDuration` | Video duration (seconds), Grok=6, VEO=8 | videos |
| `videoDuration` | Measured video duration (seconds), Kling=6, VEO=8, Grok=6 | videos |
| `videoUrl` | Public OSS URL of the video | videos |
| `audio` | TTS audio path (the merged full audio when there are multiple sentences) | tts |
| `audioDuration` | Audio duration (seconds) | tts |
| `audio` | TTS audio path | tts |
| `audioDuration` | Measured audio duration (seconds) | tts |
| `segments` | Per-sentence audio array (present only for multi-sentence items); see below | tts |
### Operations available during Agent review
@@ -220,20 +232,42 @@ Generated uniformly in the TTS phase; the array has 1 element for a single sentence and N for multiple
## Final-cut timeline rules
> **Core principles:**
> - The script is the timeline's sole anchor
> - The TTS rate is fixed at 1.15x (hard-coded in qwen-tts.js); audio imported into CapCut cannot be re-sped
> - **Audio duration is the master timeline**: each shot's TTS estimate must be ≤ the fixed video-model duration
> - **Video must be ≥ audio**: shots with audioDur > videoDur must be split at the storyboard stage; no slow motion or freeze frames
### Timeline estimation rules
| Field | Computation | Source |
|------|---------|------|
| TTS rate | **fixed 1.15x** | qwen-tts.js parameter `rate: 1.15`; not modifiable |
| Per-shot TTS estimate | `script.length ÷ 5` (chars/sec) | written by the AI into the duration field |
| Fixed video-model duration | Kling=6s, VEO=8s, Grok=6s | derived from videoModel by `pipeline.js init` |
| ratio (estimated) | `estimatedVideoDuration / estimatedAudioDuration` | estimate, checked at the storyboard stage |
| ratio (measured) | `videoDuration / audioDuration` | real value at the assemble stage |
### Image mode (images)
Images have no intrinsic duration: TTS audio duration = on-screen duration. Items with no TTS audio get duration 0 and are skipped (not shown).
### Video mode (videos)
TTS audio is the master axis; video adapts to the audio duration via these strategies:
**Iron rule: every video clip must be ≥ its audio clip.**
| ratio = videoDur/audioDur | Strategy | Notes |
|---------------------------|------|------|
TTS audio is the master axis; video adapts to the measured audio duration via these strategies:
| ratio = estimatedVideoDuration / estimatedAudioDuration | Strategy | Notes |
|---------------------------------------------------|------|------|
| 0.9 ~ 1.1 | none | Close match; no adjustment |
| > 1.1, ≤ 2 | speed_up | Speed up (setpts compresses time) |
| > 2 | trim | Trim (cut to the audio duration) |
| < 0.9, ≥ 0.5 | slow_down | Slow down (setpts stretches time) |
| < 0.5 | freeze | Hold the frame (original speed + freeze the last frame to fill) |
| > 1.1, ≤ 2 | **speed_up** (preferred) | Video speeds up to match the audio; the audio rate is untouched |
| > 2 | **trim** (fallback) | Video trimmed to the audio duration; the tail is lost |
| < 0.9 | **forbidden / back to storyboard** | Shots with audioDur > videoDur must be split at the storyboard stage; no slow motion or freeze frames |
**Forbidden strategies (removed):**
- `slow_down`: no slowing the video when the audio runs longer
- `freeze`: no freeze-frame padding
- Audio re-speeding: audio is imported into CapCut with no speed field; the 1.15x rate is fixed
If every strategy fails, the fallback is trimming to the target duration.

View File

@@ -0,0 +1,92 @@
# Script-to-video workflow issue log
## Issue 1: manifest initialization lacks the file field
**Symptom:**
```
[assemble] 成片失败: The "path" argument must be of type string. Received undefined
```
**Root cause:**
In the manifest.json generated by `pipeline.js init`, each item only carries `shotDesc`, `script`, `duration`, etc.; **the `file` field is missing**.
`capcut_assemble.js` relies on `item.file` to locate the local image:
```js
const filePath = path.join(inputDir, item.file)
return fs.existsSync(filePath)
```
When `file` is missing, `path.join(inputDir, undefined)` throws the error above.
**Fix:**
Manually backfill the `file` field on each item:
```js
m.items.forEach((item, i) => {
item.file = 'images/scene_' + String(i+1).padStart(2,'0') + '_' + slug + '.jpeg'
})
```
**Suggested improvement:**
`pipeline.js init` should generate the `file` field automatically from the item index and slug so it dovetails with the later assemble phase.
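A sketch of that suggested improvement (hypothetical helper; this `slugify` is an assumption standing in for whatever pipeline.js actually uses, and the naming pattern follows the fix shown above):

```javascript
// Minimal slug helper (assumption; keeps letters, digits, and CJK characters).
function slugify(text) {
  return String(text).toLowerCase()
    .replace(/[^a-z0-9\u4e00-\u9fff]+/g, '_')
    .replace(/^_+|_+$/g, '')
    .slice(0, 24)
}

// Backfill item.file during init so assemble can resolve local images.
function backfillFileFields(items) {
  items.forEach((item, i) => {
    if (!item.file) {
      const slug = slugify(item.shotDesc || item.script || `scene_${i + 1}`)
      item.file = `images/scene_${String(i + 1).padStart(2, '0')}_${slug}.jpeg`
    }
  })
  return items
}
```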
---
## Issue 2: cwd problems with parallel image-generation commands
**Symptom:**
```bash
# 6 parallel commands; 5 of them failed
cd .claude/skills/video-from-script/scripts && node gemini-image-generator.js ...
# Exit code 1: no such file or directory
```
**Root cause:**
The `.claude/...` path is resolved relative to the current working directory, and the parallel tasks did not all run from the same cwd, so path resolution failed for 5 of the commands.
**Fix:**
Switch to absolute paths:
```bash
SCRIPTS="/Users/lc/Desktop/CLAUDE/video-create/.claude/skills/video-from-script/scripts"
node "$SCRIPTS/gemini-image-generator.js" generate "..." -o "$OUT" -r 9:16
```
**Suggested improvement:**
CLI commands should always use absolute paths to avoid relative-path ambiguity in parallel environments.
---
## Issue 3: shot 6 missed during the mv rename
**Symptom:**
After all 6 images were generated, only 5 were correctly renamed to `scene_0X_xxx.jpeg`; `scene_06_跪着.jpeg` was missing.
**Root cause:**
The mv commands sorted by modification time, and the zsh glob `generated_*.jpeg` only matched files that existed at that moment. The 6 images carried different generation timestamps; the rename script took files starting from the oldest (tail -1), but the scripted order and the actual time order may not match.
**Fix:**
Rename the generated temp files to fixed names directly, without relying on time-sort logic.
**Suggested improvement:**
The image-generation phase of pipeline.js should output `scene_{NN}_{slug}.jpeg` directly instead of producing `generated_*.jpeg` and renaming afterwards.
---
## Issue 4: path-resolution bug in the pipeline.js assemble phase
**Symptom:**
With `pipeline.js run --phase tts,assemble`, tts works but assemble fails to find files. Calling `capcut_assemble.js --input <dir>` directly works fine.
**Root cause:**
When pipeline.js invokes assemble it passes a relative input path and never sets `item.file`, so `path.join(inputDir, item.file)` inside assemble receives undefined.
**Suggested improvement:**
Before `pipeline.js run --phase assemble`, check that every item has a `file` field and backfill any that are missing.
---
## Suggested improvements
1. **`pipeline.js init`** generates the `file` field automatically, consistent with the image-naming convention
2. **CLI commands** always use absolute paths, avoiding cwd ambiguity
3. **The image-generation script** outputs `scene_XX_xxx.jpeg` directly, eliminating the rename step
4. **`pipeline.js validate`** adds a pre-flight check for the assemble phase (items.file + items.audio completeness)
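Improvement 4 could be a small pre-flight check like this (a sketch; the field names follow the manifest schema in this commit, but the function name is illustrative):

```javascript
// Pre-flight check for the assemble phase: every item needs file and audio.
// Returns a list of human-readable problems; an empty array means safe to assemble.
function preflightAssemble(manifest) {
  const problems = []
  manifest.items.forEach((item, i) => {
    if (!item.file) problems.push(`items[${i}]: missing file`)
    if (!item.audio) problems.push(`items[${i}]: missing audio`)
  })
  return problems
}
```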

View File

@@ -22,7 +22,7 @@ const { buildTimeline, adjustVideoSpeed } = require('./lib/capcut-timeline')
const {
loadAccountConfig, loadSubtitleStyle,
loadKenBurns, loadTransitions,
addImages, addVideos, addKenBurns,
addImages, addVideos, addSlotsLocally,
addVoiceover, addBGM,
addSubtitles,
addEffects, addFilter,
@@ -65,23 +65,43 @@ async function batchUploadToOSS(inputDir, files, concurrency = 3) {
async function batchUploadAudio(inputDir, items) {
const urls = {}
for (const item of items) {
if (!item.audio || item.audio.startsWith('http')) {
if (item.audio) urls[item.audio] = item.audio
continue
// Handle the main audio file
if (item.audio && !item.audio.startsWith('http')) {
if (!urls[item.audio]) {
const filePath = path.isAbsolute(item.audio)
? item.audio
: path.resolve(inputDir, item.audio)
if (fs.existsSync(filePath)) {
try {
urls[item.audio] = await uploadToOSS(filePath)
console.log(` 上传: ${path.basename(filePath)} -> OK`)
} catch (err) {
console.error(` 上传失败: ${path.basename(filePath)} - ${err.message}`)
}
}
}
} else if (item.audio) {
urls[item.audio] = item.audio
}
if (urls[item.audio]) continue
const filePath = path.isAbsolute(item.audio)
? item.audio
: path.resolve(inputDir, item.audio)
if (!fs.existsSync(filePath)) {
console.error(` 音频文件不存在: ${filePath}`)
continue
}
try {
urls[item.audio] = await uploadToOSS(filePath)
console.log(` 上传: ${path.basename(filePath)} -> OK`)
} catch (err) {
console.error(` 上传失败: ${path.basename(filePath)} - ${err.message}`)
// Handle segment audio files
if (item.segments && item.segments.length > 0) {
for (const seg of item.segments) {
if (!seg.audio || seg.error) continue
if (urls[seg.audio]) continue
const filePath = path.isAbsolute(seg.audio)
? seg.audio
: path.resolve(inputDir, seg.audio)
if (!fs.existsSync(filePath)) {
console.error(` 音频文件不存在: ${filePath}`)
continue
}
try {
urls[seg.audio] = await uploadToOSS(filePath)
console.log(` 上传: ${path.basename(filePath)} -> OK`)
} catch (err) {
console.error(` 上传失败: ${path.basename(filePath)} - ${err.message}`)
}
}
}
}
return urls
@@ -288,7 +308,11 @@ async function assemble(args) {
}
}
}
await addVideos(draftUrl, inputDir, items, timeline, width, height, transitionConfig)
const segmentIds = await addVideos(draftUrl, inputDir, items, timeline, width, height, transitionConfig)
// Attach segment_ids to items for the later addSlotsLocally call
if (segmentIds && segmentIds.length > 0) {
items.forEach((item, i) => { item._segmentId = segmentIds[i] || null })
}
}
// -- Ken Burns --
@@ -383,6 +407,13 @@ async function assemble(args) {
await syncToLocalJianying(draftUrl, draftId, totalDurationUs)
console.log(' 同步完成\n')
// -- Write video-track slots (runs after syncToLocalJianying, when the local draft file exists) --
if (mode !== 'images') {
step++; console.log(`[${step}/${totalSteps}] 写入视频轨道时间线...`)
await addSlotsLocally(draftUrl, items, timeline, null, { draftId })
console.log(' 视频轨道写入完成\n')
}
// -- Cloud render (optional) --
if (apiKey) {
console.log('提交云渲染...')

View File

@@ -3,12 +3,15 @@
*
 * Core algorithm module. Pure functions + ffmpeg; self-contained and testable.
*
 * Rules:
 * Iron rules (hard-coded, cannot be bypassed):
 * Audio: never re-speed after generation (TTS=1.15x; CapCut has no speed field)
 * Video: always adapts to the audio duration (speed-up/trim only; no slow motion/freeze)
 *
 * Timeline rules:
 * Image mode: TTS audio duration = on-screen duration; no audio = skip
 * Video mode: TTS is the master axis; video adapts via strategies
 * Video longer than audio → speed up (≤2x) / trim (>2x)
 * Video shorter than audio → slow down (≥0.5x) / freeze (<0.5x)
 * All strategies fail → fallback trim
 * Video shorter than audio → forbidden! Split the shot at the storyboard stage; no slow-motion/freeze padding
*/
const fs = require('fs')
@@ -20,6 +23,20 @@ const { US } = require('./capcut-api')
// Timeline construction
// ============================================================================
/**
 * Build timeline entries
 *
 * @param {Array} items - manifest items
 * @returns {Array} timeline entries
 *
 * Strategy selection (hard-coded, by ratio = videoDur / audioDur):
 * > 1.1, ≤ 2 → speed_up (video speeds up to match the audio; preferred)
 * > 2 → trim (video trimmed to the audio duration)
 * 0.9 ~ 1.1 → none (close match; no adjustment)
 * < 0.9 → forbidden! Audio outruns the video; the shot was not split correctly at the storyboard stage
 *
 * Iron rule: no slow_down / freeze, no audio re-speeding
*/
function buildTimeline(items) {
let offset = 0
return items.map(item => {
@@ -46,7 +63,7 @@ function buildTimeline(items) {
return entry
}
// Video mode: strategy selection
// Video mode: strategy selection (iron rule: audio longer than video is not allowed)
const ratio = videoDur / audioDur
if (ratio > 1.1) {
@@ -59,23 +76,25 @@ function buildTimeline(items) {
offset += dur
return entry
}
} else if (ratio < 0.9) {
if (ratio >= 0.5) {
const entry = { start: offset, end: offset + dur, duration: dur, speed: ratio, strategy: 'slow_down' }
offset += dur
return entry
} else {
const entry = {
start: offset, end: offset + dur, duration: dur, speed: 1,
strategy: 'freeze', freezeExtra: dur - videoDur,
}
offset += dur
return entry
}
} else {
} else if (ratio >= 0.9) {
// 0.9 ~ 1.1: no adjustment needed
const entry = { start: offset, end: offset + dur, duration: dur, speed: 1, strategy: 'none' }
offset += dur
return entry
} else {
// ratio < 0.9: audio runs longer than the video!
// Forbidden by the iron rules: no slow-motion/freeze/concat padding. Such shots must be split at the storyboard stage.
// Force-trim and emit an error marker; the main agent reports it to the user / sends the storyboard back for re-cutting.
const entry = {
start: offset, end: offset + dur, duration: dur, speed: 1,
strategy: 'FORBIDDEN_audio_gt_video',
ratio: parseFloat(ratio.toFixed(3)),
videoDur: parseFloat((videoDur / US).toFixed(2)),
audioDur: parseFloat((audioDur / US).toFixed(2)),
error: '音频时长(' + (audioDur / US).toFixed(2) + 's) > 视频时长(' + (videoDur / US).toFixed(2) + 's),分镜阶段 shot 未正确拆分,请打回重新切割',
}
offset += dur
return entry
}
})
}
@@ -87,16 +106,18 @@ function buildTimeline(items) {
/**
* ffmpeg 视频调整:根据策略适配音频时长
*
 * Strategies (picked by ratio = videoDur / audioDur):
 * speed_up (ratio > 1.1, ≤2x) → setpts compresses time (speed up)
 * trim (ratio > 2) → cut to the target duration
 * slow_down (ratio < 0.9, ≥0.5x) → setpts stretches time (slow motion)
 * freeze (ratio < 0.5) → original speed + freeze the last frame to fill
 * Allowed strategies (picked by ratio = videoDur / audioDur):
 * speed_up (ratio > 1.1, ≤2x) → setpts compresses time (speed up); preferred
 * trim (ratio > 2) → cut to the target duration; fallback
 * none (0.9~1.1) → no adjustment
 *
 * Forbidden strategies (removed):
 * slow_down (ratio < 0.9) → ❌ audio must not be re-sped!
 * freeze (ratio < 0.5) → ❌ no freeze-frame padding!
 *
 * If every strategy fails, fallback: trim to the target duration
*/
async function adjustVideoSpeed(videoPath, targetDurationSec, strategy = 'none', speed = 1, freezeExtraUs = 0) {
async function adjustVideoSpeed(videoPath, targetDurationSec, strategy = 'none', speed = 1) {
if (!fs.existsSync(videoPath)) return videoPath
if (strategy === 'none') return videoPath
@@ -150,72 +171,9 @@ async function adjustVideoSpeed(videoPath, targetDurationSec, strategy = 'none',
console.log(` 加速: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speedVal}x)`)
resolve(outPath)
})
} else if (strategy === 'slow_down') {
const factor = (1 / speed).toFixed(3)
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `setpts=PTS*${factor}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
console.log(` 放缓失败,兜底截断: ${err.message}`)
fallbackTrim(resolve)
return
}
console.log(` 放缓: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speed.toFixed(2)}x speed)`)
resolve(outPath)
})
} else if (strategy === 'freeze') {
const freezeSec = freezeExtraUs / US
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `tpad=stop=-1:stop_duration=${freezeSec.toFixed(3)}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
console.log(` tpad freeze 失败,尝试 concat 方案: ${err.message}`)
const lastFrame = videoPath.replace(/(\.\w+)$/, '_lastframe.png')
const frozenVideo = videoPath.replace(/(\.\w+)$/, '_frozen.mp4')
execFile('ffmpeg', [
'-y', '-sseof', '-0.1', '-i', videoPath,
'-frames:v', '1', lastFrame
], { timeout: 10000 }, (err2) => {
if (err2) { console.log(` concat 方案也失败,兜底截断`); fallbackTrim(resolve); return }
execFile('ffmpeg', [
'-y', '-loop', '1', '-i', lastFrame,
'-t', String(freezeSec.toFixed(3)),
'-pix_fmt', 'yuv420p',
'-vf', 'scale=trunc(iw/2)*2:trunc(ih/2)*2',
frozenVideo
], { timeout: 15000 }, (err3) => {
if (err3) {
try { fs.unlinkSync(lastFrame) } catch (_) {}
console.log(` 冻结帧视频生成失败,兜底截断`)
fallbackTrim(resolve)
return
}
const concatList = path.join(path.dirname(videoPath), '_freeze_concat.txt')
fs.writeFileSync(concatList, `file '${videoPath}'\nfile '${frozenVideo}'\n`)
execFile('ffmpeg', [
'-y', '-f', 'concat', '-safe', '0', '-i', concatList,
'-c', 'copy', outPath
], { timeout: 30000 }, (err4) => {
try { fs.unlinkSync(lastFrame); fs.unlinkSync(frozenVideo); fs.unlinkSync(concatList) } catch (_) {}
if (err4) { console.log(` 拼接失败,兜底截断`); fallbackTrim(resolve); return }
console.log(` 画面停顿: ${videoDur.toFixed(1)}s + 冻结 ${freezeSec.toFixed(1)}s = ${targetDurationSec.toFixed(1)}s`)
resolve(outPath)
})
})
})
return
}
console.log(` 画面停顿: ${videoDur.toFixed(1)}s + 冻结 ${freezeSec.toFixed(1)}s = ${targetDurationSec.toFixed(1)}s`)
resolve(outPath)
})
} else {
resolve(videoPath)
// Unknown strategy: fallback trim
fallbackTrim(resolve)
}
})
})

View File

@@ -3,6 +3,10 @@
*
 * All add* functions + transition strategies + account-config loading.
 * When the Agent tweaks subtitle style, Ken Burns, transitions, or effects, this is the only file to touch.
 *
 * Audio policy (hard-coded iron rules):
 * - Audio is generated by TTS at 1.15x and imported into CapCut with no speed field (no re-speeding)
 * - Each item's segments[] are added one by one; each segment's start is aligned precisely via startOffset
*/
const path = require('path')
@@ -303,33 +307,233 @@ async function addVideos(draftUrl, inputDir, items, timeline, width, height, tra
return allSegmentIds
}
// ============================================================================
// Write segments onto the video-track timeline (slots)
// Background: add_videos only adds clips to the material library; it does not place them on the timeline.
// This function runs after add_videos succeeds and writes each segment_id into the first video track.
// ============================================================================
async function addSlots(draftUrl, items, timeline) {
const { api: capcutApi, US } = require('./capcut-api')
const { getManifestDir } = require('./pipeline-utils')
const path = require('path')
// Fetch the current cloud draft's draft_content to get the first video track's id
let draftData
try {
draftData = (await capcutApi('get_draft', { draft_url: draftUrl })).data || {}
} catch (err) {
// get_draft API unavailable; fall back to writing the draft locally
console.log(' get_draft 不可用,切换本地写入模式')
return addSlotsLocally(draftUrl, items, timeline)
}
const tracks = draftData.tracks || []
const videoTrack = tracks.find(t => t.type === 'video')
if (!videoTrack) {
console.log(' 未找到 video track跳过 slot 写入')
return
}
// Build slot payloads
const slots = []
for (let i = 0; i < items.length; i++) {
const item = items[i]
const tl = timeline[i]
const segId = item.segmentId || item._segmentId
if (!segId) continue
const slotId = generateUUID()
slots.push({
id: slotId,
material_id: segId,
track_id: videoTrack.id,
render_index: i,
type: 'video',
common_property: {
start_time: tl.start,
source_timerange: { start: 0, duration: tl.duration },
target_timerange: { start: tl.start, duration: tl.duration },
is_avatar: false,
audio_fade: { fade_in_duration: 0, fade_out_duration: 0 },
volume: 1.0,
},
})
}
if (slots.length === 0) {
console.log(' 无有效 slot 数据,跳过')
return
}
// Write via the add_slots API
try {
await capcutApi('add_slots', {
draft_url: draftUrl,
slots: JSON.stringify(slots),
})
console.log(` 已写入 ${slots.length} 个 slot 到视频轨道`)
} catch (err) {
// If the API doesn't support it, fall back to local writes
console.log(` add_slots API 不可用: ${err.message},降级为本地写入`)
await addSlotsLocally(draftUrl, items, timeline, videoTrack.id)
}
}
// Write slots directly into the local draft_content.json
// options.draftId: optional; if given it takes priority, otherwise the id is extracted from draftUrl
async function addSlotsLocally(draftUrl, items, timeline, trackId, options = {}) {
const { api: capcutApi, US } = require('./capcut-api')
const fs = require('fs')
// Prefer options.draftId; otherwise extract it from draftUrl
let draftId = options.draftId || null
if (!draftId) {
try {
draftId = new URL(draftUrl).searchParams.get('draft_id')
} catch {
console.log(' 无法解析 draftUrl跳过本地 slot 写入')
return
}
}
if (!draftId) {
console.log(' 无法提取 draft_id跳过本地 slot 写入')
return
}
const { getConfig } = require('./capcut-api')
const jianyingPath = getConfig().jianyingDraftPath
const draftPath = path.join(jianyingPath, draftId, 'draft_content.json')
if (!fs.existsSync(draftPath)) {
console.log(` 本地草稿不存在: ${draftPath},跳过 slot 写入`)
return
}
let draft
try {
draft = JSON.parse(fs.readFileSync(draftPath, 'utf-8'))
} catch {
console.log(' draft_content.json 读取失败,跳过')
return
}
// Find the first video track
const videoTrack = trackId
? draft.tracks.find(t => t.id === trackId)
: draft.tracks.find(t => t.type === 'video')
if (!videoTrack) {
console.log(' 未找到 video track跳过')
return
}
const slots = []
for (let i = 0; i < items.length; i++) {
const item = items[i]
const tl = timeline[i]
const segId = item.segmentId || item._segmentId
if (!segId) {
// Try to match against materials.videos
const fname = item.video ? path.basename(item.video) : ''
const matVideo = (draft.materials.videos || []).find(v => {
const matFname = path.basename(v.path || '')
return fname && matFname.includes(fname.replace('videos/', ''))
})
if (matVideo) {
items[i]._segmentId = matVideo.id
slots.push(buildSlot(matVideo.id, videoTrack.id, i, tl, US))
}
} else {
slots.push(buildSlot(segId, videoTrack.id, i, tl, US))
}
}
if (slots.length > 0) {
videoTrack.slots = slots
draft.duration = timeline.length > 0 ? timeline[timeline.length - 1].end : 0
fs.writeFileSync(draftPath, JSON.stringify(draft, null, 2), 'utf-8')
console.log(` 已本地写入 ${slots.length} 个 slot 到视频轨道`)
// Trigger a JianyingPro directory rescan
triggerDirScan(path.dirname(draftPath))
}
}
function buildSlot(segId, trackId, index, tl, US) {
return {
id: generateUUID(),
material_id: segId,
track_id: trackId,
render_index: index,
type: 'video',
common_property: {
start_time: tl.start,
source_timerange: { start: 0, duration: tl.duration },
target_timerange: { start: tl.start, duration: tl.duration },
is_avatar: false,
audio_fade: { fade_in_duration: 0, fade_out_duration: 0 },
volume: 1.0,
},
}
}
function generateUUID() {
return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, c => {
const r = Math.random() * 16 | 0
return (c === 'x' ? r : (r & 0x3 | 0x8)).toString(16).toUpperCase()
})
}
function triggerDirScan(dir) {
const { execFile } = require('child_process')
const tmp = dir + '.slot_tmp'
if (process.platform === 'darwin') {
execFile('rsync', ['-a', dir + '/', tmp], (err) => {
try { require('fs').rmSync(tmp, { recursive: true, force: true }) } catch {}
})
}
}
// ============================================================================
// Add TTS voiceover
// ============================================================================
async function addVoiceover(draftUrl, inputDir, items, timeline, audioUrls = {}) {
const audioItems = items.filter(item => item.audio)
if (audioItems.length === 0) {
console.log(' 无 TTS 音频文件,跳过')
return
}
const audioInfos = []
const resolveAudio = (relPath) => {
if (relPath.startsWith('http')) return relPath
if (audioUrls[relPath]) return audioUrls[relPath]
return path.isAbsolute(relPath) ? relPath : path.resolve(inputDir, relPath)
}
// Prefer per-segment addition via segments[] (precise alignment)
// Fall back to the legacy whole-clip mode when segments are absent
const segmentsFlat = []
for (let i = 0; i < items.length; i++) {
const item = items[i]
const tl = timeline[i]
if (!item.audio) continue
if (item.audio) {
const audioUrl = resolveAudio(item.audio)
if (item.segments && item.segments.length > 0) {
// Add precisely using segments
for (const seg of item.segments) {
if (!seg.audio || seg.error) continue
const audioUrl = seg.audio.startsWith('http')
? seg.audio
: (audioUrls[seg.audio] || path.resolve(inputDir, seg.audio))
const segDurUs = Math.round(seg.duration * US)
const segStartUs = tl.start + Math.round(seg.startOffset * US)
segmentsFlat.push({
audio_url: audioUrl,
start: segStartUs,
end: segStartUs + segDurUs,
duration: segDurUs,
volume: 1.0,
})
}
} else {
// Fallback: add the whole clip
const audioUrl = item.audio.startsWith('http')
? item.audio
: (audioUrls[item.audio] || path.resolve(inputDir, item.audio))
const audioDurUs = item.audioDuration ? item.audioDuration * US : tl.duration
audioInfos.push({
segmentsFlat.push({
audio_url: audioUrl,
start: tl.start,
end: tl.start + audioDurUs,
@@ -339,17 +543,26 @@ async function addVoiceover(draftUrl, inputDir, items, timeline, audioUrls = {})
}
}
if (audioInfos.length === 0) {
console.log(' 无可用音频,跳过配音')
if (segmentsFlat.length === 0) {
console.log(' 无 TTS 音频文件,跳过')
return
}
await api('add_audios', {
draft_url: draftUrl,
audio_infos: JSON.stringify(audioInfos),
})
const ossCount = audioInfos.filter(a => a.audio_url.startsWith('http')).length
console.log(` 已添加 ${audioInfos.length} 段 TTS 配音 (${ossCount > 0 ? `${ossCount} 段 OSS + ` : ''}${audioInfos.length - ossCount} 段本地)`)
// Add audio clips one at a time (batch add via the CapCut API is unreliable)
let addedCount = 0
for (const audioInfo of segmentsFlat) {
try {
await api('add_audios', {
draft_url: draftUrl,
audio_infos: JSON.stringify([audioInfo]),
})
addedCount++
} catch (err) {
console.error(` 音频添加失败: ${err.message.slice(0, 80)}`)
}
}
const ossCount = segmentsFlat.filter(a => a.audio_url.startsWith('http')).length
console.log(` 已添加 ${addedCount}/${segmentsFlat.length} 段 TTS 配音 (${ossCount} 段 OSS)`)
}
// ============================================================================
@@ -402,7 +615,24 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
const tl = timeline[i]
if (split) {
if (split && item.segments && item.segments.length > 0) {
// Precise subtitle mode: use measured segment durations and add captions per segment
for (const seg of item.segments) {
if (seg.error || !seg.text) continue
const segStartUs = tl.start + Math.round(seg.startOffset * US)
const segDurUs = Math.round(seg.duration * US)
const cap = {
start: segStartUs,
end: segStartUs + segDurUs,
text: seg.text,
}
applyAnimationProps(cap, animStyle)
captions.push(cap)
}
} else if (split) {
// Fallback: distribute by character ratio (when segments are absent)
const sentences = splitTextIntoSentences(text)
if (sentences.length === 0) continue
@@ -447,9 +677,7 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
return
}
await api('add_captions', {
draft_url: draftUrl,
captions: JSON.stringify(captions),
const commonStyle = {
font: style.font || null,
font_size: style.fontSize || 15,
text_color: style.color || '#ffffff',
@@ -472,9 +700,23 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
transform_x: 0,
transform_y: style.transformY || 0,
style_text: 0,
})
}
console.log(` 已添加 ${captions.length} 条字幕${split ? ' (分句模式)' : ''} (字体: ${style.font || '默认'}, 动画: ${animStyle.inAnimation || '无'}${animStyle.outAnimation || '无'})`)
// Add captions one at a time (batch add via the CapCut API is unreliable)
let addedCount = 0
for (const cap of captions) {
try {
await api('add_captions', {
draft_url: draftUrl,
captions: JSON.stringify([cap]),
...commonStyle,
})
addedCount++
} catch (err) {
console.error(` 字幕添加失败: ${err.message.slice(0, 80)}`)
}
}
console.log(` 已添加 ${addedCount}/${captions.length} 条字幕${split ? ' (分句模式)' : ''} (字体: ${style.font || '默认'}, 动画: ${animStyle.inAnimation || '无'}${animStyle.outAnimation || '无'})`)
}
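The precise-subtitle branch above maps each measured TTS segment onto the draft's absolute timeline in microseconds. A minimal standalone sketch of that mapping — assuming, as elsewhere in this file, a `US = 1e6` microseconds-per-second constant and segment objects shaped `{text, startOffset, duration}` (this helper is illustrative, not part of the module):

```javascript
const US = 1e6 // microseconds per second, matching the pipeline's convention

// Map measured TTS segments onto a shot's absolute timeline start,
// skipping failed/empty segments — mirrors the precise-subtitle branch.
function toCaptions(shotStartUs, segments) {
  return segments
    .filter(s => !s.error && s.text)
    .map(s => {
      const start = shotStartUs + Math.round(s.startOffset * US)
      return { start, end: start + Math.round(s.duration * US), text: s.text }
    })
}

const caps = toCaptions(2_000_000, [
  { text: '第一句', startOffset: 0, duration: 1.2 },
  { text: '第二句', startOffset: 1.2, duration: 0.8 },
  { text: '', startOffset: 2.0, duration: 0.5 }, // empty text → skipped
])
console.log(caps)
// → [{start: 2000000, end: 3200000, …}, {start: 3200000, end: 4000000, …}]
```

Captions abut exactly because each segment's `startOffset` is the running sum of measured durations, so no gap/overlap correction is needed here.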
// ============================================================================
@@ -583,6 +825,8 @@ module.exports = {
addBGM,
addSubtitles,
addKeywordOverlays,
addSlots,
addSlotsLocally,
addEffects,
addFilter,
}

View File

@@ -72,6 +72,28 @@ function initManifest(options) {
console.log(`${refsWithoutUrl.length} 个参考图缺少 OSS URLimages 阶段会自动上传`)
}
// 从 videoModel 推算固定时长(秒)
const videoModelFixedDurations = {
'kling': 6,
'kling-v2-5-turbo': 6,
'veo3-fast': 8,
'veo3-fast-frames': 8,
'grok-video-3': 6,
}
const estimatedVideoDuration = videoModelFixedDurations[options.videoModel || accountConfig.videoModel] || 6
// 校验时长约束
for (let i = 0; i < rawItems.length; i++) {
const item = rawItems[i]
const dur = Number(item.duration) || 5
if (dur > estimatedVideoDuration) {
console.error(`错误: items[${i}] 的 TTS 估算 duration=${dur}s > videoModel 固定时长 ${estimatedVideoDuration}s`)
console.error(` 必须先拆分 shot 再执行 init`)
console.error(` script: "${item.script}"`)
process.exit(1)
}
}
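在进入 items 构建之前,上面的校验逻辑可以抽成一个纯函数来理解(以下为示意代码,模型名与回退值 6s 与 videoModelFixedDurations 保持一致,`oversizedShots` 为假想的辅助函数,非本文件导出):

```javascript
// Hypothetical standalone form of the init-time duration guard above.
// Unknown models fall back to 6s, as in initManifest.
const FIXED = {
  kling: 6,
  'kling-v2-5-turbo': 6,
  'veo3-fast': 8,
  'veo3-fast-frames': 8,
  'grok-video-3': 6,
}
const maxShotSeconds = model => FIXED[model] || 6

// Indexes of shots whose estimated TTS duration exceeds the fixed clip
// length — init must refuse to proceed when this list is non-empty.
function oversizedShots(items, model) {
  const limit = maxShotSeconds(model)
  return items.flatMap((it, i) => (Number(it.duration) || 5) > limit ? [i] : [])
}

console.log(oversizedShots([{ duration: 5 }, { duration: 7 }, { duration: 6 }], 'kling')) // → [1]
console.log(oversizedShots([{ duration: 7 }], 'veo3-fast')) // → []
```

同一份文案,换用 veo3-fast8s就可能无需拆分这正是把时长上限绑定到 videoModel 的原因。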
// 构建 items
const items = rawItems.map((raw, i) => {
const slug = slugify(raw.shotDesc || raw.script || `scene_${i + 1}`)
@@ -81,7 +103,8 @@ function initManifest(options) {
file: `images/scene_${String(i + 1).padStart(2, '0')}_${slug}.jpeg`,
shotDesc: raw.shotDesc || '',
script: raw.script || '',
duration: raw.duration || 5,
duration: Number(raw.duration) || 5,
estimatedVideoDuration,
imagePrompt: raw.imagePrompt,
confirmed: false,
}
@@ -102,8 +125,10 @@ function initManifest(options) {
references,
...(accountConfig.ttsVoice ? { ttsVoice: accountConfig.ttsVoice } : {}),
...(accountConfig.ttsInstruction ? { ttsInstruction: accountConfig.ttsInstruction } : {}),
...(accountConfig.ttsRate ? { ttsRate: accountConfig.ttsRate } : {}),
// 铁律:ttsRate 写死 1.15x,不允许配置覆盖(除非显式传入)
ttsRate: options.ttsRate || 1.15,
items,
estimatedVideoDuration, // 顶层冗余,便于 assemble 直接读取
}
// 创建输出目录(自增序号)

View File

@@ -1,13 +1,100 @@
/**
* Phase: tts — 语音合成(整段合成)
* Phase: tts — 语音合成(先分段,后合成)
*
* 每个 item 的 script 整段合成一个音频文件,保留自然语调
* item.audio 指向完整音频item.audioDuration 为总时长。
* 字幕切分由组装阶段按字符比例分配,不在 TTS 阶段处理。
* 核心变化:音频分段优先于生图
*
* 1. 在生成图片之前,先将文案按语义断点切分为多个音频片段
 * 2. 每个片段时长 < videoModel 固定时长(Kling=6s)
* 3. 逐段合成,记录实测时长,写入 manifest.segments[]
* 4. manifest.items[n].segments = [{text, audio, duration, startOffset}, ...]
 * 5. manifest.items[n].audioDuration = 片段总和(供 assemble 计算 ratio)
*
 * 流程顺序变为:tts → images → upload → videos → assemble
*/
const path = require('path')
const { saveManifest, ensureDir, log, getManifestDir } = require('./pipeline-utils')
const { saveManifest, ensureDir, log, getManifestDir, splitTextIntoSentences } = require('./pipeline-utils')
/**
* 在语义断点处将文案切分为音频片段
 * 每段时长(估算)必须 < videoDur,且尽量使 ratio 接近 1.0
*
* @param {string} text - 完整文案
* @param {number} videoDur - 视频模型固定时长(秒),如 6
* @param {number} charsPerSec - 语速(字/秒),固定 5
* @returns {Array<{text, estimatedDuration}>}
*/
function splitIntoAudioSegments(text, videoDur, charsPerSec = 5) {
// 优先在自然断点切分(句号/感叹号/分号)
const naturalBreaks = splitTextIntoSentences(text)
if (naturalBreaks.length <= 1) {
// 无自然断点:整段可容纳则不切,否则优先在中间逗号处切,最后按字数对半切
const chars = text.length
const estimatedTotal = chars / charsPerSec
if (estimatedTotal <= videoDur) {
// 整段可容纳
return [{ text, estimatedDuration: estimatedTotal }]
}
// 无法单段容纳,在中间逗号处切
const mid = Math.floor(chars / 2)
const breakIdx = text.indexOf(',', mid)
if (breakIdx > 0) {
return [
{ text: text.slice(0, breakIdx + 1), estimatedDuration: (breakIdx + 1) / charsPerSec },
{ text: text.slice(breakIdx + 1), estimatedDuration: (chars - breakIdx - 1) / charsPerSec },
]
}
// 强制按字数切
const halfChars = Math.floor(chars / 2)
return [
{ text: text.slice(0, halfChars), estimatedDuration: halfChars / charsPerSec },
{ text: text.slice(halfChars), estimatedDuration: (chars - halfChars) / charsPerSec },
]
}
// 多个自然句:逐句判断,合并短句
const result = []
let currentText = ''
let currentEstDur = 0
for (let i = 0; i < naturalBreaks.length; i++) {
const sentence = naturalBreaks[i]
const sentenceLen = sentence.length
const sentenceEstDur = sentenceLen / charsPerSec
if (currentEstDur + sentenceEstDur <= videoDur) {
// 可以合并到当前段
currentText += sentence + '。'
currentEstDur += sentenceEstDur
} else {
// 先保存当前段
if (currentText) {
result.push({ text: currentText.trim(), estimatedDuration: currentEstDur })
}
currentText = sentence + '。'
currentEstDur = sentenceEstDur
// 单句本身超长(超过 videoDur):对半切分
// 注意:仅对半切一次,若单句超过 2×videoDur估算后半段仍可能超限
if (sentenceEstDur > videoDur) {
const halfLen = Math.floor(sentenceLen / 2)
const half1 = sentence.slice(0, halfLen)
const half2 = sentence.slice(halfLen)
// 前半段直接入列,后半段作为新的当前段继续累积
// (不可 result.pop():会丢掉刚保存的上一段文本)
result.push({ text: half1, estimatedDuration: halfLen / charsPerSec })
currentText = half2 + '。'
currentEstDur = (sentenceLen - halfLen) / charsPerSec
}
}
}
if (currentText) {
result.push({ text: currentText.trim(), estimatedDuration: currentEstDur })
}
return result
}
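驱动上面所有切分决策的是同一个估算公式:秒数 = 字数 ÷ 5。下面用一个极简示意函数演示 6s Kling 片段的边界情形charsPerSec = 5 为本 skill 写死的假设,`estimateSeconds` 为示意命名,非本文件导出):

```javascript
// seconds = chars / charsPerSec — the estimate used before any real TTS
// synthesis happens; String.length counts UTF-16 code units, so one CJK
// character counts as 1.
const estimateSeconds = (text, charsPerSec = 5) => text.length / charsPerSec

console.log(estimateSeconds('一二三四五六七八九十')) // 10 chars → 2s, fits easily
console.log(estimateSeconds('x'.repeat(30)))          // 30 chars → exactly 6s, boundary
console.log(estimateSeconds('x'.repeat(31)) > 6)      // 31 chars → 6.2s → must split
```

也就是说,一句超过 30 字的文案(按 6s 片段)必然触发 splitIntoAudioSegments 的拆分分支。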
async function phaseTts(manifest, manifestPath, options = {}) {
const dir = getManifestDir(manifestPath)
@@ -16,38 +103,89 @@ async function phaseTts(manifest, manifestPath, options = {}) {
const { synthesize } = require('../qwen-tts')
const items = manifest.items.filter(it =>
it.status === 'done' && (it.script || it.text) && !it.audio
)
if (items.length === 0) { log('tts', '无待处理 item,跳过'); return }
const videoDur = manifest.estimatedVideoDuration || 6
const ttsRate = manifest.ttsRate || 1.15
log('tts', `${items.length}`)
const items = manifest.items.filter(it =>
(it.script || it.text) && !it.audio
)
if (items.length === 0) { log('tts', '无待处理 item(已合成),跳过'); return }
log('tts', `${items.length} 段, 视频固定时长=${videoDur}s, TTS语速=${ttsRate}x`)
for (let i = 0; i < items.length; i++) {
const item = items[i]
const idx = i + 1
const fullText = item.script || item.text
const fullText = (item.script || item.text).trim()
try {
const { filePath, duration } = await synthesize(fullText, {
outputDir: audioDir,
id: String(item.id || idx),
voice: manifest.ttsVoice || undefined,
instruction: manifest.ttsInstruction || undefined,
rate: manifest.ttsRate || undefined,
})
const totalDuration = Math.round(duration * 1000) / 1000
item.audio = path.relative(dir, filePath).replace(/\\/g, '/')
item.audioDuration = totalDuration
log('tts', `[${idx}/${items.length}] ${totalDuration.toFixed(1)}s: ${fullText.substring(0, 30)}...`)
} catch (err) {
item.status = 'failed'
item.error = `TTS失败: ${err.message}`
log('tts', `[${idx}/${items.length}] 失败: ${err.message}`)
// Step 1: 计算音频分段
const rawSegments = splitIntoAudioSegments(fullText, videoDur)
log('tts', `[${idx}/${items.length}] 原始分段: ${rawSegments.length}`)
for (const seg of rawSegments) {
log('tts', ` 分段估算: ${seg.estimatedDuration.toFixed(2)}s / ${seg.text.slice(0, 20)}...`)
}
// Step 2: 逐段合成
const segments = []
let globalOffset = 0
for (let j = 0; j < rawSegments.length; j++) {
const segInput = rawSegments[j]
const segId = `${item.id}_${j + 1}`
try {
const { filePath, duration: realDuration } = await synthesize(segInput.text, {
outputDir: audioDir,
id: segId,
voice: manifest.ttsVoice || undefined,
instruction: manifest.ttsInstruction || undefined,
rate: ttsRate,
})
const segment = {
id: segId,
text: segInput.text,
audio: path.relative(dir, filePath).replace(/\\/g, '/'),
estimatedDuration: Math.round(segInput.estimatedDuration * 1000) / 1000,
duration: Math.round(realDuration * 1000) / 1000,
startOffset: Math.round(globalOffset * 1000) / 1000,
}
segments.push(segment)
globalOffset += realDuration
log('tts', `[${idx}/${items.length}] 段${j + 1}: 估算${segInput.estimatedDuration.toFixed(2)}s → 实测${realDuration.toFixed(2)}s | ${segInput.text.slice(0, 15)}...`)
} catch (err) {
log('tts', `[${idx}/${items.length}] 段${j + 1} 合成失败: ${err.message}`)
segments.push({
id: segId,
text: segInput.text,
audio: '',
estimatedDuration: segInput.estimatedDuration,
duration: 0,
startOffset: globalOffset,
error: err.message,
})
globalOffset += segInput.estimatedDuration
}
}
// Step 3: 汇总到 item
const totalAudioDuration = Math.round(globalOffset * 1000) / 1000
item.segments = segments
item.audio = segments[0]?.audio || ''
item.audioDuration = totalAudioDuration
item.segmentCount = segments.length
// Step 4: 时长合规诊断
const ratio = videoDur / totalAudioDuration
if (ratio < 0.9) {
item._timelineWarning = `⚠ audioDur(${totalAudioDuration.toFixed(1)}s) > videoDur(${videoDur}s), ratio=${ratio.toFixed(2)}, assemble 将截断`
}
log('tts', `[${idx}/${items.length}] 完成: ${segments.length}段, 总音频${totalAudioDuration.toFixed(1)}s, ratio=${ratio.toFixed(2)}`)
saveManifest(manifestPath, manifest)
}
}
module.exports = { phaseTts }
module.exports = { phaseTts, splitIntoAudioSegments }
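Step 4 的时长诊断可以单独看:`ratio = videoDur / audioDur`,低于 0.9 意味着配音比固定时长片段长出约 11% 以上assemble 阶段将截断音频。以下为示意函数(`timelineWarning` 为假想命名0.9 阈值与上面的检查一致):

```javascript
// ratio = videoDur / audioDur; below 0.9 the voice-over overruns the
// fixed-length clip and will be cut at assemble time.
function timelineWarning(videoDur, audioDur) {
  const ratio = videoDur / audioDur
  return ratio < 0.9
    ? `audioDur(${audioDur.toFixed(1)}s) > videoDur(${videoDur}s), ratio=${ratio.toFixed(2)}`
    : null
}

console.log(timelineWarning(6, 7.5)) // ratio 0.80 → warning string
console.log(timelineWarning(6, 6.2)) // ratio ≈ 0.97 → null, within tolerance
```

这也解释了 SKILL.md 中 "audioDur > videoDur × 2 的 shot 禁止合并" 的铁律ratio 低到 0.5 时截断会砍掉一半配音,只能在 TTS 之前拆分。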

View File

@@ -34,13 +34,13 @@ const { createAccount } = require('./lib/cmd-create-account')
// 阶段注册表
// ============================================================================
const ALL_PHASES = ['images', 'upload', 'videos', 'tts', 'assemble']
const ALL_PHASES = ['tts', 'images', 'upload', 'videos', 'assemble']
const PHASE_HANDLERS = {
tts: phaseTts,
images: phaseImages,
upload: phaseUpload,
videos: phaseVideos,
tts: phaseTts,
assemble: phaseAssemble,
}
@@ -229,11 +229,12 @@ async function main() {
console.log(' pipeline.js validate --manifest <path>')
console.log(' pipeline.js confirm --manifest <path> --all')
console.log(' pipeline.js confirm --manifest <path> --items 1,3,5')
console.log(' pipeline.js run --manifest <path> [--account id] [--phase p1,p2] [--resume] [--retry-failed]')
console.log(' pipeline.js run --manifest <path> --phase tts,images,upload,videos,assemble')
console.log(' pipeline.js run --manifest <path> --resume')
console.log(' pipeline.js status --manifest <path>')
console.log('')
console.log('Manifest 路径约定: output/{account}_{date}_{NNN}/manifest.json同天自增序号')
console.log('阶段: images, upload, videos, tts, assemble')
console.log('阶段: tts → images → upload → videos → assemble(TTS 提前)')
}
if (require.main === module) {

View File

@@ -108,7 +108,7 @@ function synthesize(text, options = {}) {
format: 'mp3',
sample_rate: 24000,
volume: 50,
rate: options.rate || 1.1,
rate: options.rate || 1.15,
pitch_rate: 1.0,
text_type: 'PlainText',
...(instruction ? { instruction } : {}),