feat(video-pipeline): refactor the video pipeline; improve assembly timeline rules and state management

- Introduce manifest.json as the single source of truth; every sub-agent operation writes back to the manifest
- Rework the timebuilder logic to support four video-fit strategies (speed up / trim / slow down / freeze frame)
- Unify the TTS phase output structure: both single- and multi-sentence scripts now write segments[]
- Rewrite subtitle and voiceover generation to use precise per-segment durations for audio/visual sync
- Add id-range confirmation to the confirm command; split image and video handling in the upload phase
- Constrain intermediate artifacts to the output/ directory; remove deprecated config parameters
2026-05-02 00:14:40 +08:00
parent b4b92854db
commit 0998fd6ae1
14 changed files with 457 additions and 205 deletions


@@ -1,6 +1,6 @@
 {
-  "jianyingDraftPath": "/Users/lc/Movies/JianyingPro/User Data/Projects/com.lveditor.draft",
+  "jianyingDraftPath": "C:/Users/45070/AppData/Local/JianyingPro/User Data/Projects/com.lveditor.draft",
-  "capcutMateDir": "/Users/lc/capcut-mate",
+  "capcutMateDir": "C:/Users/45070/capcut-mate",
   "capcutMateApiBase": "http://capcut.muyetools.cn/openapi/capcut-mate/v1",
   "imgbbApiKey": "deprecated",
   "geminiApiBaseUrl": "https://yunwu.ai",


@@ -35,14 +35,15 @@ B 模式又分两种:**单图模式**1 图 → 1 段视频)/ **首尾帧
 ### 核心约束
 1. **不可跳步**
-   - A幻灯片分镜 → 图片提示词 → 生图 → TTS+成片。无视频阶段
+   - A幻灯片分镜 → manifest init → 图片提示词 → 生图 → TTS+成片。无视频阶段
-   - BAI视频分镜 → 图片提示词 → 生图 → 视频提示词 → 生视频 → TTS+成片
+   - BAI视频分镜 → manifest init → 图片提示词 → 生图 → 视频提示词 → 生视频 → TTS+成片
   - 阶段之间必须审查
-2. **manifest.json 是唯一状态源**任何操作完成后立即回写
+2. **manifest.json 是唯一状态源**`pipeline.js init` 在分镜确认后立即执行,创建 `output/{name}/` 目录和初始 manifest。后续所有子 Agent 输出回写此 manifest不再传裸 JSON
 3. **禁止 curl 调 API**:生图/生视频必须通过 `pipeline.js` 或对应 generator 脚本
 4. **并行优先**:独立子任务用子 Agent 并行
 5. **分镜表是脊骨契约**:用户确认分镜表后,下游子 Agent 只能加字段,禁止改 shot 数量/顺序/字段值。主 Agent 每次接收子 Agent 输出,第一件事数数量是否对得上
 6. **prompts/*.md 只被子 Agent 读**:主 Agent 读 account.json不读子 Agent 提示词模板
+7. **中间产物落 output**所有中间文件items JSON、urls 缓存、子 Agent 输出)必须写入 `output/{name}/` 目录,禁止散落在项目根目录
 ### Step -1: 意图确认(逐项确认,缺一不可)
@@ -79,26 +80,27 @@ B 模式又分两种:**单图模式**1 图 → 1 段视频)/ **首尾帧
 → 展示给用户确认。确认后**分镜表锁定为脊骨契约**,下游禁止增减 shot。
### Step 2-0: Manifest 初始化
```bash
node scripts/pipeline.js init --account <id> --mode <single|framePair> \
--items '[{"id":1,"shotDesc":"...","script":"...","duration":5,"directorRef":"tarantino","keyword":"权力"}]'
```
- 分镜确认后立即执行,创建 `output/{name}/` 目录和初始 `manifest.json`
- 脚本从 account.json 继承imageModel、videoModel、format、references
- `imagePrompt` 暂为空Step 2-A 补充;`videoPrompt` 暂为空Step 3-A 补充
- 输出路径打印到控制台,后续所有操作以此为工作目录
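For orientation, the initial manifest written by `init` looks roughly like the sketch below. This is an illustration assembled from the fields documented in this skill file, not verbatim tool output; exact field names and defaults follow `pipeline-init.js`.

```json
{
  "account": "军事账号",
  "mode": "single",
  "format": "9:16",
  "items": [
    {
      "id": 1,
      "status": "pending",
      "file": "images/scene_01_xxx.jpeg",
      "shotDesc": "...",
      "script": "...",
      "duration": 5,
      "directorRef": "tarantino",
      "keyword": "权力"
    }
  ]
}
```

`imagePrompt` and `videoPrompt` are absent at this point and are filled in by Step 2-A and Step 3-A respectively.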
 ### Step 2-A: 图片提示词(子 Agent 执行)
-- 主 Agent 传**完整分镜表 JSON**(不传原始文案)+ 图片提示词模板路径给子 Agent
+- 主 Agent 传**manifest 路径 + 图片提示词模板路径**给子 Agent
-- 子 Agent 为每个 shot 追加 `imagePrompt` 字段
+- 子 Agent 读 manifest.items为每个 shot 追加 `imagePrompt` 字段后回写 manifest
-- 入参来自分镜表shotDesc + script + directorRef + keyword
-- 出参:分镜表 JSON + imagePrompt
 - **硬约束:输出 shot 数量 == 输入 shot 数量**
 **主 Agent 审查**:① 数量对得上?② shotDesc 内容完整保留?③ 光影策略对应 directorRef
-### Step 2-B: 生图 + Manifest 初始化
+### Step 2-B: 生图
-```bash
-node scripts/pipeline.js init --account <id> --mode <single|framePair> \
-  --items '[{"shotDesc":"...","script":"...","duration":5,"imagePrompt":"...","directorRef":"tarantino","keyword":"权力"}]'
-```
-- items 不含 videoPrompt后续 Step 3-A 补充
-- 脚本从 account.json 继承imageModel、videoModel、format、references
-- 首尾帧模式:每个 item 必须有 `lastFramePrompt`
 ```bash
 node scripts/pipeline.js run --manifest <path> --phase images
@@ -111,12 +113,9 @@ node scripts/pipeline.js run --manifest <path> --phase images
 ### Step 3-A: 视频提示词B 模式专属,子 Agent 执行)
-- 主 Agent 传分镜表 JSON含已确认分镜图路径+ 视频提示词模板路径给子 Agent
+- 主 Agent 传**manifest 路径 + 视频提示词模板路径**给子 Agent
-- 子 Agent 为每个 shot 生成 `videoPrompt`
+- 子 Agent 读 manifest.items含已确认分镜图路径为每个 shot 生成 `videoPrompt` 后回写 manifest
-- 入参shotDesc + directorRef + 已确认分镜图 + 目标模型
-- 出参videoPrompt描述镜头运动非画面内容
 - **硬约束:输出数量 == 分镜表 shot 数量**
-- Agent 按 id 对齐回写 manifest.json
 **主 Agent 审查**:① 数量对得上?② 描述运动而非内容?③ 字数 ≤ 50


@@ -9,9 +9,9 @@
 ## 创建方式
 ```bash
-# Step 2-A 生成 imagePrompt 后,通过脚本初始化(不含 videoPrompt
+# Step 2-0分镜确认后立即初始化imagePrompt/videoPrompt 后续补充
 node scripts/pipeline.js init --account 军事账号 --mode single \
-  --items '[{"shotDesc":"英文画面描述","script":"中文口播文案","duration":5,"imagePrompt":"English prompt","directorRef":"tarantino","keyword":"权力"}]'
+  --items '[{"shotDesc":"英文画面描述","script":"中文口播文案","duration":5,"directorRef":"tarantino","keyword":"权力"}]'

 # 或从文件读取
 node scripts/pipeline.js init --account 军事账号 --mode single --items-file ./items.json
@@ -193,7 +193,7 @@ node scripts/pipeline.js run --manifest <path> --retry-failed
 ## 目录结构
 ```
-output/{account}_{YYYYMMDD}_{NNN}/
+output/{name}_{YYYYMMDD}_{NNN}/
 ├── manifest.json      # 主清单
 ├── images/            # scene_{NN}_{slug}.jpeg首尾帧加 _lastMJ 候选加 _cand{1-4}
 ├── videos/            # scene_{NN}_{slug}.mp4
@@ -206,7 +206,7 @@ slug 从 `shotDesc` 派生slugify: 保留中文和字母数字,最多 20
 ## segments[] 字段TTS 分句)
-TTS 阶段自动生成。仅当 `script` 被切分为 2 句及以上时才写入。单句时不写 segments
+TTS 阶段统一生成,单句时数组仅 1 个元素,多句时 N 个元素。assemble 阶段直接使用各 segment 的实际音频时长对齐字幕
 | 字段 | 说明 |
 |------|------|
@@ -214,4 +214,26 @@ TTS 阶段自动生成。仅当 `script` 被切分为 2 句及以上时才写入
 | `audio` | 该句音频路径(相对 manifest |
 | `duration` | 该句音频时长(秒) |
-`item.audio` 指向所有分段合并后的完整音频`item.audioDuration` 为各段累计时长。assemble 阶段优先用 `segments` 的精确时长对齐字幕,无 segments 时回退到字数权重估算。
+`item.audio` 指向 `segments[0].audio``item.audioDuration` 为各段累计时长。assemble 阶段遍历 segments 逐一添加音频和字幕,使用实际文件时长(非比例分配),确保音频与字幕精确同步,消除留白。
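The alignment rule above can be sketched as a small standalone function (a hypothetical helper for illustration; the real logic lives inside `addVoiceover`/`addSubtitles`): segments are laid back-to-back using their measured durations, and the last segment snaps to the slot end so floating-point drift never leaves a gap.

```javascript
const US = 1e6 // microseconds per second

// Place segments sequentially inside a timeline slot [start, end].
// The last segment is pinned to the slot end to absorb rounding drift.
function layoutSegments(slot, segments) {
  let t = slot.start
  return segments.map((seg, i) => {
    const isLast = i === segments.length - 1
    const end = isLast ? slot.end : t + (seg.duration || 0) * US
    const entry = { start: t, end, text: seg.text }
    t = end
    return entry
  })
}

// Two segments totalling 4.9s inside a 5.0s slot: the last one
// stretches to the slot end instead of leaving a 0.1s gap.
const out = layoutSegments(
  { start: 0, end: 5000000 },
  [{ text: 'a', duration: 2.0 }, { text: 'b', duration: 2.9 }]
)
// out[1].end === 5000000
```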
---
## 成片时间线规则
### 图片模式images
图片没有独立时长。TTS 音频时长 = 画面时长。无 TTS 音频的 item 时长为 0跳过不显示
### 视频模式videos
TTS 音频为主轴,视频通过以下策略适配音频时长:
| ratio = videoDur/audioDur | 策略 | 说明 |
|---------------------------|------|------|
| 0.9 ~ 1.1 | none | 接近匹配,无需调整 |
| > 1.1, ≤ 2 | speed_up | 加速setpts 压缩时间) |
| > 2 | trim | 裁剪(截断到音频时长) |
| < 0.9, ≥ 0.5 | slow_down | 放缓setpts 拉长时间) |
| < 0.5 | freeze | 画面停顿(视频原速 + 最后一帧冻结补时长) |
所有策略失败后兜底:截断到目标时长。
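The strategy table above maps directly onto a small selection function. This is an illustrative sketch of the rules only, not the actual `buildTimeline` implementation:

```javascript
// Pick a video-fit strategy from ratio = videoDur / audioDur (seconds).
// Mirrors the table: 0.9~1.1 none; >1.1..2 speed_up; >2 trim;
// 0.5..0.9 slow_down; <0.5 freeze.
function pickStrategy(videoDur, audioDur) {
  if (audioDur <= 0) return { strategy: 'none', speed: 1 }
  const ratio = videoDur / audioDur
  if (ratio > 2) return { strategy: 'trim', speed: 1 }
  if (ratio > 1.1) return { strategy: 'speed_up', speed: ratio }
  if (ratio < 0.5) return { strategy: 'freeze', speed: 1 }
  if (ratio < 0.9) return { strategy: 'slow_down', speed: ratio }
  return { strategy: 'none', speed: 1 }
}

pickStrategy(12, 5) // ratio 2.4 → trim
pickStrategy(8, 5)  // ratio 1.6 → speed_up at 1.6x
pickStrategy(3, 5)  // ratio 0.6 → slow_down
pickStrategy(2, 5)  // ratio 0.4 → freeze
```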


@@ -215,28 +215,89 @@ function getAudioDurationSec(filePath) {
 // 主流程
 // ============================================================================
-function buildTimeline(items, defaultDurationUs) {
+function buildTimeline(items) {
-  // 音频为主轴视频调速适配≤2x 加速,>2x 截断)
+  // 核心规则:
// 图片模式图片没有独立时长TTS 音频时长 = 画面时长。无音频 = 0 时长(跳过)
// 视频模式TTS 为主轴,视频通过 裁剪/加速/放缓/停顿 适配
// 视频比音频长ratio > 1.1:
// ≤ 2x → 加速setpts 压缩时间)
// > 2x → 裁剪(截断到音频时长)
// 视频比音频短ratio < 0.9:
// ≥ 0.5x → 放缓setpts 拉长时间≤2x慢速
// < 0.5x → 画面停顿(视频正常播放+最后一帧冻结补时长)
   let offset = 0
   return items.map(item => {
-    const audioDur = (item.audioDuration != null) ? item.audioDuration * US : 0
+    // 有 segments 时用各段实际时长之和(精确对齐音频文件)
+    let audioDur
+    if (item.segments && item.segments.length > 0) {
+      audioDur = item.segments.reduce((sum, s) => sum + (s.duration || 0), 0) * US
+    } else {
+      audioDur = (item.audioDuration != null) ? item.audioDuration * US : 0
+    }
     const videoDur = (item.videoDuration != null) ? item.videoDuration * US : 0
-    // 无 TTS用视频时长或固定时长
+    const hasVideo = !!(item.video || item.videoUrl || item.url)
+    // 无 TTS 音频
     if (audioDur <= 0) {
-      const dur = videoDur || defaultDurationUs
-      const entry = { start: offset, end: offset + dur, duration: dur, speed: 1 }
+      if (hasVideo && videoDur > 0) {
+        // 视频模式无音频:用视频原始时长
const entry = { start: offset, end: offset + videoDur, duration: videoDur, speed: 1, strategy: 'none' }
offset += videoDur
return entry
}
// 图片模式无音频0 时长,标记跳过
const entry = { start: offset, end: offset, duration: 0, speed: 1, strategy: 'none', skip: true }
return entry
}
// 有 TTS音频时长为主轴
const dur = audioDur
if (!hasVideo || videoDur <= 0) {
// 图片模式:直接用音频时长
const entry = { start: offset, end: offset + dur, duration: dur, speed: 1, strategy: 'none' }
offset += dur
return entry
}
// 视频模式:视频 vs 音频时长匹配
const ratio = videoDur / audioDur
if (ratio > 1.1) {
// 视频比音频长
if (ratio <= 2) {
// 加速策略
const entry = { start: offset, end: offset + dur, duration: dur, speed: ratio, strategy: 'speed_up' }
offset += dur
return entry
} else {
// 裁剪策略
const entry = { start: offset, end: offset + dur, duration: dur, speed: 1, strategy: 'trim' }
offset += dur
return entry
}
} else if (ratio < 0.9) {
// 视频比音频短
if (ratio >= 0.5) {
// 放缓策略(慢放 ≤2x
const entry = { start: offset, end: offset + dur, duration: dur, speed: ratio, strategy: 'slow_down' }
offset += dur
return entry
} else {
// 画面停顿策略(视频原速播放 + 最后一帧冻结补时长)
const entry = {
start: offset, end: offset + dur, duration: dur, speed: 1,
strategy: 'freeze', freezeExtra: dur - videoDur,
}
offset += dur
return entry
}
} else {
// 接近匹配0.9 ~ 1.1),无需调整
const entry = { start: offset, end: offset + dur, duration: dur, speed: 1, strategy: 'none' }
       offset += dur
       return entry
     }
-    // 有 TTS音频时长为主轴
-    const dur = audioDur
-    const ratio = videoDur > 0 ? videoDur / audioDur : 1
-    // ≤2x: 加速到音频时长;>2x: 截断(视频只取前 audioDur 部分)
-    const speed = ratio <= 2 ? ratio : 1
-    const needAdjust = videoDur > audioDur + 100000 // 视频比音频长 0.1s 以上才需要调整
-    const entry = { start: offset, end: offset + dur, duration: dur, speed, needAdjust }
-    offset += dur
-    return entry
   })
 }
@@ -253,7 +314,6 @@ async function assemble(args) {
     filter: filterStr,
     format = '9:16',
     apiKey = '',
-    duration = '4',
     animation = '轻微放大',
   } = args
@@ -284,22 +344,44 @@ async function assemble(args) {
   }
   const { width, height } = getResolution(format)
-  const defaultDurationUs = parseFloat(duration) * US
   // 过滤出实际存在的文件
+  const missingFileItems = []
   const items = manifest.items.filter(item => {
     if (item.url) return true // 视频模式可能用 URL
if (item.video) return true // 视频模式本地文件
if (!item.file) {
missingFileItems.push(item.id || '?')
return false
}
     const filePath = path.join(inputDir, item.file)
     return fs.existsSync(filePath)
   })
if (items.length === 0) {
if (missingFileItems.length > 0) {
throw new Error(`没有可用的素材文件 — ${missingFileItems.length} 个 item 缺少 file 字段id: ${missingFileItems.join(', ')}),请先运行 images 阶段`)
}
throw new Error('没有可用的素材文件')
}
   if (items.length === 0) throw new Error('没有可用的素材文件')
   // 用 ffprobe 测量实际音频/视频时长,替代 manifest 中的估计值
   let audioMeasured = 0, videoMeasured = 0
   for (const item of items) {
-    // 测量 TTS 音频实际时长(有 segments 时跳过audioDuration 已是精确累计值)
+    // 测量各 segment 音频文件实际时长
-    if (item.audio && !item.audio.startsWith('http') && !item.segments) {
+    if (item.segments && item.segments.length > 0) {
for (const seg of item.segments) {
if (!seg.audio || seg.audio.startsWith('http')) continue
const audioPath = path.isAbsolute(seg.audio)
? seg.audio
: path.resolve(inputDir, seg.audio)
if (!fs.existsSync(audioPath)) continue
const actualDur = await getAudioDurationSec(audioPath)
if (actualDur != null) { seg.duration = actualDur; audioMeasured++ }
}
} else if (item.audio && !item.audio.startsWith('http')) {
       const audioPath = path.isAbsolute(item.audio)
         ? item.audio
         : path.resolve(inputDir, item.audio)
@@ -323,16 +405,32 @@ async function assemble(args) {
     console.log(`  实际时长测量: 音频 ${audioMeasured} 个, 视频 ${videoMeasured}`)
   }
-  const timeline = buildTimeline(items, defaultDurationUs)
+  const timeline = buildTimeline(items)
   const totalDurationUs = timeline.length > 0 ? timeline[timeline.length - 1].end : 0
   const hasTTS = items.some(item => item.audio && item.audioDuration != null)
// 时间轴诊断
for (let i = 0; i < items.length; i++) {
const item = items[i]
const tl = timeline[i]
if (tl.skip) { console.log(` [${i + 1}] 跳过(无音频)`); continue }
const audioDur = item.segments
? item.segments.reduce((s, seg) => s + (seg.duration || 0), 0)
: (item.audioDuration || 0)
const slotDur = tl.duration / US
const diff = slotDur - audioDur
const videoDur = (item.videoDuration || 0)
const stratInfo = tl.strategy && tl.strategy !== 'none' ? ` 策略=${tl.strategy}` : ''
const marker = Math.abs(diff) > 0.05 ? ' ⚠️ 不对齐' : ''
console.log(` [${i + 1}] 画面=${slotDur.toFixed(2)}s 音频=${audioDur.toFixed(2)}s 视频=${videoDur.toFixed(2)}s${stratInfo}${marker}`)
}
   // -- 读取转场策略(在 addImages/addVideos 之前) --
   const transitionConfig = loadTransitions(manifest)
   console.log(`\nCapCut 成片组装`)
   console.log(`  模式: ${mode}  画幅: ${format} (${width}x${height})`)
-  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : `固定${duration}s/段`}  总时长: ${(totalDurationUs / US).toFixed(1)}s`)
+  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : '视频原始时长'}  总时长: ${(totalDurationUs / US).toFixed(1)}s`)
   console.log(`  字幕: ${subtitles}  配音: ${voiceover}  动画: ${animation}`)
   if (finalEffects) console.log(`  特效: ${finalEffects}`)
   if (finalFilter) console.log(`  滤镜: ${finalFilter}`)
@@ -386,10 +484,10 @@ async function assemble(args) {
   for (let i = 0; i < items.length; i++) {
     const item = items[i]
     const tl = timeline[i]
-    if (tl.needAdjust && item.video) {
+    if (tl.strategy && tl.strategy !== 'none' && item.video) {
       const videoPath = path.resolve(inputDir, item.video)
       const audioDur = tl.duration / US
-      const adjustedPath = await adjustVideoSpeed(videoPath, audioDur)
+      const adjustedPath = await adjustVideoSpeed(videoPath, audioDur, tl.strategy, tl.speed, tl.freezeExtra || 0)
       if (adjustedPath !== videoPath) {
         item.video = path.relative(inputDir, adjustedPath)
         item.videoDuration = audioDur
@@ -398,7 +496,7 @@ async function assemble(args) {
     }
   }
   if (adjustedCount > 0) {
-    console.log(`  视频调速: ${adjustedCount}/${items.length}`)
+    console.log(`  视频调整: ${adjustedCount}/${items.length}`)
   }
   // Step 2: 上传(已调速的)视频到 OSS
@@ -547,7 +645,7 @@ async function assemble(args) {
   console.log(`  草稿ID: ${draftId}`)
   console.log(`  总时长: ${(totalDurationUs / US).toFixed(1)}s`)
   console.log(`  素材数: ${items.length}`)
-  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : '固定时长'}`)
+  console.log(`  时间线: ${hasTTS ? 'TTS音频驱动' : '视频原始时长'}`)
   if (mode === 'videos' && subtitles === 'false') {
     console.log(`\n  >> 视频模式未加字幕,请在剪映中打开草稿 → 识别字幕 → 语音识别生成\n`)
   }
@@ -713,54 +811,142 @@ async function addKenBurns(draftUrl, segmentIds, items, timeline, manifest) {
 // ============================================================================
 /**
- * ffmpeg 调速:将视频调整为指定时长
+ * ffmpeg 视频调整:根据策略适配音频时长
- * ratio <= 2x: 加速ratio > 2x: 截断
+ *
- * 返回调整后的文件路径(调整失败则返回原路径)
+ * 策略(按 ratio = videoDur / audioDur 选择):
* speed_up (ratio > 1.1, ≤2x) → setpts 压缩时间(加速)
* trim (ratio > 2x) → 截断到目标时长
* slow_down (ratio < 0.9, ≥0.5x) → setpts 拉长时间(慢放)
* freeze (ratio < 0.5x) → 视频原速 + 最后一帧冻结补时长
* none (0.9~1.1) → 无需调整
*
* 所有策略失败后兜底:截断到目标时长
*
* 返回调整后的文件路径(失败则返回原路径)
*/ */
-async function adjustVideoSpeed(videoPath, targetDurationSec) {
+async function adjustVideoSpeed(videoPath, targetDurationSec, strategy = 'none', speed = 1, freezeExtraUs = 0) {
   if (!fs.existsSync(videoPath)) return videoPath
if (strategy === 'none') return videoPath
// 兜底截断:所有策略失败后的最终回退
function fallbackTrim(cb) {
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-t', String(targetDurationSec),
'-c', 'copy',
videoPath.replace(/(\.\w+)$/, '_adj$1')
], { timeout: 30000 }, (err) => {
if (err) { cb(videoPath); return }
cb(videoPath.replace(/(\.\w+)$/, '_adj$1'))
})
}
   return new Promise((resolve) => {
-    // 先获取视频时长
     execFile('ffprobe', [
       '-v', 'quiet', '-show_entries', 'format=duration',
       '-of', 'csv=p=0', videoPath
     ], (err, stdout) => {
-      if (err) { resolve(videoPath); return }
+      if (err) { fallbackTrim(resolve); return }
       const videoDur = parseFloat(stdout.trim())
-      if (!videoDur || videoDur <= 0 || videoDur <= targetDurationSec + 0.1) {
-        resolve(videoPath); return
-      }
-      const ratio = videoDur / targetDurationSec
+      if (!videoDur || videoDur <= 0) { fallbackTrim(resolve); return }
       const outPath = videoPath.replace(/(\.\w+)$/, '_adj$1')
-      if (ratio <= 2) {
-        // 加速setpts=PTS/speed, atempo=speed (音频变速)
-        const speed = ratio.toFixed(3)
-        const atempo = Math.min(speed, 2.0) // atempo 单次上限 2.0
-        execFile('ffmpeg', [
-          '-y', '-i', videoPath,
-          '-filter_complex', `setpts=PTS/${speed}`,
-          '-an',
-          outPath
-        ], { timeout: 30000 }, (err) => {
-          if (err) { console.log(`  调速失败,使用原始视频: ${err.message}`); resolve(videoPath); return }
-          console.log(`  调速: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speed}x)`)
-          resolve(outPath)
-        })
-      } else {
-        // 截断:取前 targetDuration 秒
+      if (strategy === 'trim') {
         execFile('ffmpeg', [
           '-y', '-i', videoPath,
           '-t', String(targetDurationSec),
           '-c', 'copy',
           outPath
         ], { timeout: 30000 }, (err) => {
-          if (err) { console.log(`  截断失败,使用原始视频: ${err.message}`); resolve(videoPath); return }
+          if (err) { console.log(`  截断失败: ${err.message}`); resolve(videoPath); return }
           console.log(`  截断: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s`)
           resolve(outPath)
         })
} else if (strategy === 'speed_up') {
const speedVal = speed.toFixed(3)
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `setpts=PTS/${speedVal}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
console.log(` 加速失败,兜底截断: ${err.message}`)
fallbackTrim(resolve)
return
}
console.log(` 加速: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speedVal}x)`)
resolve(outPath)
})
} else if (strategy === 'slow_down') {
const factor = (1 / speed).toFixed(3)
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `setpts=PTS*${factor}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
console.log(` 放缓失败,兜底截断: ${err.message}`)
fallbackTrim(resolve)
return
}
console.log(` 放缓: ${videoDur.toFixed(1)}s → ${targetDurationSec.toFixed(1)}s (${speed.toFixed(2)}x speed)`)
resolve(outPath)
})
} else if (strategy === 'freeze') {
// 画面停顿:原速播放 + 最后一帧冻结补时长
const freezeSec = freezeExtraUs / US
execFile('ffmpeg', [
'-y', '-i', videoPath,
'-filter_complex', `tpad=stop=-1:stop_duration=${freezeSec.toFixed(3)}`,
'-an',
outPath
], { timeout: 30000 }, (err) => {
if (err) {
// 回退方案:截取最后一帧 → 生成冻结帧视频 → concat 拼接
console.log(` tpad freeze 失败,尝试 concat 方案: ${err.message}`)
const lastFrame = videoPath.replace(/(\.\w+)$/, '_lastframe.png')
const frozenVideo = videoPath.replace(/(\.\w+)$/, '_frozen.mp4')
execFile('ffmpeg', [
'-y', '-sseof', '-0.1', '-i', videoPath,
'-frames:v', '1', lastFrame
], { timeout: 10000 }, (err2) => {
if (err2) { console.log(` concat 方案也失败,兜底截断`); fallbackTrim(resolve); return }
execFile('ffmpeg', [
'-y', '-loop', '1', '-i', lastFrame,
'-t', String(freezeSec.toFixed(3)),
'-pix_fmt', 'yuv420p',
'-vf', 'scale=trunc(iw/2)*2:trunc(ih/2)*2',
frozenVideo
], { timeout: 15000 }, (err3) => {
if (err3) {
try { fs.unlinkSync(lastFrame) } catch (_) {}
console.log(` 冻结帧视频生成失败,兜底截断`)
fallbackTrim(resolve)
return
}
const concatList = path.join(path.dirname(videoPath), '_freeze_concat.txt')
fs.writeFileSync(concatList, `file '${videoPath}'\nfile '${frozenVideo}'\n`)
execFile('ffmpeg', [
'-y', '-f', 'concat', '-safe', '0', '-i', concatList,
'-c', 'copy', outPath
], { timeout: 30000 }, (err4) => {
try { fs.unlinkSync(lastFrame); fs.unlinkSync(frozenVideo); fs.unlinkSync(concatList) } catch (_) {}
if (err4) { console.log(` 拼接失败,兜底截断`); fallbackTrim(resolve); return }
console.log(` 画面停顿: ${videoDur.toFixed(1)}s + 冻结 ${freezeSec.toFixed(1)}s = ${targetDurationSec.toFixed(1)}s`)
resolve(outPath)
})
})
})
return
}
console.log(` 画面停顿: ${videoDur.toFixed(1)}s + 冻结 ${freezeSec.toFixed(1)}s = ${targetDurationSec.toFixed(1)}s`)
resolve(outPath)
})
} else {
resolve(videoPath)
       }
     })
   })
@@ -829,8 +1015,8 @@ async function addVideos(draftUrl, inputDir, items, timeline, width, height, tra
 async function batchUploadAudio(inputDir, items) {
   const urls = {}
   for (const item of items) {
-    // 上传 segments 中的每段音频
+    // 上传所有 segment 音频文件
-    if (item.segments && item.segments.length > 1) {
+    if (item.segments && item.segments.length > 0) {
       for (const seg of item.segments) {
         if (!seg.audio || seg.audio.startsWith('http') || urls[seg.audio]) continue
         const filePath = path.isAbsolute(seg.audio)
@@ -848,7 +1034,7 @@ async function batchUploadAudio(inputDir, items) {
         }
       }
     }
-    // 上传 item.audio单段或 segments 的第一段
+    // 上传 item.audio向后兼容,segments[0].audio 通常等于此值
     if (!item.audio || item.audio.startsWith('http')) {
       if (item.audio) urls[item.audio] = item.audio
       continue
@@ -893,24 +1079,29 @@ async function addVoiceover(draftUrl, inputDir, items, timeline, audioUrls = {})
   for (let i = 0; i < items.length; i++) {
     const item = items[i]
     const tl = timeline[i]
-    const segments = item.segments && item.segments.length > 1 ? item.segments : null
-    if (segments) {
+    if (item.segments && item.segments.length > 0) {
-      // 多段音频:按 segment 逐段添加,使用精确时长
+      // 逐段添加,每段使用实际音频文件时长(不做比例分配,消除留白)
-      const slots = distributeSegments(tl, segments)
+      let currentTime = tl.start
+      for (let si = 0; si < item.segments.length; si++) {
-      for (const slot of slots) {
+        const seg = item.segments[si]
-        const audioUrl = resolveAudio(slot.audio)
+        const audioUrl = resolveAudio(seg.audio)
+        const segDurUs = (seg.duration || 0) * US
+        if (segDurUs <= 0) continue
+        // 最后一段对齐 timeline 末尾,吃掉浮点误差
+        const isLast = si === item.segments.length - 1
+        const endTime = isLast ? tl.end : currentTime + segDurUs
         audioInfos.push({
           audio_url: audioUrl,
-          start: slot.start,
+          start: currentTime,
-          end: slot.end,
+          end: endTime,
-          duration: slot.duration,
+          duration: endTime - currentTime,
           volume: 1.0,
         })
+        currentTime = endTime
       }
     } else if (item.audio) {
-      // 单段音频:用实际音频时长,不超过 timeline 时长
+      // 无 segments用实际音频时长
       const audioUrl = resolveAudio(item.audio)
       const audioDurUs = item.audioDuration ? item.audioDuration * US : tl.duration
@@ -981,23 +1172,6 @@ function applyAnimationProps(cap, style = {}) {
if (style.outAnimDuration) cap.out_animation_duration = style.outAnimDuration if (style.outAnimDuration) cap.out_animation_duration = style.outAnimDuration
} }
-// segments 按比例分配到时间线DRY helper
-function distributeSegments(tl, segments) {
-  const totalSegDur = segments.reduce((sum, s) => sum + (s.duration || 0) * US, 0)
-  if (totalSegDur <= 0) return []
-  const tlDuration = tl.end - tl.start
-  let currentTime = tl.start
-  return segments.map((seg, idx) => {
-    const segDurUs = Math.round((seg.duration || 0) * US)
-    let duration = Math.round(tlDuration * (segDurUs / totalSegDur))
-    if (idx === segments.length - 1) duration = tl.end - currentTime
-    duration = Math.max(duration, 100000)
-    const entry = { start: currentTime, end: currentTime + duration, duration, text: seg.text, audio: seg.audio }
-    currentTime += duration
-    return entry
-  })
-}
 function loadAccountConfig(manifest) {
   const account = manifest.account
   if (!account) return {}
@@ -1093,17 +1267,19 @@ async function addSubtitles(draftUrl, items, timeline, style = {}, split = false
     const tl = timeline[i]
     if (split) {
-      // 分句模式:优先用 segmentsTTS 逐句生成的精确时长),回退到字数估算
+      // 分句模式:优先用 segments 精确时长(与 addVoiceover 同步),回退到字数估算
-      const segments = item.segments && item.segments.length > 1 ? item.segments : null
-      if (segments) {
+      if (item.segments && item.segments.length > 0) {
-        // 精确模式:用 segments 的实际音频时长
-        const slots = distributeSegments(tl, segments)
-        for (const slot of slots) {
-          const cap = { start: slot.start, end: slot.end, text: slot.text }
+        let currentTime = tl.start
+        for (let si = 0; si < item.segments.length; si++) {
+          const seg = item.segments[si]
+          const segDurUs = (seg.duration || 0) * US
+          if (segDurUs <= 0) continue
+          const isLast = si === item.segments.length - 1
+          const endTime = isLast ? tl.end : currentTime + segDurUs
+          const cap = { start: currentTime, end: endTime, text: seg.text }
           applyAnimationProps(cap, animStyle)
           captions.push(cap)
+          currentTime = endTime
         }
       } else {
         // 回退:字数权重估算
@@ -1246,7 +1422,6 @@ async function main() {
     console.log('选项:')
     console.log('  --mode images|videos     素材类型(默认 images')
     console.log('  --format 9:16            画幅比例')
-    console.log('  --duration 4             默认每段时长/秒无TTS时的fallback默认 4')
     console.log('  --voiceover true|false   是否添加TTS配音轨道默认 true')
     console.log('  --subtitles true|false   是否添加字幕(默认 true')
     console.log('  --split-captions true|false 分句字幕模式(默认 true按标点切分')
@@ -1256,12 +1431,12 @@ async function main() {
     console.log('  --apiKey <key>           云渲染 API Key可选')
     console.log('  --manifest <path>        manifest.json 路径')
     console.log('')
-    console.log('时间线模式:')
+    console.log('时间线规则:')
-    console.log('  manifest.json 中每段包含 audio + duration → TTS音频驱动时间线')
+    console.log('  图片模式: TTS 音频时长 = 画面时长,无音频则跳过')
-    console.log('  无 audio/duration → 按 --duration 固定时长')
+    console.log('  视频模式: TTS 为主轴,视频通过以下策略适配:')
-    console.log('')
+    console.log('    视频比音频长 → 加速(≤2x) 或 裁剪(>2x)')
-    console.log('manifest.json 示例TTS驱动:')
+    console.log('    视频比音频短 → 放缓(≥0.5x) 或 画面停顿(<0.5x)')
-    console.log('  {"items":[{"file":"1.png","text":"文案","audio":"seg_1.mp3","duration":3.5}]}')
+    console.log('    所有策略失败 → 兜底截断')
     console.log('')
     console.log('配置:')
     console.log('  请运行 node setup.js 生成配置')


@@ -5,21 +5,26 @@
 const { loadManifest, saveManifest } = require('./pipeline-utils')
 function confirmManifest(options) {
-  const { manifest: manifestPath, all } = options
+  const { manifest: manifestPath, all, items: itemsStr } = options
   if (!manifestPath) {
     console.error('用法: pipeline.js confirm --manifest <path> --all')
+    console.error('     pipeline.js confirm --manifest <path> --items 1,3,5')
     process.exit(1)
   }
-  if (!all) {
+  if (!all && !itemsStr) {
-    console.error('错误: 必须指定 --all')
+    console.error('错误: 必须指定 --all 或 --items <id列表>')
     process.exit(1)
   }
   const manifest = loadManifest(manifestPath)
const targetIds = itemsStr
? new Set(itemsStr.split(',').map(s => parseInt(s.trim(), 10)).filter(n => !isNaN(n)))
: null
   let count = 0
   for (const item of manifest.items) {
+    if (targetIds && !targetIds.has(item.id)) continue
     if (item.file && item.status === 'done' && !item.confirmed) {
       item.confirmed = true
       count++
@@ -30,7 +35,8 @@ function confirmManifest(options) {
   const total = manifest.items.length
   const confirmed = manifest.items.filter(it => it.confirmed).length
-  console.log(`已确认: ${count} items${confirmed}/${total} 已确认)`)
+  const scope = targetIds ? `${Array.from(targetIds).join(',')}` : '全部'
+  console.log(`已确认: ${count} items范围: ${scope},共 ${confirmed}/${total} 已确认)`)
 }
 module.exports = { confirmManifest }


@@ -6,7 +6,7 @@
 const fs = require('fs')
 const path = require('path')
-const { loadAccountConfig, saveManifest, ensureDir, ACCOUNTS_DIR, SKILLS_DIR } = require('./pipeline-utils')
+const { loadAccountConfig, saveManifest, ensureDir, slugify, ACCOUNTS_DIR, SKILLS_DIR } = require('./pipeline-utils')
 function initManifest(options) {
   const { account: accountId, mode, items: itemsJson, itemsFile } = options
@@ -40,7 +40,8 @@ function initManifest(options) {
   }
   // 校验必填字段
-  const requiredFields = ['shotDesc', 'script', 'imagePrompt']
+  const requiredFields = ['shotDesc', 'script']
+  const optionalFields = ['imagePrompt', 'videoPrompt', 'lastFramePrompt']
   const resolvedMode = mode || 'single'
   for (let i = 0; i < rawItems.length; i++) {
@@ -52,8 +53,7 @@ function initManifest(options) {
       }
     }
     if (resolvedMode === 'framePair' && !item.lastFramePrompt) {
-      console.error(`错误: 首尾帧模式 items[${i}] 缺少 "lastFramePrompt"imagePrompt 作为第一帧)`)
-      process.exit(1)
+      delete item.lastFramePrompt // 首尾帧模式 Step 2-A 补充
     }
   }
@@ -68,9 +68,11 @@ function initManifest(options) {
   // 构建 items
   const items = rawItems.map((raw, i) => {
+    const slug = slugify(raw.shotDesc || raw.script || `scene_${i + 1}`)
     const item = {
       id: i + 1,
       status: 'pending',
+      file: `images/scene_${String(i + 1).padStart(2, '0')}_${slug}.jpeg`,
       shotDesc: raw.shotDesc || '',
       script: raw.script || '',
       duration: raw.duration || 5,
@@ -129,7 +131,13 @@ function initManifest(options) {
console.log(` 画幅: ${manifest.format}, 模式: ${manifest.mode}`) console.log(` 画幅: ${manifest.format}, 模式: ${manifest.mode}`)
console.log(` Items: ${items.length}`) console.log(` Items: ${items.length}`)
console.log(` 参考图: ${references.length}`) console.log(` 参考图: ${references.length}`)
if (items.some(it => !it.videoPrompt)) { if (items.some(it => !it.imagePrompt)) {
console.log(`${items.filter(it => !it.imagePrompt).length} 个 item 缺少 imagePrompt请运行 Step 2-A图片提示词补充`)
}
if (resolvedMode === 'framePair' && items.some(it => !it.lastFramePrompt)) {
console.log(`${items.filter(it => !it.lastFramePrompt).length} 个 item 缺少 lastFramePrompt请运行 Step 2-A 补充`)
}
if (items.some(it => !it.videoPrompt && resolvedMode !== 'framePair')) {
console.log(`${items.filter(it => !it.videoPrompt).length} 个 item 缺少 videoPrompt生视频阶段将跳过`) console.log(`${items.filter(it => !it.videoPrompt).length} 个 item 缺少 videoPrompt生视频阶段将跳过`)
} }
console.log() console.log()

View File

@@ -41,6 +41,9 @@ function validateManifest(manifestPath) {
       if (item.status && !['pending', 'generating', 'done', 'failed'].includes(item.status)) {
         issues.push(`${prefix} status 无效: ${item.status}`)
       }
+      if (item.status === 'done' && !item.file && !item.video && !item.url) {
+        issues.push(`${prefix} status=done 但缺少 file/video/url(素材路径)`)
+      }
     })
   }

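上面新增的校验规则可以抽成一个纯函数来理解(以下为示意代码,非仓库实现):status 为 done 的 item 必须携带至少一种素材路径(file / video / url),否则状态与素材不一致。

```javascript
// 示意:判断 item 的 done 状态是否有素材支撑。
// 非 done 状态不受此规则约束,直接视为通过。
function doneItemHasAsset(item) {
  if (item.status !== 'done') return true
  return Boolean(item.file || item.video || item.url)
}
```

validate 阶段对每个 item 套用该判断,不通过的写入 issues 列表而非直接中断,便于一次性报告所有问题。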
@@ -15,6 +15,14 @@ async function phaseAssemble(manifest, manifestPath, options) {
   const hasVideos = videoItems.length > 0
   const mode = hasVideos ? 'videos' : 'images'
 
+  // 前置校验:图片模式下检查 file 字段
+  if (mode === 'images') {
+    const missingFile = manifest.items.filter(it => !it.file)
+    if (missingFile.length > 0) {
+      throw new Error(`${missingFile.length} 个 item 缺少 file 字段(id: ${missingFile.map(it => it.id).join(', ')}),请先运行 images 阶段生成图片`)
+    }
+  }
+
   const assembleArgs = {
     input: dir,
     manifest: manifestPath,
@@ -22,7 +30,6 @@ async function phaseAssemble(manifest, manifestPath, options) {
     format: manifest.format || accountConfig.defaultFormat || '9:16',
     subtitles: mode === 'images' ? 'true' : 'false',
     voiceover: manifest.items.some(it => it.audio) ? 'true' : 'false',
-    duration: '4',
     animation: capcutConfig.animation || '渐显+放大',
   }

@@ -17,7 +17,8 @@ async function phaseImages(manifest, manifestPath, options) {
   ensureDir(imagesDir)
 
   const items = manifest.items.filter(it =>
-    (!it.status || it.status === 'pending' || it.status === 'generating') && it.imagePrompt
+    ((!it.status || it.status === 'pending' || it.status === 'generating') && it.imagePrompt) ||
+    (it.status === 'done' && manifest.mode === 'framePair' && it.file && it.lastFramePrompt && !it.lastFrame)
   )
   if (items.length === 0) { log('images', '无待处理 item,跳过'); return }
@@ -45,6 +46,14 @@ async function phaseImages(manifest, manifestPath, options) {
     item.status = 'generating'
     saveManifest(manifestPath, manifest)
 
+    // 仅补 lastFrame(首帧已存在,跳过首帧生成)
+    if (item.file && manifest.mode === 'framePair' && item.lastFramePrompt && !item.lastFrame) {
+      log('images', `[${idx}] 补生成 lastFrame(首帧已有: ${item.file})`)
+      await generateLastFrame(item, idx, manifest, dir, imagesDir, model, ratio, manifestPath)
+      saveManifest(manifestPath, manifest)
+      return { ok: true }
+    }
+
     let result
     if (model === 'gemini') {
       result = await generateGemini(item, idx, dir, imagesDir, ratio, refs)

@@ -2,7 +2,8 @@
 /**
  * Phase: tts — 语音合成(逐句分句生成)
  *
  * 将每个 item 的 script 按标点切分为短句,每句单独生成 TTS 音频。
- * 结果写入 item.segments[],实现字幕与语音精确对齐。
+ * 统一写入 item.segments[](单句时数组仅 1 个元素),
+ * item.audio 指向第一段,item.audioDuration 为累计时长。
  */
 
 const path = require('path')
@@ -29,47 +30,32 @@ async function phaseTts(manifest, manifestPath, options = {}) {
     try {
       const sentences = splitTextIntoSentences(fullText)
+      const segments = []
+      let totalDuration = 0
 
-      if (sentences.length <= 1) {
-        // 单句:不需要 segments,走原逻辑
-        const { filePath, duration } = await synthesize(fullText, {
+      for (let j = 0; j < sentences.length; j++) {
+        const sentence = sentences[j]
+        const segId = `${item.id || idx}_${j + 1}`
+        const { filePath, duration } = await synthesize(sentence, {
           outputDir: audioDir,
-          id: item.id || idx,
+          id: segId,
           voice: manifest.ttsVoice || undefined,
           instruction: manifest.ttsInstruction || undefined,
           rate: manifest.ttsRate || undefined,
         })
-        item.audio = path.relative(dir, filePath).replace(/\\/g, '/')
-        item.audioDuration = Math.round(duration * 1000) / 1000
-        log('tts', `[${idx}/${items.length}] ${duration.toFixed(1)}s: ${fullText.substring(0, 30)}...`)
-      } else {
-        // 多句:逐句生成,写入 segments
-        const segments = []
-        let totalDuration = 0
-        for (let j = 0; j < sentences.length; j++) {
-          const sentence = sentences[j]
-          const segId = `${item.id || idx}_${j + 1}`
-          const { filePath, duration } = await synthesize(sentence, {
-            outputDir: audioDir,
-            id: segId,
-            voice: manifest.ttsVoice || undefined,
-            instruction: manifest.ttsInstruction || undefined,
-            rate: manifest.ttsRate || undefined,
-          })
-          segments.push({
-            text: sentence,
-            audio: path.relative(dir, filePath).replace(/\\/g, '/'),
-            duration: Math.round(duration * 1000) / 1000,
-          })
-          totalDuration += duration
-        }
-        item.segments = segments
-        item.audio = segments[0].audio
-        item.audioDuration = Math.round(totalDuration * 1000) / 1000
-        log('tts', `[${idx}/${items.length}] ${totalDuration.toFixed(1)}s (${segments.length}句): ${fullText.substring(0, 30)}...`)
+        segments.push({
+          text: sentence,
+          audio: path.relative(dir, filePath).replace(/\\/g, '/'),
+          duration: Math.round(duration * 1000) / 1000,
+        })
+        totalDuration += duration
       }
+
+      // 统一使用 segments 数组(单句 = 1 元素,多句 = N 元素)
+      item.segments = segments
+      item.audio = segments[0].audio
+      item.audioDuration = Math.round(totalDuration * 1000) / 1000
+      log('tts', `[${idx}/${items.length}] ${totalDuration.toFixed(1)}s (${segments.length}句): ${fullText.substring(0, 30)}...`)
     } catch (err) {
       item.status = 'failed'
       item.error = `TTS失败: ${err.message}`

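`splitTextIntoSentences` 的实现没有出现在本次 diff 中。下面是一个按中英文句末标点断句的可能写法(仅为示意,实际仓库中的实现细节可能不同):

```javascript
// 假设实现:在句末标点(。!?!?;;)之后断句,标点保留在句尾,
// 空白与空串被过滤掉;单句文本原样返回为长度 1 的数组。
function splitTextIntoSentences(text) {
  return String(text)
    .split(/(?<=[。!?!?;;])/) // 零宽 lookbehind:在标点后切分且不吃掉标点
    .map(s => s.trim())
    .filter(Boolean)
}
```

这种切分保证每个 segment 的 text 与其 TTS 音频一一对应,字幕时间轴可以直接按 segment.duration 累加。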
@@ -1,7 +1,7 @@
 /**
  * Phase: upload — OSS 上传
  *
- * 将生成的图片(含首尾帧)上传到 OSS,回写 url
+ * 将图片(含首尾帧)和视频上传到 OSS,回写 url / videoUrl
  */
 
 const path = require('path')
@@ -11,35 +11,64 @@ async function phaseUpload(manifest, manifestPath) {
   const dir = getManifestDir(manifestPath)
   const { uploadFile } = require('../oss-upload')
 
-  const items = manifest.items.filter(it =>
+  // 图片(含首尾帧 first frame)
+  const imageItems = manifest.items.filter(it =>
     it.status === 'done' && it.file && !it.url
   )
-  if (items.length === 0) { log('upload', '无待上传 item,跳过'); return }
+  // 视频
+  const videoItems = manifest.items.filter(it =>
+    it.status === 'done' && it.video && !it.videoUrl
+  )
 
-  log('upload', `共 ${items.length} 个文件`)
+  if (imageItems.length === 0 && videoItems.length === 0) {
+    log('upload', '无待上传文件,跳过')
+    return
+  }
 
-  for (let i = 0; i < items.length; i++) {
-    const item = items[i]
-    const filePath = path.resolve(dir, item.file)
-    try {
-      const { url } = await uploadFile(filePath)
-      item.url = url
-      log('upload', `[${i + 1}/${items.length}] ${item.file} → ${url.substring(0, 60)}...`)
-    } catch (err) {
-      item.error = `上传失败: ${err.message}`
-      log('upload', `[${i + 1}/${items.length}] 失败: ${err.message}`)
-    }
-    if (item.url && item.lastFrame && !item.lastFrameUrl) {
-      const lastPath = path.resolve(dir, item.lastFrame)
-      try {
-        const { url } = await uploadFile(lastPath)
-        item.lastFrameUrl = url
-        log('upload', `[${i + 1}/${items.length}] lastFrame → OK`)
-      } catch (err) {
-        log('upload', `[${i + 1}/${items.length}] lastFrame 上传失败: ${err.message}`)
+  // 上传图片
+  if (imageItems.length > 0) {
+    log('upload', `图片: ${imageItems.length} 个`)
+    for (let i = 0; i < imageItems.length; i++) {
+      const item = imageItems[i]
+      const filePath = path.resolve(dir, item.file)
+      try {
+        const { url } = await uploadFile(filePath)
+        item.url = url
+        log('upload', `  [${i + 1}/${imageItems.length}] ${item.file} → OK`)
+      } catch (err) {
+        item.error = `上传失败: ${err.message}`
+        log('upload', `  [${i + 1}/${imageItems.length}] 失败: ${err.message}`)
       }
+      // 首尾帧模式:上传 lastFrame
+      if (item.url && item.lastFrame && !item.lastFrameUrl) {
+        const lastPath = path.resolve(dir, item.lastFrame)
+        try {
+          const { url } = await uploadFile(lastPath)
+          item.lastFrameUrl = url
+          log('upload', `  [${i + 1}/${imageItems.length}] lastFrame → OK`)
+        } catch (err) {
+          log('upload', `  [${i + 1}/${imageItems.length}] lastFrame 上传失败: ${err.message}`)
+        }
+      }
+      saveManifest(manifestPath, manifest)
+    }
+  }
+
+  // 上传视频
+  if (videoItems.length > 0) {
+    log('upload', `视频: ${videoItems.length} 个`)
+    for (let i = 0; i < videoItems.length; i++) {
+      const item = videoItems[i]
+      const videoPath = path.resolve(dir, item.video)
+      try {
+        const { url } = await uploadFile(videoPath)
+        item.videoUrl = url
+        log('upload', `  [${i + 1}/${videoItems.length}] ${item.video} → OK`)
+      } catch (err) {
+        log('upload', `  [${i + 1}/${videoItems.length}] 失败: ${err.message}`)
+      }
+      saveManifest(manifestPath, manifest)
     }
-    saveManifest(manifestPath, manifest)
   }
 }

@@ -112,13 +112,23 @@ function applyRetryFailed(manifest, phases) {
   for (const item of manifest.items) {
     if (item.status === 'failed' || item.status === 'partial') {
       if (item.url && item.videoPrompt && !item.video) {
+        // 图片已上传但视频未生成 → 直接重试视频阶段
         item.status = 'done'
         item.error = ''
         resetCount++
       } else if (!item.url && item.imagePrompt) {
-        item.status = 'pending'
-        item.error = ''
-        resetCount++
+        // 图片未上传 → 重试图片阶段
+        // 如果首帧已存在但 lastFrame 失败,只重置 lastFrame 相关
+        if (item.file && manifest.mode === 'framePair' && !item.lastFrame) {
+          item.status = 'done' // 保留首帧,只补 lastFrame
+          item.error = ''
+          resetCount++
+        } else {
+          item.status = 'pending'
+          item.error = ''
+          delete item.file // 清除旧文件引用,避免重复
+          resetCount++
+        }
       }
     }
   }
@@ -128,7 +138,7 @@ function applyRetryFailed(manifest, phases) {
     }
   }
   if (phases.includes('images')) {
-    if (manifest.items.some(it => !it.status || it.status === 'pending')) {
+    if (manifest.items.some(it => (!it.status || it.status === 'pending') || (it.status === 'done' && manifest.mode === 'framePair' && !it.lastFrame))) {
       manifest.pipeline.phases.images = 'pending'
     }
   }
@@ -159,7 +169,6 @@ function parseArgs(argv) {
     else if (argv[i] === '--image-model' && argv[i + 1]) args.imageModel = argv[++i]
     else if (argv[i] === '--video-model' && argv[i + 1]) args.videoModel = argv[++i]
     else if (argv[i] === '--references' && argv[i + 1]) args.references = argv[++i]
-    else if (argv[i] === '--style' && argv[i + 1]) args.style = argv[++i]
     else if (argv[i] === '--all') args.all = true
     else if (!args.command) args.command = argv[i]
   }
@@ -219,6 +228,7 @@ async function main() {
     console.log('  pipeline.js init --account <id> --mode <single|framePair> --items <JSON> [--items-file <path>] [--image-model gemini|mj] [--video-model veo3-fast|grok|kling] [--format 9:16]')
     console.log('  pipeline.js validate --manifest <path>')
     console.log('  pipeline.js confirm --manifest <path> --all')
+    console.log('  pipeline.js confirm --manifest <path> --items 1,3,5')
    console.log('  pipeline.js run --manifest <path> [--account id] [--phase p1,p2] [--resume] [--retry-failed]')
     console.log('  pipeline.js status --manifest <path>')
     console.log('')
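applyRetryFailed 的分支较多,可以用一个假设性的纯函数概括其决策(`retryAction` 为示意命名,非仓库实际 API),尤其是 framePair 模式下「保留首帧、只补 lastFrame」的策略:

```javascript
// 示意:根据 item 当前字段判断 --retry-failed 应采取的动作。
// 返回值仅用于说明分支走向,实际代码是直接原地改写 item.status。
function retryAction(item, mode) {
  if (item.status !== 'failed' && item.status !== 'partial') return 'skip'
  if (item.url && item.videoPrompt && !item.video) {
    return 'retry-video' // 图片已上传,只重试视频阶段
  }
  if (!item.url && item.imagePrompt) {
    if (item.file && mode === 'framePair' && !item.lastFrame) {
      return 'retry-lastFrame' // 首帧已有,只补 lastFrame
    }
    return 'retry-image' // 整张图重新生成
  }
  return 'skip'
}
```

把分支写成表驱动的纯函数便于单测,也让 images 阶段的 filter 条件与重试策略保持一一对应。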
.gitignore
@@ -2,7 +2,7 @@
 node_modules/
-config.json
 
 # Local settings
 .claude/settings.local.json

@@ -2,9 +2,7 @@
 ## 一、角色定义
 
-你是一位专精图片生成模型的提示词工程师,具备深厚的视觉叙事和光影设计能力。
-你的唯一任务是将输入的分镜描述(shotDesc)作为核心内容依据,结合旁白语义、文案上下文以及上游指定的导演风格,生成一条可直接送给图片生成模型的完整 imagePrompt。
+你是一位拥有 15 年经验的电影摄影指导(DP),擅长将文字分镜转化为高表现力的视觉起始帧。你不仅关注“画了什么”,更关注“空间叙述”与“光影秩序”。
 
 > **重要前提:** 你生成的图片是下游视频片段的起始帧。构图和姿态必须是「即将发生」的瞬间,而非「已完成」的状态。