
sion123 e4723d9ce3 feat(video-pipeline): add keyword mood-word stylized-text overlay and OSS URL write-back

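- Add Q17, a keyword mood-word question, supporting off / default / custom stylized-text effects
- Add `keyword` and `keywordStyle` fields to manifest and account.json
- Implement overlaying the mood keyword at the center of the frame, with configurable animation, shadow, text effects, etc.
- Add a `keywords` step to the assemble flow that reads the account config automatically to generate the stylized text
- Fix the OSS URL not being written back to the manifest after audio upload, which caused repeated uploads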

2026-05-01 15:21:59 +08:00

7.7 KiB


manifest.json specification

Created by `pipeline.js init`, executed by the Pipeline, reviewed by the Agent.

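The AI must never hand-write manifest.json; it must be initialized with `pipeline.js init`. The script inherits the structural fields from account.json automatically. The AI supplies only the creative content (the items' `shotDesc` / `script` / `imagePrompt`, etc.).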

How to create

```bash
# Step 2-A: after generating imagePrompt, initialize via the script (videoPrompt not included yet)
node scripts/pipeline.js init --account 军事账号 --mode single \
  --items '[{"shotDesc":"English shot description","script":"Chinese voice-over script","duration":5,"imagePrompt":"English prompt","directorRef":"tarantino"}]'

# Or read the items from a file
node scripts/pipeline.js init --account 军事账号 --mode single --items-file ./items.json

# Step 2-C: manual confirmation
node scripts/pipeline.js confirm --manifest <path> --all
node scripts/pipeline.js confirm --manifest <path> --items 1,3,5

# Validate an existing manifest
node scripts/pipeline.js validate --manifest <path>
```
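The `--items-file` form presumably expects the same JSON array that `--items` takes inline. Below is a minimal Node.js sketch for producing that file, assuming exactly that. `writeItemsFile` and its required-field check are hypothetical and not part of pipeline.js.

```javascript
const fs = require("fs");

// Hypothetical helper: write AI-supplied creative items to a file for `--items-file`.
// Assumes the file holds the same bare JSON array that `--items` accepts inline.
function writeItemsFile(filePath, items) {
  items.forEach((item, i) => {
    // Fail early on items missing the creative fields the spec says the AI provides.
    for (const key of ["shotDesc", "script", "imagePrompt"]) {
      if (typeof item[key] !== "string" || item[key].length === 0) {
        throw new Error(`item ${i + 1}: missing ${key}`);
      }
    }
  });
  fs.writeFileSync(filePath, JSON.stringify(items, null, 2), "utf8");
}
```

Calling `writeItemsFile("./items.json", items)` and then running the `init --items-file ./items.json` command above keeps long scripts out of shell quoting.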

Top-level fields

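| Field | Description | Source | Filled by |
|---|---|---|---|
| `account` | Account ID | account.json | init (automatic) |
| `imageModel` | `gemini` / `mj` | account.json | init (automatic) |
| `videoModel` | `veo3-fast-frames` / `grok-video-3` / `kling`, etc. | account.json | init (automatic) |
| `format` | Aspect ratio: `9:16` / `16:9` | account.json | init (automatic) |
| `mode` | `single` (single image) / `framePair` (first and last frame) | CLI argument | init (automatic) |
| `references` | Reference-image array, copied from account.json `styles.*.references` | account.json | init (automatic) |
| `items` | Asset array (the AI supplies the creative content) | CLI `--items` | AI → init |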
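The inheritance that `init` performs can be sketched as below. The sketch assumes an account.json with top-level `id`, `imageModel`, `videoModel`, and `format` keys, plus a `styles` object whose entries carry `references` arrays. Apart from `styles.*.references`, those key names are guesses. `initManifest` is a hypothetical illustration, not the actual init code.

```javascript
// Hypothetical sketch of `pipeline.js init`: structural fields come from
// account.json (assumed shape), creative content comes from the AI's items.
function initManifest(account, mode, items) {
  if (mode !== "single" && mode !== "framePair") {
    throw new Error(`unknown mode: ${mode}`);
  }
  // Flatten styles.*.references into one array so the pipeline never rereads account.json.
  const references = Object.values(account.styles || {}).flatMap(
    (style) => style.references || []
  );
  return {
    account: account.id,
    imageModel: account.imageModel,
    videoModel: account.videoModel,
    format: account.format,
    mode,
    references,
    // Every item starts pending and unconfirmed, per the items[] rules.
    items: items.map((item) => ({ ...item, status: "pending", confirmed: false })),
  };
}
```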

references field

Copied from account.json at init. The pipeline uses it directly and does not read account.json again.

- Gemini → reads `file` (local path, used for image-to-image)
- MJ → reads `url` (public URL, used for `--sref`)
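Here is a sketch of how a consumer might pick the reference field per model, following the two rules above. `referenceFor` is hypothetical. Failing loudly when the needed field is missing is an assumed design choice, not documented pipeline.js behavior.

```javascript
// Which reference field each image model consumes, per the rules above.
const FIELD_BY_MODEL = new Map([
  ["gemini", "file"], // local path, used for image-to-image
  ["mj", "url"], // public URL, passed to --sref
]);

// Return the reference value the given model should use.
function referenceFor(imageModel, ref) {
  const field = FIELD_BY_MODEL.get(imageModel);
  if (field === undefined) throw new Error(`unsupported imageModel: ${imageModel}`);
  if (!ref[field]) throw new Error(`reference has no "${field}", which ${imageModel} needs`);
  return ref[field];
}
```

A typical call would be `manifest.references.map((ref) => referenceFor(manifest.imageModel, ref))`.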

items[] fields

Written by the Agent (at creation)

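| Field | Description |
|---|---|
| `status` | Always `"pending"` |
| `shotDesc` | English storyboard description (including implied motion, 40-80 words) |
| `script` | The segment's full original script (not condensed; keeps the arguments, examples, and details) |
| `duration` | Planned video duration (seconds), from the storyboard stage |
| `imagePrompt` | English image description (for Gemini/MJ), generated in Step 2-A |
| `directorRef` | Director composition reference (tarantino / kitano / fincher), passed through all three layers |
| `keyword` | Mood keyword (2-6 characters), overlaid at the center of the frame as stylized text during assemble. Optional |
| `confirmed` | Manual confirmation status, defaults to `false` |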
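The creation-time rules in the table can be expressed as a checker. This is a hypothetical sketch that makes three assumptions: the three listed `directorRef` values are the complete set, `shotDesc` words are counted by whitespace, and `keyword` length is counted in Unicode code points.

```javascript
// Directors named in the spec; treating this as the full set is an assumption.
const DIRECTORS = new Set(["tarantino", "kitano", "fincher"]);

// Check an Agent-written item at creation time; returns a list of problems
// (an empty list means no rule was violated).
function checkNewItem(item) {
  const problems = [];
  if (item.status !== "pending") problems.push('status must be "pending"');
  if (item.confirmed !== undefined && item.confirmed !== false) {
    problems.push("confirmed must start as false");
  }
  const words = (item.shotDesc || "").split(/\s+/).filter(Boolean).length;
  if (words < 40 || words > 80) problems.push(`shotDesc has ${words} words, expected 40-80`);
  if (!(typeof item.duration === "number" && item.duration > 0)) {
    problems.push("duration must be a positive number of seconds");
  }
  if (item.directorRef !== undefined && !DIRECTORS.has(item.directorRef)) {
    problems.push(`unknown directorRef: ${item.directorRef}`);
  }
  if (item.keyword !== undefined) {
    const len = [...item.keyword].length; // code points, so each Chinese character counts once
    if (len < 2 || len > 6) problems.push(`keyword has ${len} characters, expected 2-6`);
  }
  // videoPrompt belongs to Step 3-A, not to creation.
  if ("videoPrompt" in item) problems.push("videoPrompt is written later, in Step 3-A");
  return problems;
}
```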

Written back later by the Agent (Step 3-A, video prompts)

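| Field | Description | When written |
|---|---|---|
| `videoPrompt` | English motion description (for Grok/VEO) that describes camera movement rather than content | Written back by the Agent in Step 3-A |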

Written back by the Pipeline (after execution)

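| Field | Description | Written in stage |
|---|---|---|
| `status` | `pending` → `generating` → `done` / `failed` | images |
| `file` | Generated image path (relative to the manifest) | images |
| `candidates` | Paths of the 4 candidate images split from the MJ grid (Gemini has no such field) | images |
| `url` | Public OSS URL of the image | upload |
| `confirmed` | Set to `true` after manual confirmation | confirm |
| `video` | Generated video path | videos |
| `videoDuration` | Video duration (seconds), Grok=6, VEO=8 | videos |
| `videoUrl` | Public OSS URL of the video | videos |
| `audio` | TTS audio path (for multiple sentences, the merged full audio) | tts |
| `audioDuration` | Audio duration (seconds) | tts |
| `segments` | Per-sentence audio array (present only for multiple sentences); see the segments[] section below | tts |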

Actions available to the Agent during review

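- Switch the MJ pick: `item.file = item.candidates[2]` (0-based, so index 2 is the third candidate)
- Delete an unacceptable item: remove it from the `items` array, then rerun `--phase images`
- Adjust the prompt and rerun: edit `imagePrompt` and set `status` back to `pending`
- Manual confirmation: `node scripts/pipeline.js confirm --manifest <path> --all`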
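The review actions above amount to small edits on the parsed manifest. The sketch below is hypothetical; none of these helpers exist in the spec. The caller is assumed to load manifest.json, apply the edit, and write it back before rerunning the pipeline.

```javascript
// Switch an MJ item to another of its split candidates (0-based index).
function chooseCandidate(item, index) {
  if (!Array.isArray(item.candidates)) throw new Error("only MJ items have candidates");
  if (index < 0 || index >= item.candidates.length) {
    throw new RangeError(`no candidate at index ${index}`);
  }
  item.file = item.candidates[index];
}

// Drop items the predicate flags as unacceptable.
function removeItems(manifest, isRejected) {
  manifest.items = manifest.items.filter((item) => !isRejected(item));
}

// Replace the image prompt and send the item back through the images stage.
function reviseImagePrompt(item, newPrompt) {
  item.imagePrompt = newPrompt;
  item.status = "pending";
}
```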

State machine

Item lifecycle

```
pending → [images] → done → [confirm] → confirmed=true → [upload: url set] → [videos] → done → [tts] → done
             ↓                                          ↓
          failed                                    failed + error
```

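Once an item's `status` reaches `done`, later stages do not move it back. Each later stage finds its pending work by checking "has the prerequisite field + lacks the output field", not by watching `status` change.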

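Pickup conditions per stage

The Agent does not need to remember these conditions; the pipeline matches them internally. They are listed only to explain how it works: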

| Stage | Condition for an item to be picked up |
|---|---|
| images | `status=pending` + has `imagePrompt` |
| upload | `status=done` + has `file` + no `url` |
| videos | `status=done` + `confirmed=true` + has `url` + has `videoPrompt` + no `video` |
| tts | `status=done` + has `script` (falls back to `text`) + no `audio` |
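The pickup table reads naturally as one predicate per stage. The sketch below assumes a field counts as present when it is not `undefined`, `null`, or an empty string. `PICKUP` and `itemsForPhase` are hypothetical names.

```javascript
// Treat undefined, null, and "" as "field absent" (assumed semantics).
const has = (value) => value !== undefined && value !== null && value !== "";

// One predicate per stage, mirroring the pickup table.
const PICKUP = {
  images: (it) => it.status === "pending" && has(it.imagePrompt),
  upload: (it) => it.status === "done" && has(it.file) && !has(it.url),
  videos: (it) =>
    it.status === "done" &&
    it.confirmed === true &&
    has(it.url) &&
    has(it.videoPrompt) &&
    !has(it.video),
  // tts falls back to `text` when `script` is missing.
  tts: (it) => it.status === "done" && (has(it.script) || has(it.text)) && !has(it.audio),
};

// Return the items a given stage would process.
function itemsForPhase(manifest, phase) {
  if (!Object.prototype.hasOwnProperty.call(PICKUP, phase)) {
    throw new Error(`unknown phase: ${phase}`);
  }
  return manifest.items.filter(PICKUP[phase]);
}
```

Each predicate checks fields rather than status transitions, so rerunning a stage skips items that already have their output field.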

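Overall status in pipeline.phases

Each stage has its own status: pending → running → done / partial / failed

- `done`: every item succeeded
- `partial`: some items failed (the others succeeded)
- `failed`: the stage as a whole aborted with an error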
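Here is a sketch of deriving a phase's final status from per-item outcomes. The spec does not say what happens when every item fails without an abort. This version reports `failed` in that case, which is an assumption. `phaseStatus` is a hypothetical name.

```javascript
// Final status of one phase from its item outcomes ("done" / "failed").
// `aborted` means the phase threw before finishing.
function phaseStatus(outcomes, aborted = false) {
  if (aborted) return "failed";
  const failures = outcomes.filter((o) => o === "failed").length;
  if (failures === 0) return "done";
  // Some succeeded: partial. All failed: "failed" (assumed, not specified).
  return failures < outcomes.length ? "partial" : "failed";
}
```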

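Failure handling

A single command, `--retry-failed`, takes care of it.

Choose the action according to the stage that failed

Image generation failed (images stage is `partial`):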

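```bash
# Prompt and image style can stay as they are → just retry
node scripts/pipeline.js run --manifest <path> --phase images --retry-failed

# The prompt needs to change → edit item.imagePrompt first, then retry
# (after editing, run the same command as above)
```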

Video generation failed (videos stage is `partial`):

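```bash
# Transient API failure or network timeout → retry directly
node scripts/pipeline.js run --manifest <path> --phase videos --retry-failed

# Prompt problem → edit item.videoPrompt first, then retry
# (after editing, run the same command as above)

# Video model unavailable → change manifest.videoModel or account.json, then retry
```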

Retrying all stages:

```bash
node scripts/pipeline.js run --manifest <path> --retry-failed
```

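Internal behavior of `--retry-failed`

1. Scan every item with `status=failed` or `status=partial`.
2. Decide from the item's existing fields which stage to reset it to:
   - has `url` + `videoPrompt`, no `video` → reset to ready-for-video (`status=done`)
   - no `url`, has `imagePrompt` → reset to ready-for-image (`status=pending`)
3. Reset the corresponding `pipeline.phases` entries to `pending`.
4. Clear the `error` field.
5. Run the specified stage(s) as usual.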
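The reset rules can be sketched as follows. The sketch rests on these assumptions:

- `pipeline.phases` maps a phase name to a status string.
- `error` is cleared only on items that one of the two rules actually resets.
- `resetForRetry` and `retryFailed` are hypothetical names.

```javascript
// Treat undefined, null, and "" as "field absent" (assumed semantics).
const has = (value) => value !== undefined && value !== null && value !== "";

// Reset one item per the two rules; returns the phase it was reset for,
// or null when neither rule applies (the item is left as-is).
function resetForRetry(item) {
  if (item.status !== "failed" && item.status !== "partial") return null;
  let phase = null;
  if (has(item.url) && has(item.videoPrompt) && !has(item.video)) {
    item.status = "done"; // eligible for the videos stage again
    phase = "videos";
  } else if (!has(item.url) && has(item.imagePrompt)) {
    item.status = "pending"; // eligible for the images stage again
    phase = "images";
  }
  if (phase !== null) delete item.error;
  return phase;
}

// Apply resets across the manifest and mark the affected phases pending.
function retryFailed(manifest) {
  const phases = new Set();
  for (const item of manifest.items) {
    const phase = resetForRetry(item);
    if (phase !== null) phases.add(phase);
  }
  manifest.pipeline = manifest.pipeline || {};
  manifest.pipeline.phases = manifest.pipeline.phases || {};
  for (const phase of phases) manifest.pipeline.phases[phase] = "pending";
  return [...phases];
}
```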

First/last-frame mode

When `mode` is `"framePair"`, `imagePrompt` is used as the start frame, and each item carries the following additional fields:

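| Field | Description | Filled by |
|---|---|---|
| `imagePrompt` | Start-frame image description (the same field as in single mode) | AI |
| `lastFramePrompt` | End-frame image description | AI |
| `lastFrame` | End-frame image path | Written back by the pipeline images stage |
| `lastFrameUrl` | End-frame OSS URL | Written back by the pipeline upload stage |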

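First/last-frame rules: same scene, consistent viewpoint, contrasting states. When VEO finds `lastFrameUrl`, it automatically switches to two-image mode.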
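The two-image rule amounts to a presence check on `lastFrameUrl`. In this hypothetical sketch, the returned `{ mode, frames }` shape is invented for illustration and is not the VEO request format.

```javascript
// Hypothetical: decide which frame URLs go to the video model.
// Two-image (first/last frame) mode is used only when lastFrameUrl exists.
function videoFrameInputs(item) {
  if (!item.url) throw new Error("start frame has not been uploaded (no url)");
  return item.lastFrameUrl
    ? { mode: "firstLast", frames: [item.url, item.lastFrameUrl] }
    : { mode: "single", frames: [item.url] };
}
```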

Directory structure

```
output/{account}_{YYYYMMDD}_{NNN}/
├── manifest.json       # main manifest
├── images/             # scene_{NN}_{slug}.jpeg (first/last-frame adds _last, MJ candidates add _cand{1-4})
├── videos/             # scene_{NN}_{slug}.mp4
└── audio/              # seg_001.mp3
```

The slug is derived from `shotDesc` (slugify: keeps Chinese characters, letters, and digits; at most 20 characters).
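Here is a sketch of slug and file-name construction consistent with the layout above. It makes these assumptions, none stated in the spec:

- "Letters and digits" means ASCII `A-Z`, `a-z`, `0-9`.
- Each run of other characters collapses to a single `-`.
- `slugify` and `sceneImageName` are hypothetical names.

```javascript
// Keep Han characters and ASCII letters/digits; collapse everything else to "-".
function slugify(text, maxLen = 20) {
  const kept = text
    .replace(/[^\p{Script=Han}A-Za-z0-9]+/gu, "-")
    .replace(/^-+|-+$/g, "");
  // Truncate by code point, then drop a dash left dangling by the cut.
  return [...kept].slice(0, maxLen).join("").replace(/-+$/, "");
}

// Build an image file name following the images/ naming pattern.
function sceneImageName(index, shotDesc, { last = false, candidate } = {}) {
  const nn = String(index).padStart(2, "0");
  let name = `scene_${nn}_${slugify(shotDesc)}`;
  if (last) name += "_last";
  if (candidate !== undefined) name += `_cand${candidate}`;
  return `${name}.jpeg`;
}
```

For example, `sceneImageName(3, "Tank at dusk", { last: true })` yields `scene_03_Tank-at-dusk_last.jpeg`.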

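segments[] fields (TTS sentence splitting)

Generated automatically in the TTS stage. Written only when `script` is split into 2 or more sentences; for a single sentence, `segments` is not written.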

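| Field | Description |
|---|---|
| `text` | Sentence text (punctuation removed) |
| `audio` | Audio path for this sentence (relative to the manifest) |
| `duration` | Audio duration of this sentence (seconds) |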

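`item.audio` points to the full audio merged from all segments, and `item.audioDuration` is the sum of the segment durations. The assemble stage aligns subtitles using the exact segment durations when `segments` exists. Without `segments`, it falls back to an estimate weighted by character count.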
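The subtitle-alignment rule can be sketched as follows. With `segments`, cue boundaries come straight from the measured durations. Without them, `audioDuration` is split across sentences by character count. The sentence-splitting punctuation set and the weight (characters excluding whitespace and punctuation) are assumptions, and `subtitleTimings` is a hypothetical name.

```javascript
// Return subtitle cues [{ text, start, end }] in seconds for one item.
function subtitleTimings(item) {
  let cues;
  if (Array.isArray(item.segments) && item.segments.length > 0) {
    // Exact path: use the per-sentence TTS durations directly.
    cues = item.segments.map((s) => ({ text: s.text, duration: s.duration }));
  } else {
    // Fallback: split the script into sentences (assumed punctuation set)
    // and share audioDuration in proportion to character count.
    const sentences = (item.script || "")
      .split(/[。！？!?；;\n]+/)
      .map((s) => s.trim())
      .filter(Boolean);
    const weights = sentences.map((s) => [...s.replace(/[\s\p{P}]/gu, "")].length);
    const total = weights.reduce((a, b) => a + b, 0) || 1;
    cues = sentences.map((text, i) => ({ text, duration: (item.audioDuration * weights[i]) / total }));
  }
  // Lay the cues end to end starting at 0.
  let t = 0;
  return cues.map(({ text, duration }) => {
    const cue = { text, start: t, end: t + duration };
    t += duration;
    return cue;
  });
}
```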
