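feat(skills): integrate GPT Image generation and editing

- Add the gpt-image-generator.js script, supporting text-to-image, image-to-image/repainting, and batch generation
- Update pipeline and phase-images to support the GPT Image model
- Update the skill documentation with GPT Image usage notes and API characteristics
- Add GPT Image API parameters to the config file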
2026-05-05 23:49:30 +08:00
parent 823519cbf7
commit 35488beef2
8 changed files with 752 additions and 11 deletions

View File

@@ -13,6 +13,9 @@
"grokModel": "grok-video-3",
"veoApiBaseUrl": "https://yunwu.ai",
"veoApiKey": "sk-m5inhwXqrbcBL6NNKOe7kTdhX8M31azvAvDvtSPGS71rRzd8",
"gptImageApiBaseUrl": "https://yunwu.ai",
"gptImageApiKey": "sk-m5inhwXqrbcBL6NNKOe7kTdhX8M31azvAvDvtSPGS71rRzd8",
"gptImageModel": "gpt-image-2",
"veoModel": "veo3-fast-frames",
"veoEnhancePrompt": true,
"veoEnableUpsample": true,

View File

@@ -1,11 +1,11 @@
---
name: image-generator
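description: Image generation skill. Supports two models: Gemini and Midjourney (MJ). Batch generation, image-to-image, style transfer, automatic 4-in-1 splitting. Trigger words: generate image, create images, batch image output, image assets, MJ image generation, Gemini image generation, image-to-image, style transfer.
description: Image generation skill. Supports three models: Gemini, Midjourney (MJ), and GPT Image. Batch generation, image-to-image, style transfer, automatic 4-in-1 splitting, repainting/editing. Trigger words: generate image, create images, batch image output, image assets, MJ image generation, Gemini image generation, GPT Image image generation, image-to-image, style transfer, repaint.
---
# Image generation
Dual-model image generation with Gemini (fast) + MJ (premium). **Anchor on the reference image** to keep batch output stylistically consistent.
Triple-model image generation with Gemini (fast) + MJ (premium) + GPT Image (high-quality editing). **Anchor on the reference image** to keep batch output stylistically consistent.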
---
@@ -108,7 +108,8 @@ node .claude/skills/video-from-script/scripts/gemini-image-generator.js edit \
|------|------|-----------|------|
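| Fast output, batches | **Gemini** | Direct upload of local image files (`-i`) | ~10s, API returns a single image directly |
| Premium images, photorealistic/artistic | **MJ** | Public URL (`-r`, `--sref`) | High quality, pick 1 of 4, ~60s |
| Blending reference-image styles | Gemini or MJ | See details below | Both supported |
| High-quality editing/repainting | **GPT Image** | Local images via multipart upload | Strongest editing capability; supports multi-reference blending |
| Blending reference-image styles | Gemini or GPT Image | See details below | Both supported |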
---
@@ -258,6 +259,79 @@ MJ flow: submit imagine → poll every 5s → download 4-in-1 → split with sharp
---
## GPT Image full usage
OpenAI's GPT Image 2 model, called through the Yunwu (yunwu.ai) API proxy. Supports text-to-image as well as strong image-to-image/repainting.
```bash
# Text-to-image
node .claude/skills/video-from-script/scripts/gpt-image-generator.js generate "prompt" -o ./output -r 9:16 -q auto
# Image-to-image/repainting (editing with a reference image)
node .claude/skills/video-from-script/scripts/gpt-image-generator.js edit "Add sunglasses to this person" -i ./photo.jpg -o ./output
# Multi-image reference blending
node .claude/skills/video-from-script/scripts/gpt-image-generator.js edit "Combine items into a gift basket" -i ./a.jpg,./b.jpg,./c.jpg -o ./output
# Batch
node .claude/skills/video-from-script/scripts/gpt-image-generator.js batch ./prompts.txt -o ./output -r 9:16 -q low
```
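| Parameter | Description |
|------|------|
| `-o, --output` | Output directory |
| `-r, --ratio` | Aspect ratio: 1:1, 9:16, 16:9, 3:4, 4:3, etc. |
| `-s, --size` | Size (1024x1024, 1088x1920, auto); by default chosen automatically from the aspect ratio |
| `-q, --quality` | Quality: low (fast draft), medium, high (final), auto (default) |
| `-f, --format` | Format: png (default), jpeg (faster), webp |
| `-i, --input` | Input images (edit mode); comma-separated for multiple images |
| `--mask` | Mask image (edit mode, optional) |
| `-n` | Number of images to generate (default: 1) |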
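
As a rough illustration of how `-r` and `-s` could resolve to a single size string, here is a minimal sketch. The lookup table is a subset of `RATIO_SIZE_MAP` from `gpt-image-generator.js` in this commit; the `resolveSize` helper and its precedence rule (an explicit size wins over a ratio) are assumptions made for illustration and may differ from how the CLI actually orders the two flags.

```javascript
// Subset of RATIO_SIZE_MAP copied from gpt-image-generator.js in this commit.
const RATIO_SIZE_MAP = {
  '1:1': '1024x1024',
  '3:4': '1152x1536',
  '4:3': '1536x1152',
  '9:16': '1088x1920',
  '16:9': '1920x1088',
}

// Hypothetical helper (not part of the script): an explicit size wins;
// otherwise the ratio is looked up, falling back to a square image when
// the ratio is unknown or missing.
function resolveSize({ size, ratio } = {}) {
  if (size) return size
  return RATIO_SIZE_MAP[ratio] || '1024x1024'
}

console.log(resolveSize({ ratio: '9:16' }))
console.log(resolveSize({ size: '2048x2048', ratio: '9:16' }))
console.log(resolveSize({ ratio: '7:5' }))
```

An unknown ratio such as `7:5` silently falls back to the square default here, which matches the fallback in the script's own map lookup.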
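### Single image / first and last frame
**Single-image mode**: set `--image-model gpt-image` in the pipeline; it automatically uses text-to-image, or image-to-image when reference images are present.
**First/last-frame mode**: set `--mode framePair --image-model gpt-image` in the pipeline; the last frame is generated automatically by editing the first-frame image with `lastFramePrompt`.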
```bash
# Pipeline single-image mode
node .claude/skills/video-from-script/scripts/pipeline.js init \
--account my-account --mode single --image-model gpt-image \
--items '[{"shotDesc":"...","script":"...","duration":5,"imagePrompt":"..."}]'
node .claude/skills/video-from-script/scripts/pipeline.js run \
--manifest ./output/my-account_XXX/manifest.json --phase images
# Pipeline first/last-frame mode
node .claude/skills/video-from-script/scripts/pipeline.js init \
--account my-account --mode framePair --image-model gpt-image \
--items '[{"shotDesc":"...","script":"...","duration":5,"imagePrompt":"...","lastFramePrompt":"..."}]'
```
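### Repainting (editing an existing image)
Images can be edited with natural-language instructions: swapping the background, changing an object's color, adding/removing elements, style transfer, and so on.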
```bash
# Edit a single image
node .claude/skills/video-from-script/scripts/gpt-image-generator.js edit \
"Change the background to a sunset beach" -i ./photo.jpg
# Multi-reference blending (compositing a person into a scene)
node .claude/skills/video-from-script/scripts/gpt-image-generator.js edit \
"Place this person in the garden scene" \
-i ./person.png,./garden.jpg -o ./output
```
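### API characteristics
- **Image-to-image** uploads local images as multipart/form-data; no public URL is needed
- Input images are processed at high fidelity automatically; there is no need to adjust input_fidelity
- Transparent backgrounds (`background: transparent`) are not supported; use Gemini when a transparent image is needed
- Thousands of resolutions are supported; the short edge must be ≥ 640px and a multiple of 16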
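
The resolution rule above can be checked before a request is sent. The following is a minimal sketch, not part of the script: `checkSize` is a hypothetical helper, the 3840 maximum edge and 3:1 ratio limit come from the comment on `RATIO_SIZE_MAP` in `gpt-image-generator.js`, and applying the multiple-of-16 rule to both edges is an assumption. It only handles `WIDTHxHEIGHT` strings, so the `auto` value is reported as unparseable.

```javascript
// Constraints as stated in this commit (assumed to apply to both edges):
// short edge >= 640, each edge a multiple of 16, long edge <= 3840,
// and long/short ratio <= 3.
function checkSize(size) {
  const match = /^(\d+)x(\d+)$/.exec(size)
  if (!match) return { ok: false, reason: 'expected WIDTHxHEIGHT' }
  const width = Number(match[1])
  const height = Number(match[2])
  const short = Math.min(width, height)
  const long = Math.max(width, height)
  if (width % 16 !== 0 || height % 16 !== 0) {
    return { ok: false, reason: 'edges must be multiples of 16' }
  }
  if (short < 640) return { ok: false, reason: 'short edge below 640' }
  if (long > 3840) return { ok: false, reason: 'long edge above 3840' }
  if (long / short > 3) return { ok: false, reason: 'aspect ratio above 3:1' }
  return { ok: true }
}

console.log(checkSize('1088x1920'))
console.log(checkSize('1000x1000'))
console.log(checkSize('1024x512'))
```

Returning a reason string rather than a bare boolean makes it easier to surface a useful CLI error before any API credits are spent.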
---
## Account system integration
When the user specifies an account, three tiers of resources are read from `accounts/{account}/` under the project root:
@@ -318,6 +392,14 @@ const r = await geminiGen('prompt', { outputDir: './out', aspectRatio: '9:16' })
const { edit: geminiEdit } = require('./gemini-image-generator')
const r = await geminiEdit('prompt', ['./ref1.png', './ref2.png'], { outputDir: './out', aspectRatio: '9:16' })
// GPT Image text-to-image
const { generate: gptGen } = require('./gpt-image-generator')
const r = await gptGen('prompt', { outputDir: './out', size: '1088x1920' })
// GPT Image image-to-image/repainting (with reference images)
const { edit: gptEdit } = require('./gpt-image-generator')
const r = await gptEdit('prompt', ['./ref1.png', './ref2.png'], { outputDir: './out' })
// MJ
const { generate: mjGen } = require('./mj-image-generator')
const r = await mjGen('prompt', { outputDir: './out', aspectRatio: '9:16' })

View File

@@ -0,0 +1,531 @@
#!/usr/bin/env node
/**
 * GPT Image Generator - image generation tool for GPT Image models
 *
 * Supported models: gpt-image-2, gpt-image-1.5, gpt-image-1
 * Called through the Yunwu (yunwu.ai) API proxy, following the OpenAI Images API format
 *
 * Features:
 * - Text-to-image: /v1/images/generations
 * - Image-to-image/repainting: /v1/images/edits (multipart)
 * - First/last-frame editing
 * - Batch generation
 *
 * Usage:
 *   node gpt-image-generator.js generate "a cute cat" -o ./output -r 16:9
 *   node gpt-image-generator.js edit "add sunglasses" -i ./photo.jpg -o ./output
 *   node gpt-image-generator.js batch ./prompts.txt -o ./output
 */
const fs = require('fs')
const path = require('path')
const https = require('https')
const http = require('http')
// ============================================================================
// Configuration
// ============================================================================
function loadConfig() {
  // config.json lives two directories above this script
  const configPath = path.join(__dirname, '..', '..', 'config.json')
  if (fs.existsSync(configPath)) {
    return JSON.parse(fs.readFileSync(configPath, 'utf-8'))
  }
  return {}
}
const cfg = loadConfig()
const Config = {
  baseUrl: cfg.gptImageApiBaseUrl || 'https://yunwu.ai',
  apiKey: cfg.gptImageApiKey || '',
  model: cfg.gptImageModel || 'gpt-image-2',
  timeout: 120000,
}
// Aspect ratio → suggested resolution map (gpt-image-2 constraints: max edge 3840, multiples of 16, ratio ≤ 3:1)
const RATIO_SIZE_MAP = {
  '1:1': '1024x1024',
  '3:2': '1536x1024',
  '2:3': '1024x1536',
  '3:4': '1152x1536',
  '4:3': '1536x1152',
  '4:5': '1024x1280',
  '5:4': '1280x1024',
  '9:16': '1088x1920',
  '16:9': '1920x1088',
  '21:9': '2048x880',
}
// ============================================================================
// API calls
// ============================================================================
const GptImageApi = {
  /**
   * Text-to-image: POST /v1/images/generations (JSON body)
   */
  async generate(prompt, options = {}) {
    const {
      model = Config.model,
      n = 1,
      size = '1024x1024',
      quality = 'auto',
      format = 'png',
      outputCompression,
      moderation = 'auto',
    } = options
    const body = {
      model,
      prompt,
      n,
      size,
      quality,
      // The OpenAI Images API names this field output_format, not format
      output_format: format,
    }
    if (outputCompression !== undefined) body.output_compression = outputCompression
    if (moderation !== 'auto') body.moderation = moderation
    console.log(`\n📡 GPT Image text-to-image request`)
    console.log(` Model: ${model}`)
    console.log(` Size: ${size} Quality: ${quality}`)
    const res = await fetch(`${Config.baseUrl}/v1/images/generations`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${Config.apiKey}`,
      },
      body: JSON.stringify(body),
    })
    if (!res.ok) {
      const errText = await res.text()
      throw new Error(`GPT Image generation failed: ${res.status} - ${errText}`)
    }
    return res.json()
  },
  /**
   * Image-to-image/editing: POST /v1/images/edits (multipart/form-data)
   *
   * @param {string} prompt - Edit instruction
   * @param {string[]} imagePaths - Input image paths (the first is the edit target, the rest are references)
   * @param {object} [options] - Optional settings; options.maskPath is an optional mask path
   */
  async edit(prompt, imagePaths, options = {}) {
    const {
      model = Config.model,
      n = 1,
      size,
      maskPath,
    } = options
    // FormData and Blob are globals in Node.js 18+
    const FormData = globalThis.FormData
    const fd = new FormData()
    fd.append('model', model)
    fd.append('prompt', prompt)
    if (n > 1) fd.append('n', String(n))
    if (size) fd.append('size', size)
    // Attach the image files; unknown extensions are sent as image/png
    const mimeMap = { '.png': 'image/png', '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg', '.webp': 'image/webp', '.gif': 'image/gif' }
    for (const imgPath of imagePaths) {
      const buf = fs.readFileSync(imgPath)
      const ext = path.extname(imgPath).toLowerCase()
      const mimeType = mimeMap[ext] || 'image/png'
      fd.append('image', new Blob([buf], { type: mimeType }), path.basename(imgPath))
    }
    if (maskPath) {
      const maskBuf = fs.readFileSync(maskPath)
      fd.append('mask', new Blob([maskBuf], { type: 'image/png' }), path.basename(maskPath))
    }
    console.log(`\n📡 GPT Image edit request`)
    console.log(` Model: ${model}`)
    console.log(` Input images: ${imagePaths.length}${maskPath ? ' + mask' : ''}`)
    if (size) console.log(` Size: ${size}`)
    // No Content-Type header here: fetch sets the multipart boundary itself
    const res = await fetch(`${Config.baseUrl}/v1/images/edits`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${Config.apiKey}`,
      },
      body: fd,
    })
    if (!res.ok) {
      const errText = await res.text()
      throw new Error(`GPT Image edit failed: ${res.status} - ${errText}`)
    }
    return res.json()
  },
  /**
   * Parse the response and extract the images
   * Supports both base64 JSON and URL formats
   */
  parseResponse(response) {
    if (!response || !response.data) {
      return { images: [] }
    }
    const images = []
    for (const item of response.data) {
      if (item.b64_json) {
        images.push({ data: item.b64_json, url: item.url, revised_prompt: item.revised_prompt })
      } else if (item.url) {
        images.push({ url: item.url, revised_prompt: item.revised_prompt })
      }
    }
    return { images }
  },
}
// ============================================================================
// File handling
// ============================================================================
const FileUtils = {
  ensureDir(dirPath) {
    if (!fs.existsSync(dirPath)) {
      fs.mkdirSync(dirPath, { recursive: true })
    }
    return dirPath
  },
  generateFilename(prefix = 'image', ext = 'png') {
    const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
    const random = Math.random().toString(36).substring(2, 8)
    return `${prefix}_${timestamp}_${random}.${ext}`
  },
  readPromptsFile(filePath) {
    // One prompt per line; blank lines are skipped
    const content = fs.readFileSync(filePath, 'utf-8')
    return content.split('\n').filter(l => l.trim()).map(l => l.trim())
  },
  async downloadImage(url, outputPath) {
    const protocol = url.startsWith('https') ? https : http
    return new Promise((resolve, reject) => {
      protocol.get(url, (response) => {
        const { statusCode, headers } = response
        // Follow redirects; resolve relative Location headers against the current URL
        if (statusCode >= 300 && statusCode < 400 && headers.location) {
          response.resume()
          const next = new URL(headers.location, url).toString()
          return FileUtils.downloadImage(next, outputPath).then(resolve, reject)
        }
        // Reject non-2xx responses instead of saving an error page as an image
        if (statusCode < 200 || statusCode >= 300) {
          response.resume()
          return reject(new Error(`Image download failed: HTTP ${statusCode}`))
        }
        const file = fs.createWriteStream(outputPath)
        response.pipe(file)
        file.on('finish', () => file.close(() => resolve(outputPath)))
        file.on('error', (err) => {
          // Remove the partial file before reporting the write error
          fs.rm(outputPath, { force: true }, () => reject(err))
        })
      }).on('error', reject)
    })
  },
}
// ============================================================================
// Core generator
// ============================================================================
class GptImageGenerator {
  constructor(options = {}) {
    this.outputDir = options.outputDir || './output'
    this.defaultSize = options.size || '1024x1024'
    this.defaultQuality = options.quality || 'auto'
    if (!Config.apiKey) {
      console.warn('Warning: gptImageApiKey is not set')
    }
  }
  /**
   * Text-to-image: generate images from a text prompt
   */
  async textToImage(prompt, options = {}) {
    const {
      size = this.defaultSize,
      quality = this.defaultQuality,
      format = 'png',
      n = 1,
      outputDir = this.outputDir,
      filename = null,
    } = options
    console.log(`\n🎨 GPT Image text-to-image: "${prompt.substring(0, 80)}..."`)
    console.log(`📐 Size: ${size} 🎯 Quality: ${quality}`)
    const response = await GptImageApi.generate(prompt, { size, quality, format, n })
    const result = GptImageApi.parseResponse(response)
    const savedFiles = []
    FileUtils.ensureDir(outputDir)
    for (let i = 0; i < result.images.length; i++) {
      const img = result.images[i]
      const ext = format === 'jpeg' ? 'jpg' : format
      let outputFilename = filename || FileUtils.generateFilename('gpt_gen', ext)
      // With a fixed filename and n > 1, add an index so images do not overwrite each other
      if (filename && result.images.length > 1) {
        const parsed = path.parse(filename)
        outputFilename = `${parsed.name}_${i + 1}${parsed.ext}`
      }
      const outputPath = path.join(outputDir, outputFilename)
      if (img.data) {
        fs.writeFileSync(outputPath, Buffer.from(img.data, 'base64'))
      } else if (img.url) {
        await FileUtils.downloadImage(img.url, outputPath)
      }
      savedFiles.push(outputPath)
      console.log(`✅ Saved: ${outputPath}`)
    }
    return { images: result.images, savedFiles }
  }
  /**
   * Image-to-image/repainting: edit with reference images
   */
  async imageToImage(prompt, inputImages, options = {}) {
    const {
      size = this.defaultSize,
      outputDir = this.outputDir,
      maskPath = null,
    } = options
    // Accept either a single path or an array of paths
    const imgPaths = Array.isArray(inputImages) ? inputImages : [inputImages]
    console.log(`\n🖼️ GPT Image edit: "${prompt.substring(0, 80)}..."`)
    console.log(`📁 Input images: ${imgPaths.length}`)
    const response = await GptImageApi.edit(prompt, imgPaths, { size, maskPath, n: 1 })
    const result = GptImageApi.parseResponse(response)
    const savedFiles = []
    FileUtils.ensureDir(outputDir)
    for (let i = 0; i < result.images.length; i++) {
      const img = result.images[i]
      const ext = 'png'
      const outputFilename = FileUtils.generateFilename('gpt_edit', ext)
      const outputPath = path.join(outputDir, outputFilename)
      if (img.data) {
        fs.writeFileSync(outputPath, Buffer.from(img.data, 'base64'))
      } else if (img.url) {
        await FileUtils.downloadImage(img.url, outputPath)
      }
      savedFiles.push(outputPath)
      console.log(`✅ Saved: ${outputPath}`)
    }
    return { images: result.images, savedFiles }
  }
  /**
   * Batch text-to-image
   */
  async batchGenerate(prompts, options = {}) {
    const results = []
    const total = prompts.length
    // Match the file extension to the requested output format
    const ext = options.format === 'jpeg' ? 'jpg' : (options.format || 'png')
    console.log(`\n🚀 GPT Image batch generation: ${total} tasks`)
    for (let i = 0; i < prompts.length; i++) {
      console.log(`\n[${i + 1}/${total}] Processing...`)
      try {
        const result = await this.textToImage(prompts[i], {
          ...options,
          filename: `batch_${String(i + 1).padStart(3, '0')}.${ext}`,
        })
        results.push({ success: true, prompt: prompts[i], result })
      } catch (error) {
        // A failed prompt is recorded and the batch continues
        console.error(`❌ Failed: ${error.message}`)
        results.push({ success: false, prompt: prompts[i], error: error.message })
      }
    }
    const successCount = results.filter(r => r.success).length
    console.log(`\n✨ Batch generation finished: ${successCount}/${total} succeeded`)
    return results
  }
}
// ============================================================================
// Convenience functions (called by the pipeline)
// ============================================================================
/**
 * Map an aspect ratio to a suitable size string; unknown ratios fall back to 1024x1024
 */
function ratioToSize(ratio) {
  return RATIO_SIZE_MAP[ratio] || '1024x1024'
}
// ============================================================================
// CLI
// ============================================================================
function showHelp() {
  console.log(`
🎨 GPT Image Generator - GPT Image generation tool for the Yunwu API
📦 Model: ${Config.model}
Usage:
  node gpt-image-generator.js <command> [options]
Commands:
  generate <prompt>    Text-to-image
  edit <prompt>        Image-to-image/repainting (requires -i for input images)
  batch <file>         Batch generation (reads prompts from a file)
Options:
  -o, --output <dir>   Output directory (default: ./output)
  -r, --ratio <ratio>  Aspect ratio (1:1, 16:9, 9:16, 3:4, 4:3, etc.)
  -s, --size <size>    Size (1024x1024, 1088x1920, auto, etc.)
  -q, --quality <q>    Quality (low, medium, high, auto)
  -f, --format <fmt>   Format (png, jpeg, webp)
  -i, --input <files>  Input images (edit mode, comma-separated)
  --mask <file>        Mask image (edit mode)
  -n <num>             Number of images (default: 1)
  -h, --help           Show help
Examples:
  # Text-to-image 9:16
  node gpt-image-generator.js generate "A cat wearing a hat" -r 9:16 -q medium
  # Image-to-image/repainting
  node gpt-image-generator.js edit "Add sunglasses" -i ./photo.jpg
  # Editing with multiple reference images
  node gpt-image-generator.js edit "Combine these items into a gift basket" -i ./a.jpg,./b.jpg
  # Batch generation
  node gpt-image-generator.js batch ./prompts.txt -r 9:16 -q low
Available aspect ratios and default sizes:
  ${Object.entries(RATIO_SIZE_MAP).map(([k, v]) => `${k} → ${v}`).join('\n  ')}
`)
}
async function main() {
  const args = process.argv.slice(2)
  if (args.includes('-h') || args.includes('--help') || args.length === 0) {
    showHelp()
    return
  }
  let command = 'generate'
  const params = []
  const options = { outputDir: './output', size: '1024x1024', quality: 'auto', format: 'png', n: 1 }
  let i = 0
  // The first argument may name a command; otherwise default to generate
  if (args[0] === 'batch' || args[0] === 'edit' || args[0] === 'generate') {
    command = args[0]
    i = 1
  }
  while (i < args.length) {
    const arg = args[i]
    if (arg === '-o' || arg === '--output') {
      options.outputDir = args[++i]
    } else if (arg === '-r' || arg === '--ratio') {
      // Unknown ratios fall back to 1024x1024
      options.size = ratioToSize(args[++i])
    } else if (arg === '-s' || arg === '--size') {
      options.size = args[++i]
    } else if (arg === '-q' || arg === '--quality') {
      options.quality = args[++i]
    } else if (arg === '-f' || arg === '--format') {
      options.format = args[++i]
    } else if (arg === '-i' || arg === '--input') {
      // Guard against a missing value so a trailing -i does not throw
      options.inputImages = (args[++i] || '').split(',').map(s => s.trim()).filter(Boolean)
    } else if (arg === '--mask') {
      options.maskPath = args[++i]
    } else if (arg === '-n') {
      options.n = parseInt(args[++i], 10) || 1
    } else {
      // Anything else is treated as part of the prompt or the prompts-file path
      params.push(arg)
    }
    i++
  }
  const generator = new GptImageGenerator({
    outputDir: options.outputDir,
    size: options.size,
    quality: options.quality,
  })
  if (command === 'batch') {
    const filePath = params[0]
    if (!filePath || !fs.existsSync(filePath)) {
      console.error('Please provide the path to a prompts file')
      process.exit(1)
    }
    const prompts = FileUtils.readPromptsFile(filePath)
    await generator.batchGenerate(prompts, options)
  } else if (command === 'edit') {
    const prompt = params.join(' ')
    if (!prompt) { console.error('Please provide an edit instruction'); process.exit(1) }
    if (!options.inputImages || options.inputImages.length === 0) {
      console.error('Use -i to specify input images')
      process.exit(1)
    }
    await generator.imageToImage(prompt, options.inputImages, {
      size: options.size,
      outputDir: options.outputDir,
      maskPath: options.maskPath,
    })
  } else {
    const prompt = params.join(' ')
    if (!prompt) { console.error('Please provide a generation prompt'); process.exit(1) }
    await generator.textToImage(prompt, {
      size: options.size,
      quality: options.quality,
      format: options.format,
      n: options.n,
      outputDir: options.outputDir,
    })
  }
}
// ============================================================================
// Exports
// ============================================================================
module.exports = {
  GptImageGenerator,
  GptImageApi,
  Config,
  FileUtils,
  RATIO_SIZE_MAP,
  ratioToSize,
  // Functional wrappers: each call builds a generator from the same options
  generate: async (prompt, options) => {
    const generator = new GptImageGenerator(options)
    return generator.textToImage(prompt, options)
  },
  edit: async (prompt, imagePaths, options) => {
    const generator = new GptImageGenerator(options)
    return generator.imageToImage(prompt, imagePaths, options)
  },
  batchGenerate: async (prompts, options) => {
    const generator = new GptImageGenerator(options)
    return generator.batchGenerate(prompts, options)
  },
}
// Run the CLI only when executed directly, not when required as a module
if (require.main === module) {
  main().catch(err => {
    console.error(`\n❌ Error: ${err.message}`)
    process.exit(1)
  })
}

View File

@@ -24,7 +24,7 @@ function validateManifest(manifestPath) {
}
  if (!manifest.account) issues.push('Missing top-level account')
  if (!manifest.imageModel) issues.push('Missing top-level imageModel (options: gemini, mj)')
  if (!manifest.imageModel) issues.push('Missing top-level imageModel (options: gemini, gpt-image, mj)')
  if (!manifest.format) issues.push('Missing top-level format (e.g. 9:16)')
  if (!manifest.items || !Array.isArray(manifest.items)) issues.push('Missing top-level items array')
  if (!manifest.mode) issues.push('Missing top-level mode (single or framePair)')

View File

@@ -1,7 +1,7 @@
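/**
 * Phase: images (image generation)
 *
 * Supports three models (Gemini / MJ / Kling), including first/last-frame mode
 * Supports four models (Gemini / GPT Image / MJ / Kling), including first/last-frame mode
 * Generates concurrently; supports task ID recovery (MJ)
 */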
@@ -130,6 +130,32 @@ async function generateMJ(item, idx, dir, imagesDir, ratio, refs, manifestPath,
return harvestMJ(item, idx, dir, imagesDir, ratio, refs, manifestPath, manifest)
}
async function generateGptImage(item, idx, dir, imagesDir, ratio, refs) {
  const { generate: gptGen, edit: gptEdit, ratioToSize } = require('../gpt-image-generator')
  const size = ratioToSize(ratio)
  let result
  // Use the edit endpoint when local reference images exist, otherwise plain text-to-image
  if (refs.localPaths.length > 0) {
    log('images', `[${idx}] GPT Image image-to-image: ${item.imagePrompt.substring(0, 60)}...`)
    result = await gptEdit(item.imagePrompt, refs.localPaths, {
      outputDir: imagesDir,
      size,
    })
  } else {
    log('images', `[${idx}] GPT Image text-to-image: ${item.imagePrompt.substring(0, 60)}...`)
    result = await gptGen(item.imagePrompt, {
      outputDir: imagesDir, size,
      quality: 'auto',
    })
  }
  // Rename the first saved file to the pipeline's naming scheme
  const file = (result.savedFiles && result.savedFiles.length > 0)
    ? renameGeneratedFile(
      path.relative(dir, result.savedFiles[0]).replace(/\\/g, '/'),
      dir, idx, item.script || item.shotDesc, ''
    )
    : null
  return { file }
}
async function generateKling(item, idx, dir, imagesDir, ratio, refs) {
const { generate: klingGen } = require('../kling-image-generator')
const klingOpts = { outputDir: imagesDir, aspectRatio: ratio }
@@ -158,6 +184,12 @@ async function generateLastFrame(item, idx, manifest, dir, imagesDir, model, rat
outputDir: imagesDir,
aspectRatio: ratio,
})
} else if (model === 'gpt-image') {
const { edit: gptEdit, ratioToSize } = require('../gpt-image-generator')
lastResult = await gptEdit(item.lastFramePrompt, [firstFramePath], {
outputDir: imagesDir,
size: ratioToSize(ratio),
})
} else if (model === 'kling') {
const { generate: klingGen } = require('../kling-image-generator')
lastResult = await klingGen(item.lastFramePrompt, {
@@ -273,10 +305,12 @@ async function processItem(item, manifest, manifestPath, dir, imagesDir, model,
let result
if (model === 'gemini') {
result = await generateGemini(item, idx, dir, imagesDir, ratio, refs)
} else if (model === 'gpt-image') {
result = await generateGptImage(item, idx, dir, imagesDir, ratio, refs)
} else if (model === 'kling') {
result = await generateKling(item, idx, dir, imagesDir, ratio, refs)
} else {
    throw new Error(`Unsupported model: ${model} (supported: gemini, mj, kling)`)
    throw new Error(`Unsupported model: ${model} (supported: gemini, gpt-image, mj, kling)`)
}
if (result.file) {

View File

@@ -225,7 +225,7 @@ async function main() {
console.log('Usage:')
console.log(' pipeline.js create-account --id <id> --name <name> [--desc ...] [--references file1,file2]')
console.log(' pipeline.js validate-account --account <id>')
console.log(' pipeline.js init --account <id> --mode <single|framePair> --items <JSON> [--items-file <path>] [--image-model gemini|mj] [--video-model veo3-fast|grok|kling] [--format 9:16]')
console.log(' pipeline.js init --account <id> --mode <single|framePair> --items <JSON> [--items-file <path>] [--image-model gemini|gpt-image|mj] [--video-model veo3-fast|grok|kling] [--format 9:16]')
console.log(' pipeline.js validate --manifest <path>')
console.log(' pipeline.js confirm --manifest <path> --all')
console.log(' pipeline.js confirm --manifest <path> --items 1,3,5')

View File

@@ -3,7 +3,7 @@
"name": "瞬息实验室",
"description": "正在腐烂的梦——梦核阈限空间×克苏鲁宇宙恐怖,熟悉的童年走廊通向不该存在的维度,暖金柔光在画面边缘变质为冷蓝萤光",
"defaultFormat": "16:9",
"imageModel": "mj",
"imageModel": "gpt-image",
"videoModel": "veo3-fast",
"batchSize": 30,
"ttsVoice": "",

View File

@@ -325,6 +325,12 @@ Ultra-realistic cinematic photograph in the style of dreamcore liminal space wit
画风为超写实电影级摄影梦核阈限空间渗透着安静的不安熟悉的童年空间走廊、教室、泳池、操场、卧室延伸至微妙不对的维度空间干净且异常安静——不安来自比例的微差而非暗处的任何东西暖金色时段光线充满画面中心但在边缘逐渐冷却为冷蓝灰与暗紫调单一暖光源白炽灯、窗户光的色温沿空间长度逐渐偏移——同一种灯具在近端是暖白在远端已变为冷蓝灰不应有雾的空间中弥漫着薄霭学校走廊、室内泳池、教室壮观丁达尔效应体积光束穿透静止空气照亮悬浮的微尘——微尘凝在半空不动墙壁表面干净但比例微妙不对——门框略高于标准、天花板略远、走廊比建筑外观应有的长度多了那么一截镜子倒影中窗户的位置与现实房间中窗户的实际位置有微小偏差——你注意到了但无法确认来自同一光源的多道影子中有一道明显比其他更长雾的密度不均匀——某一区域的雾更浓遮挡光线的方式与周围雾不同空气在特定角落有微妙的重量感——密度与房间其他区域不一致不可见的存在仅通过空间微差暗示——建筑比应有的更大、空门框通向的走廊尽头消失点过亮且颜色不对色彩分级中心残留的暖金正在被冷蓝灰从边缘渗入怀旧但安静地不对——一场你终于意识到可能不属于自己的半记忆之梦不安不在「看见了什么」而在「无法确认什么」——余光中比例微微偏移、倒影可能不匹配、某片雾比周围的更浓无人物无人影无人形轮廓无怪物无生物无血腥无身体变异无有机生长无攻击性构图无脏乱废墟腐烂感主体高锐度对焦电影颗粒质感满版出血无边无框无文字无水印16:9画幅。
```
### GPT Image 2
```
Ultra-realistic cinematic photography in the visual language of dreamcore liminal space — the aesthetic of half-remembered childhood places where something quietly deviates from memory. A familiar Chinese campus space from the 1990s — a school corridor, a classroom, a swimming pool hall, a playground — clean and still, the silence itself feeling expectant rather than peaceful. The space is the protagonist; no human presence, no figures, no silhouettes. Warm golden hour light fills the center of the composition, but toward the edges it gradually cools into muted blue-grey and dusky purple — a color shift that is slightly too pronounced for natural light falloff, as if the light itself is subtly changing its nature across the room. A single warm practical light source — incandescent ceiling fixture or afternoon window light — whose color temperature shifts along the length of the space: warm white at the near end, but cooling into an unfamiliar blue-grey at the far end, the same type of fixture producing different qualities of light. Thin atmospheric haze fills a space where mist should not exist — an indoor pool hall, a school corridor, a classroom — softening the edges of familiar objects like a fading memory. Dramatic Tyndall effect volumetric light beams cut through the still air, revealing suspended dust particles frozen mid-float, as if time has paused. The walls and surfaces are clean — almost too clean — but the proportions feel subtly wrong: doorframes fractionally taller than standard architecture allows, the ceiling slightly further away than spatial logic would place it, the corridor extending just a little longer than the building that contains it should permit. A mirror on the wall reflects the room — but the reflected window appears slightly to the left of the actual window's position, a discrepancy you notice peripherally but cannot quite confirm. 
Among several shadows cast by the same ceiling light, one stretches noticeably longer than the others, its angle subtly mismatched. Mist with uneven density — one patch of fog is thicker than its surroundings, catching the light differently and partially veiling whatever lies behind it. The air itself carries a subtle weight in certain corners — a density that does not match the rest of the room. An unseen presence is suggested only through spatial wrongness — the architecture feels larger than it ought to be, the vanishing point at the end of the corridor glows too luminous and the wrong color for any natural exit. Color grade: fading warm golden light holding the center, desaturated blue-grey undertones bleeding inward from the edges. Nostalgic but quietly, deeply unsettling — the feeling of a half-remembered dream that you slowly realize may not be your own. The unease is not in what you see — it is in what your peripheral vision catches: a proportion slightly off, a reflection that may not match, a patch of air denser than it should be. High sharpness on the focal subject with natural cinematic depth of field falloff. 35mm film grain texture. Full bleed, edge-to-edge composition. No people, no human figures, no silhouettes. No creature, no monster, no gore, no body horror, no organic growth. No jump scare framing, no dirt, no decay, no ruins — the space is clean, its wrongness is in the almost imperceptible deviation from the familiar. No text, no watermark, no logo. Horizontal format, aspect ratio 16:9.
```
---
## 10. Composition principles (universal; they do not vary by account)
@@ -348,7 +354,7 @@ Ultra-realistic cinematic photograph in the style of dreamcore liminal space wit
| **Full copy / visual theme** | Description of this episode's visual theme |
| **spaceLight** | `indoor` / `outdoor` / `submerged` / `threshold` |
| **atmosphere** | `dense-mist` / `empty-immensity` / `mirror-shift` |
| **Target model** | MidJourney / Gemini / Kling |
| **Target model** | MidJourney / Gemini / Kling / GPT Image |
> If any item is missing, ask the user to supply it; do not invent it.
@@ -411,8 +417,9 @@ shotDesc: "an empty school corridor at sunset, warm light through windows castin
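- [ ] The frame is "a moment within a trend", not "a completed state"
- [ ] The space-light mode matches spaceLight; no other space modes are mixed in
- [ ] The atmosphere texture mode matches atmosphere; no other atmosphere modes are mixed in
- [ ] The account style suffix is used correctly (the version matching MJ/Gemini/Kling)
- [ ] Model parameter format is correct (MJ: --ar 16:9 --style raw --q 2 --v 6.1)
- [ ] The account style suffix is used correctly (the version matching MJ/Gemini/Kling/GPT Image)
- [ ] Model parameter format is correct (MJ: --ar 16:9 --style raw --q 2 --v 6.1 / GPT Image: no extra parameter suffix; size is specified through API parameters)
- [ ] **GPT Image mode:** the prompt uses natural narrative paragraphs (not comma-separated keyword stacking)
- [ ] The composition leaves room for the next frame's direction of motion
- [ ] spaceLight + atmosphere are layered only onto the environment; the composition content always comes from shotDesc
- [ ] **Dreamcore feel:** the viewer's first reaction is "familiar/nostalgic", and only the second is "isn't this a little different from what I remember?"; it is driven by psychological suggestion, not horror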
@@ -427,3 +434,87 @@ shotDesc: "an empty school corridor at sunset, warm light through windows castin
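- [ ] **Proportion anchoring:** micro-deviation descriptions attach to specific objects (doorframe height / lamp color temperature / window reflection position / desk shadow length / tile layout) and give a reference for comparison, rather than describing the space as a whole
- [ ] **Concrete anchors:** every abstract mood word is paired with a concrete physical manifestation (not "something is off" but "the light at the far end of the corridor is several steps cooler in color temperature than at the near end")
- [ ] No people, no human shadows, no gore, no cartoon look, no jump-scare framing, no dirty/ruined/decayed look, no monsters/creatures/eyes/tentacles/body mutation/organic parasitism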
---
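## 8.5. GPT Image 2 natural-narrative strategy (GPT Image only)
> GPT Image 2 relies on GPT-4o underneath for semantic planning, so it understands natural-language narrative paragraphs rather than keyword stacking. MJ's four "development techniques" (四大显影术) are countermeasures designed against MJ's tendency to beautify; GPT Image does not need countering, since it natively understands "subtle wrongness".
### Core differences
| | MJ | GPT Image 2 |
|---|-----|-----------|
| How it understands | Sensitive to nouns + physical attributes; blind to metaphor | Natural-language semantic understanding; grasps "atmosphere" and "subtlety" |
| Prompt style | Comma-separated keyword stacking + physical anchors to counter beautification | Natural narrative paragraphs, descriptive prose |
| Proportion deviation | Must state a concrete measurement ("doorframe 20% too tall") | Can use a felt description ("the doorframe seems slightly taller than memory says it should be") |
| Lighting contradiction | Needs quantified color-temperature values | Can use sensory description ("at the far end of the corridor the warm light quietly turns into a cool tone that should not be there") |
| Spatial unease | Needs explicit physical anchors | Can directly express the ambiguous zone of "almost normal, with a subtle deviation" |
| Negative constraints | Scattered throughout the prompt | Declared together at the end (`no people, no text, no watermark`) |
| Editing | Not supported | Strong image-to-image editing; supports local repainting |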
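### Writing GPT Image 2 prompts
**1. Use natural narrative paragraphs, not keyword stacking**
GPT Image reads prose paragraphs. Describe the scene, light, and atmosphere in complete sentences.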
```
✅ Photorealistic cinematography. An empty school corridor at golden hour —
warm sunlight streams through tall windows on the left, casting long shadows
across the polished tile floor. At the near end the light is warm amber,
but toward the far end it shifts into something cooler, a muted blue-grey
that doesn't quite belong to this time of day. The doorframes are the same
standard height — yet somehow the ones at the far end seem slightly taller
than they should be. Dust particles hang motionless in the still air,
caught in the light beams but refusing to fall. The corridor is clean and
quiet — the kind of quiet that isn't peaceful but expectant.
No people, no text, no watermark. 35mm film grain, cinematic depth of field.
```
❌ Do not write MJ-style comma-separated keyword stacks.
**2. Translate the ten dreamcore dimensions into natural description**
| Dimension | Natural GPT Image phrasing |
|------|-------------------|
| Liminal space extension | `the corridor stretches further than the building's exterior suggests possible` |
| Light temperature gradient | `warm golden light near the window cooling into an unfamiliar blue-grey toward the far end, the color shift too pronounced for natural falloff` |
| Proportion micro-deviation | `the doorframes are all standard height — yet the ones at the end of the hall appear fractionally taller, a measurement you can feel but not prove` |
| Suspended time | `a swing hangs mid-arc, frozen — not still from lack of wind, but still as if time paused` |
| The unfamiliar within the familiar | `a 1990s Chinese classroom — wooden desks arranged in neat rows, a faded blackboard — but there is one extra window on the left wall that you don't remember from your own school` |
| Mirror deviation | `the mirror on the classroom wall reflects the room — but the reflected window is positioned slightly to the left of where the actual window stands` |
| Autonomous shadow | `four desks cast shadows from the same ceiling light — but one shadow stretches a third longer than the others, as if cast from a slightly different angle` |
| Self-repeating space | `the corridor turns left, then left again — and you are back in the same stretch of hallway, identical doors repeating, the same window appearing where it shouldn't be` |
| Anomalous fog density | `a fine mist fills the indoor pool hall — but one patch near the deep end is noticeably denser, catching the light differently and partially obscuring what lies behind it` |
| Light-source contradiction | `warm incandescent light from the ceiling fixtures — yet the walls catch a cooler fluorescent glow from a source you cannot locate within the room` |
**3. Editing mode (repainting / first and last frame)**
Editing is GPT Image 2's strength. For first/last frames, use a three-part protocol:
```
Change only: [the specific modification]
Preserve exactly: [all visual elements that must be kept]
Keep same: [invariants such as lighting/composition/proportions/atmosphere]
```
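
To keep last-frame prompts consistent, the three-part protocol above can be assembled by a small helper. This is a minimal sketch: `buildEditPrompt` and its field names (`change`, `preserve`, `keep`) are hypothetical and not part of the pipeline, and the sample text is illustrative only.

```javascript
// Hypothetical helper: assembles the three-part edit prompt and refuses
// to build one with an empty section.
function buildEditPrompt({ change, preserve, keep }) {
  const parts = {
    'Change only': change,
    'Preserve exactly': preserve,
    'Keep same': keep,
  }
  for (const [label, value] of Object.entries(parts)) {
    if (!value || !value.trim()) throw new Error(`Missing "${label}" section`)
  }
  return Object.entries(parts)
    .map(([label, value]) => `${label}: ${value.trim()}`)
    .join('\n')
}

const lastFramePrompt = buildEditPrompt({
  change: 'the light at the far end of the corridor shifts to a cool blue-grey',
  preserve: 'the corridor layout, doors, windows, and floor tiles',
  keep: 'camera position, composition, proportions, and film grain',
})
console.log(lastFramePrompt)
```

The resulting string could then be supplied as an item's `lastFramePrompt` when initializing a `framePair` manifest.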
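**4. Quality and size**
| Scenario | quality | size |
|------|---------|------|
| Draft / exploration | `low` | `1024x1024` or automatic |
| Social media / intermediate draft | `medium` | `1088x1920` (9:16) or `1920x1088` (16:9) |
| Final / print | `high` | `2048x2048` or larger |
`quality: medium` is the recommended default as the best cost/quality trade-off. `high` costs roughly 4x and is meant for final delivery.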
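
The table above can be encoded as presets so that callers pick a stage instead of remembering individual values. This is a sketch under assumptions: the stage names (`draft`, `social`, `final`) and the `pickPreset` helper are hypothetical, and only the quality and size values come from the table.

```javascript
// Hypothetical helper: maps a production stage to the quality/size pair
// recommended in the table. Stage names are illustrative only.
function pickPreset(stage, ratio = '9:16') {
  switch (stage) {
    case 'draft':
      return { quality: 'low', size: '1024x1024' }
    case 'social':
      // The table lists one size per orientation; 9:16 is used unless 16:9 is requested
      return { quality: 'medium', size: ratio === '16:9' ? '1920x1088' : '1088x1920' }
    case 'final':
      return { quality: 'high', size: '2048x2048' }
    default:
      throw new Error(`Unknown stage: ${stage}`)
  }
}

// The preset can be spread into the options object passed to a generator call
const options = { outputDir: './out', ...pickPreset('social', '16:9') }
console.log(options)
```

Throwing on an unknown stage, rather than defaulting silently, avoids accidentally paying for `high` quality or shipping a `low` draft.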
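### GPT Image 2 prompt self-check
- [ ] Natural narrative paragraphs are used (no comma-separated keyword stacking)
- [ ] The ten dreamcore dimensions are expressed through concrete scene description (not quantified measurements)
- [ ] "Wrongness" is conveyed by sensory description rather than physical measurement ("seems a little taller than in memory" rather than "20% taller")
- [ ] Negative constraints are gathered at the end (no people, no text, no watermark)
- [ ] Medium/texture terms are included (35mm film grain / cinematic depth of field / photorealistic)
- [ ] The image starts from the "familiar" and quietly drifts at the edges
- [ ] No people, no human shadows, no monsters, no gore, no text, no watermark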
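
Two of the checklist items above are mechanical enough to automate. The sketch below is a rough heuristic, not part of the skill: `lintPrompt`, the phrase lists, and the "last 40% of the prompt" threshold for "near the end" are all assumptions chosen for illustration. The remaining items still need human judgment.

```javascript
// Phrases taken from the checklist; the lists are illustrative, not exhaustive.
const NEGATIVES = ['no people', 'no text', 'no watermark']
const MEDIUM_WORDS = ['35mm film grain', 'cinematic depth of field', 'photorealistic']

// Hypothetical linter: returns a list of issues (empty when both checks pass).
function lintPrompt(prompt) {
  const text = prompt.toLowerCase()
  const issues = []
  // Assumption: "near the end" means the last 40% of the prompt
  const tailStart = Math.floor(text.length * 0.6)
  for (const phrase of NEGATIVES) {
    const index = text.lastIndexOf(phrase)
    if (index === -1) issues.push(`missing "${phrase}"`)
    else if (index < tailStart) issues.push(`"${phrase}" should sit near the end`)
  }
  if (!MEDIUM_WORDS.some((word) => text.includes(word))) {
    issues.push('no medium/texture term found')
  }
  return issues
}

const narrative =
  'Photorealistic cinematography. An empty school corridor at golden hour, ' +
  'with warm sunlight casting long shadows across the tile floor. ' +
  'No people, no text, no watermark. 35mm film grain.'
console.log(lintPrompt(narrative))
```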