send-stream
wing
2025-11-19 00:15:18 +08:00
parent 33abc33b58
commit eee3206e90
31 changed files with 3000 additions and 0 deletions

511
CLAUDE.md Normal file
@@ -0,0 +1,511 @@
# CLAUDE.md
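This document provides guidance for Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**Yudao (芋道)** - A rapid development platform based on Spring Boot with a multi-module architecture. This is an AI/media-focused deployment of the Yudao platform, offering digital human generation, voice cloning, video mixing, and content analysis.
### Core Tech Stack
**Backend:**
- Java 17 + Spring Boot 3.5.5
- Maven for build management
- MyBatis Plus 3.5.14 + Dynamic Datasource (ORM)
- Redis + Redisson for caching
- Spring Security 6.5.2 for authentication
- Flowable 7.0.1 for workflow
- Springdoc/OpenAPI for documentation
**Frontend:**
- Vue.js 3.5.22 + Composition API
- Vite 7.1.7 as the build tool
- Ant Design Vue 4.2.6 UI components
- TypeScript for type safety
- Pinia 3.0.3 for state management
- TailwindCSS 4.1.14 for styling
**Database and Infrastructure:**
- MySQL 8.0+ (primary)
- Also supports PostgreSQL, Oracle, SQL Server, DM, KingbaseES, OpenGauss, TiDB
- Redis for caching
- Docker for containerization
## Project Structure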
```
/d/projects/sionrui/
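├── yudao-dependencies/       # Maven dependency version management
├── yudao-framework/          # Framework components and Spring Boot starters
├── yudao-server/             # Main application server (port 9900)
├── yudao-module-system/      # System management (users, roles, permissions)
├── yudao-module-infra/       # Infrastructure (files, configuration, jobs)
├── yudao-module-member/      # Member center
├── yudao-module-pay/         # Payment system
├── yudao-module-ai/          # AI/ML features (chat, image, knowledge, music)
├── yudao-module-tik/         # Tik/media module (voice cloning, avatar, video)
├── frontend/app/web-gold/    # Vue.js frontend
├── sql/                      # Database schemas
├── script/                   # Build and deployment scripts
└── docs/                     # Documentation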
```
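## Common Development Commands
### Backend (Maven)
**Build and run:**
```bash
# Build the project
mvn clean package -DskipTests
# Run tests for a specific module
mvn test -pl yudao-module-tik
# Start the server
cd yudao-server && mvn spring-boot:run -Dspring-boot.run.profiles=local
# Build with a specific profile
mvn clean package -Pdev -DskipTests
```
**Code generation:**
- Built-in code generator for CRUD operations
- Generates Java, Vue, SQL scripts, and API documentation
- Supports single-table, tree-table, and master-detail table modes
### Frontend (Vue.js)
**Development:**
```bash
cd frontend/app/web-gold
# Install dependencies
npm install
# Start the dev server (proxies to the backend on port 9900)
npm run dev
# Production build
npm run build
# Lint
npm run lint
# Format code
npm run format
```
**Available scripts:**
- `dev` - Dev server with hot reload
- `build` - Production build
- `preview` - Preview the production build
- `lint:oxlint` - Run OxLint with auto-fix
- `lint:eslint` - Run ESLint with auto-fix
- `lint` - Run all linters
- `format` - Format code with Prettier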
### Docker
**Using Docker Compose:**
```bash
# Start all services (MySQL, Redis, Server, Admin)
cd script/docker
docker-compose up -d
# Start specific services
docker-compose up -d mysql redis
```
**Manual Docker build:**
```bash
# Backend
cd yudao-server
docker build -t yudao-server .
# Frontend
cd frontend/app/web-gold
docker build -t web-gold .
```
## Module Architecture
### Backend Module Structure Pattern
Each module follows a consistent layered architecture:
```
module/
├── controller/                 # REST controllers (admin-api/, app/)
├── service/                    # Business logic + interfaces
│   ├── {Xxx}Service.java       # Interface
│   └── {Xxx}ServiceImpl.java   # Implementation
├── dal/                        # Data access layer
│   ├── mysql/                  # MyBatis Mappers and DO classes
│   └── redis/                  # Redis operations
├── client/                     # External API clients
├── config/                     # Configuration classes
├── util/                       # Utility classes
└── vo/                         # Value objects
    ├── {Xxx}SaveReqVO.java     # Create request
    ├── {Xxx}PageReqVO.java     # Page request
    ├── {Xxx}UpdateReqVO.java   # Update request
    └── {Xxx}RespVO.java        # Response
```
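**Core modules:**
1. **yudao-module-tik** - Media/AI features
   - `voice/` - Voice cloning (CosyVoice, Latentsync)
   - `file/` - File management with OSS integration
   - `chat/` - Conversation management
   - `media/` - Media processing
   - `quota/` - Quota management
2. **yudao-module-ai** - AI/ML capabilities
   - Chat completion API
   - Image generation (Midjourney)
   - Music generation (Suno)
   - Knowledge base with vector search
3. **yudao-module-system** - Core system features
   - User/role/permission management
   - Multi-tenancy support
   - Audit logging
### Frontend Structure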
```
frontend/app/web-gold/src/
├── api/                  # API service layer
│   ├── axios/            # Axios interceptors
│   ├── voice.js          # Voice-related APIs
│   └── mix.js            # Video mixing API
├── components/           # Reusable Vue components
├── router/
│   └── index.js          # Vue Router configuration
├── stores/
│   └── voiceCopy.js      # Pinia state management
├── views/
│   ├── dh/               # Digital human features
│   │   ├── Avatar.vue
│   │   ├── Video.vue
│   │   └── VoiceCopy.vue
│   ├── material/         # Material library
│   └── content-style/    # Content analysis
└── utils/
    └── video-cover.ts    # Utility functions
```
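**Core routes:**
- `/digital-human/*` - Voice cloning, avatar, video generation
- `/content-style/*` - Content analysis and benchmarking
- `/trends/*` - Trend analysis
- `/material/*` - Material library management
## Configuration
### Backend Configuration Files
**Main configuration:** `yudao-server/src/main/resources/application.yaml`
- Spring Boot configuration
- Database connection
- Redis settings
- Security settings
- Multi-tenancy configuration
- AI service API keys
**Local development:** `yudao-server/src/main/resources/application-local.yaml`
- Local development overrides
- Database: `jdbc:mysql://8.155.172.147:3306/sion_rui_dev`
- Redis: `8.155.172.147:6379`
- Port: 9900
**Profiles:**
- `local` - Local development (port 9900)
- `dev` - Development server
- `prod` - Production
### Frontend Configuration
**Vite configuration:** `frontend/app/web-gold/vite.config.js`
- Dev server proxy to the backend
- Build configuration
- Plugin settings
**API proxy:**
- The dev server proxies `/admin-api` and `/api` to `http://localhost:9900`
## Database Schema
**Location:** `sql/mysql/`
- Main schema: `ruoyi-vue-pro.sql` (949KB)
- Quartz: `quartz.sql` for scheduled jobs
- Module-specific migrations live in each module's folder
**Schema updates:**
- Add SQL migrations to `sql/mysql/`
- Follow the naming convention: `V{version}__{description}.sql`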
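The migration naming convention can be checked mechanically. The following is a minimal sketch, not code from this repository: the `MigrationNames` class is hypothetical, and it assumes the version is a run of dot-separated digits and the description contains only letters, digits, and underscores (the text above only gives the `V{version}__{description}.sql` template).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; illustrates the V{version}__{description}.sql convention. */
final class MigrationNames {

    // Assumption: version = dot-separated digits, description = letters, digits, underscores.
    private static final Pattern NAME =
            Pattern.compile("^V(\\d+(?:\\.\\d+)*)__([A-Za-z0-9_]+)\\.sql$");

    private MigrationNames() {
    }

    /** Returns true if the file name follows the convention. */
    static boolean isValid(String fileName) {
        return fileName != null && NAME.matcher(fileName).matches();
    }

    /** Returns the version part, or null if the name does not follow the convention. */
    static String versionOf(String fileName) {
        if (fileName == null) {
            return null;
        }
        Matcher matcher = NAME.matcher(fileName);
        return matcher.matches() ? matcher.group(1) : null;
    }
}
```

A check like this could run in a unit test over the files in `sql/mysql/`; note that existing files such as `quartz.sql` do not follow the pattern, so it would only apply to new migrations.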
## API Documentation
- **Swagger UI:** `http://localhost:9900/swagger-ui.html`
- **API docs:** `http://localhost:9900/v3/api-docs`
**API path conventions:**
- Admin API: `/admin-api/{module}/{resource}`
- App API: `/api/{module}/{resource}`
- CRUD endpoints:
  - Create: `POST /module/resource/create`
  - Update: `PUT /module/resource/update`
  - Delete: `DELETE /module/resource/delete`
  - Get: `GET /module/resource/get?id=xxx`
  - Page: `GET /module/resource/page`
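As an illustration of the path conventions, a small helper could compose a full endpoint path from its parts. This is a hedged sketch only: `ApiPaths` and its `Scope` enum are hypothetical names that do not exist in the codebase.

```java
/** Hypothetical helper that composes endpoint paths following the API path conventions. */
final class ApiPaths {

    /** The two API prefixes named in the conventions. */
    enum Scope {
        ADMIN("/admin-api"),
        APP("/api");

        private final String prefix;

        Scope(String prefix) {
            this.prefix = prefix;
        }
    }

    private ApiPaths() {
    }

    /** Builds {prefix}/{module}/{resource}/{action}, where action is e.g. "create" or "page". */
    static String endpoint(Scope scope, String module, String resource, String action) {
        return String.join("/", scope.prefix, module, resource, action);
    }
}
```

For example, `ApiPaths.endpoint(ApiPaths.Scope.APP, "tik", "voice", "create")` produces `/api/tik/voice/create`, which matches the mapping used by the voice controller in this commit.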
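## Code Style and Conventions
### Backend (Java)
**Architecture layers:**
1. **Controller** - Request handling, validation, calls into the Service layer
2. **Service** - Business logic, transaction management
3. **Mapper** - Data access with MyBatis Plus
4. **VO** - API request/response objects
5. **DO** - Data objects mapped to database tables
**Key conventions:**
- Mapper interfaces extend `BaseMapperX<T>`
- DO classes extend `BaseDO`, or `TenantBaseDO` for multi-tenancy support
- Use `@PreAuthorize` for permission control
- Use `CommonResult<T>` consistently as the API response
- Service methods use `@Transactional` for write operations
- Error codes live in `ErrorCodeConstants`, in the format `MODULE_RESOURCE_ACTION_ERROR`
**Naming conventions:**
- Controller: `{Xxx}Controller`, `App{Xxx}Controller`
- Service: `{Xxx}Service`, `{Xxx}ServiceImpl`
- Mapper: `{Xxx}Mapper`
- VO: `{Xxx}SaveReqVO`, `{Xxx}PageReqVO`, `{Xxx}RespVO`
- DO: `{Xxx}DO`
### Frontend (Vue.js)
**Key patterns:**
- Composition API + `<script setup>`
- Pinia for state management and persistence
- Axios interceptors handle authentication and tenancy
- TypeScript for type safety
**Linting:**
- ESLint + OxLint for code quality
- Prettier for code formatting
- Run `npm run lint` before committing
## Testing
**Backend:**
- JUnit 5 + Mockito for unit tests
- Spring Boot Test for integration tests
- Test location: `src/test/java`
- Run tests: `mvn test`
**Frontend:**
- Vitest for unit tests
- Cypress for end-to-end tests
- Run tests: `npm run test`
## Deployment
**Production deployment:**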
```bash
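# Use the deployment script
cd script/shell
./deploy.sh
# Manual deployment
# 1. Build the JAR
mvn clean package -DskipTests -Pprod
# 2. Deploy to the server
# The deploy.sh script handles:
# - Backing up the previous version
# - Stopping the current service
# - Transferring the new JAR
# - Starting the service
# - Health check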
```
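**JVM options:**
- Default: `-Xms512m -Xmx512m -XX:+HeapDumpOnOutOfMemoryError`
- Configurable in `deploy.sh`
**Health check:**
- Endpoint: `/actuator/health`
- Port: 48080 (production) or 9900 (local)
## Development Workflow
### Creating a New Module
1. Create the module following the standard directory structure
2. Add SQL migrations to `sql/mysql/`
3. Run the code generator for CRUD operations
4. Implement the Controller, Service, and Mapper layers
5. Write unit tests
6. Update the API documentation
### Adding a New API Endpoint
1. Create VO classes in the module's `vo/` package
2. Implement the Controller with the appropriate annotations
   - `@Tag`, `@Operation` for Swagger documentation
   - `@PreAuthorize` for permissions
   - `@Valid` for validation
3. Implement the Service layer containing the business logic
4. Create/update the Mapper if needed
5. Test the endpoint through Swagger UI
### Frontend Development
1. Create/update the API service in `src/api/`
2. Add a Pinia store in `src/stores/` (if needed)
3. Create Vue components in `src/views/`
4. Update the routes in `src/router/index.js`
5. Test with `npm run dev`
## Multi-Tenancy
- Enabled by default in configuration
- DO classes extend `TenantBaseDO` for tenant isolation
- The framework injects `tenantId` automatically
- Use `@TenantIgnore` to override when needed
**Configuration:**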
```yaml
yudao:
tenant:
enable: true
ignore-urls:
- /jmreport/*
ignore-tables:
- table_name
```
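## Security
**Authentication and authorization:**
- Spring Security authentication framework
- Token-based authentication
- Role-based access control (RBAC)
- Permission format: `module:resource:action`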
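To make the `module:resource:action` format concrete, here is a minimal sketch of a value type that parses and checks such strings. The `Permission` class and the example permission `tik:voice:create` are hypothetical; the project itself relies on Spring Security and `@PreAuthorize`, and this sketch only illustrates the string format.

```java
import java.util.Set;

/** Hypothetical value type for a "module:resource:action" permission string. */
final class Permission {
    final String module;
    final String resource;
    final String action;

    private Permission(String module, String resource, String action) {
        this.module = module;
        this.resource = resource;
        this.action = action;
    }

    /** Parses "module:resource:action"; rejects input without exactly three non-blank parts. */
    static Permission parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("permission must not be null");
        }
        String[] parts = value.split(":", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected module:resource:action but got: " + value);
        }
        for (String part : parts) {
            if (part.trim().isEmpty()) {
                throw new IllegalArgumentException("blank segment in permission: " + value);
            }
        }
        return new Permission(parts[0], parts[1], parts[2]);
    }

    /** Returns true if the granted set contains this exact permission string. */
    boolean isGrantedBy(Set<String> granted) {
        return granted.contains(toString());
    }

    @Override
    public String toString() {
        return module + ":" + resource + ":" + action;
    }
}
```

Exact matching is a deliberate simplification here; wildcard or role-based resolution is left to the security framework.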
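**API security:**
- API encryption support (AES/RSA)
- Request/response encryption keys set in configuration
- Configurable XSS protection
**Data protection:**
- Field-level encryption support
- SQL injection prevention through MyBatis Plus
- Parameter validation with `@Valid`
## Caching
**Redis configuration:**
- Cache type: `REDIS`
- Default TTL: 1 hour
- Connection: `8.155.172.147:6379`
**Cache patterns:**
- Key format: `module:resource:id`
- Use `@Cacheable` and `@CacheEvict` to manage the cache
- Cache hot data to improve performance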
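The key format can be enforced with a small builder so that keys are not assembled by hand at each call site. This is a sketch under stated assumptions: `CacheKeys` is a hypothetical class, and rejecting segments that contain `:` is an assumption made here to keep keys unambiguous, not a rule stated by the project.

```java
/** Hypothetical helper for building cache keys in the module:resource:id format. */
final class CacheKeys {

    private static final String SEPARATOR = ":";

    private CacheKeys() {
    }

    /** Builds "module:resource:id", e.g. of("tik", "voice", 42L). */
    static String of(String module, String resource, Object id) {
        requireSegment(module, "module");
        requireSegment(resource, "resource");
        if (id == null) {
            throw new IllegalArgumentException("id must not be null");
        }
        return module + SEPARATOR + resource + SEPARATOR + id;
    }

    // Assumption: a segment must be non-empty and must not contain the separator.
    private static void requireSegment(String value, String name) {
        if (value == null || value.isEmpty() || value.contains(SEPARATOR)) {
            throw new IllegalArgumentException("invalid " + name + " segment: " + value);
        }
    }
}
```

For instance, `CacheKeys.of("tik", "voice", 42L)` returns `tik:voice:42`.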
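## Integration Points
**AI services:**
- CosyVoice (voice cloning)
- Latentsync (lip sync)
- Midjourney (image generation)
- Suno (music generation)
- Multiple LLM providers (OpenAI, Claude, Gemini, etc.)
**File storage:**
- S3-compatible (MinIO, AWS S3, etc.)
- Local storage
- FTP
- Database storage
**Message queues:**
- Redis (Pub/Sub, Stream)
- Kafka
- RabbitMQ
- RocketMQ
## Monitoring and Observability
**Actuator endpoints:**
- `/actuator/health` - Health check
- `/actuator/metrics` - Metrics
- `/actuator/env` - Environment properties
**Monitoring tools:**
- Spring Boot Admin for application monitoring
- SkyWalking for distributed tracing
- Druid for SQL monitoring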
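## Cursor Rules Integration
This repository configures **Cursor rules** in `.cursor/rules/backend.mdc` for Spring Boot development best practices. The main rules cover:
- Layered architecture enforcement (Controller → Service → Mapper)
- Module structure conventions
- VO/DO naming standards
- Transaction management patterns
- Multi-tenancy support
- API path conventions
- Security and permission patterns
## Important Notes
1. **Database:** Requires MySQL 8.0+; the external connection is configured in `application-local.yaml`
2. **Redis:** Required for caching and session management
3. **Java version:** Requires JDK 17+
4. **Node version:** The frontend requires Node.js 20.19.0+ or 22.12.0+
5. **Ports:** Backend defaults to 9900, frontend defaults to 5173
6. **API keys:** Several AI service API keys are configured in `application.yaml`; do not commit them to a public repository
7. **Multi-tenancy:** Enabled by default; all DO classes should extend `TenantBaseDO`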
## Troubleshooting
**Common issues:**
1. **Port already in use:**
   - Change the port in `application-local.yaml`: `server.port: 9999`
   - Or kill the process: `lsof -ti:9900 | xargs kill -9`
2. **Database connection failure:**
   - Verify MySQL is running: `mysql -h 8.155.172.147 -u root -p`
   - Check the connection settings in `application-local.yaml`
3. **Redis connection failure:**
   - Verify Redis: `redis-cli -h 8.155.172.147 -p 6379`
   - Check the password/authentication settings
4. **Frontend build errors:**
   - Clean node_modules: `rm -rf node_modules package-lock.json`
   - Reinstall: `npm install`
   - Check the Node version: `node --version`
5. **Maven build errors:**
   - Clean build: `mvn clean install`
   - Skip tests: `mvn clean package -DskipTests`
## Resources
- **Official documentation:** https://doc.iocoder.cn/
- **Quick start:** https://doc.iocoder.cn/quick-start/
- **Video tutorials:** https://doc.iocoder.cn/video/
- **API documentation:** http://localhost:9900/swagger-ui.html (at runtime)
- **Spring Boot reference:** https://docs.spring.io/spring-boot/docs/current/reference/html/
- **Vue.js guide:** https://vuejs.org/guide/
- **Yudao GitHub:** https://github.com/YunaiV/ruoyi-vue-pro

@@ -0,0 +1,46 @@
/**
* Video mixing API service
*/
import http from './http'
import { API_BASE } from '@gold/config/api'
const BASE_URL = `${API_BASE.APP}/api/media`
/**
* Submit a material mixing task
* @param {Object} data
* @param {string} data.title
* @param {string} data.text
* @param {string[]} data.videoUrls
* @param {string[]} data.bgMusicUrls
* @param {number} data.produceCount
*/
export const MixService = {
batchProduceAlignment({ title, text, videoUrls = [], bgMusicUrls = [], produceCount = 1 }) {
const formData = new URLSearchParams()
formData.append('title', title)
formData.append('text', text)
videoUrls.forEach((url) => {
if (url) {
formData.append('videoArray', url)
}
})
bgMusicUrls.forEach((url) => {
if (url) {
formData.append('bgMusicArray', url)
}
})
formData.append('produceCount', produceCount)
return http.post(`${BASE_URL}/batchProduceAlignment`, formData, {
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
}
})
}
}
export default MixService

@@ -0,0 +1,110 @@
/**
* Voice API service
* Maps to the voice management endpoints of the backend tik module
*/
import http from './http'
import { API_BASE } from '@gold/config/api'
// Use the /api/tik prefix consistently
const BASE_URL = `${API_BASE.APP}/api/tik/voice`
/**
* Voice API service
*/
export const VoiceService = {
/**
* Create a voice
* @param {Object} data - Request data
* @param {string} data.name - Voice name (required)
* @param {number} data.fileId - Audio file ID (required)
* @param {boolean} data.autoTranscribe - Whether to transcribe automatically (optional)
* @param {string} data.language - Language (optional)
* @param {string} data.gender - Voice type (optional)
* @param {string} data.note - Note (optional)
* @returns {Promise}
*/
create(data) {
return http.post(`${BASE_URL}/create`, data)
},
/**
* Update a voice
* @param {Object} data - Request data
* @param {number} data.id - Voice ID (required)
* @param {string} data.name - Voice name (optional)
* @param {string} data.language - Language (optional)
* @param {string} data.gender - Voice type (optional)
* @param {string} data.note - Note (optional)
* @returns {Promise}
*/
update(data) {
return http.put(`${BASE_URL}/update`, data)
},
/**
* Delete a voice
* @param {number} id - Voice ID
* @returns {Promise}
*/
delete(id) {
return http.delete(`${BASE_URL}/delete`, {
params: { id }
})
},
/**
* Get a paginated list of voices
* @param {Object} params - Query parameters
* @param {number} params.pageNo - Page number
* @param {number} params.pageSize - Page size
* @param {string} params.name - Voice name (fuzzy match)
* @returns {Promise}
*/
getPage(params) {
return http.get(`${BASE_URL}/page`, { params })
},
/**
* Get a single voice
* @param {number} id - Voice ID
* @returns {Promise}
*/
get(id) {
return http.get(`${BASE_URL}/get`, {
params: { id }
})
},
/**
* Manually trigger speech recognition
* @param {number} id - Voice ID
* @returns {Promise}
*/
transcribe(id) {
return http.post(`${BASE_URL}/transcribe`, null, {
params: { id }
})
},
/**
* Text-to-speech (CosyVoice)
* @param {Object} data
* @returns {Promise}
*/
synthesize(data) {
return http.post(`${BASE_URL}/tts`, data)
},
/**
* Preview my voice
* @param {Object} data
* @returns {Promise}
*/
preview(data) {
return http.post(`${BASE_URL}/preview`, data)
}
}
export default VoiceService

@@ -0,0 +1,178 @@
package cn.iocoder.yudao.module.tik.voice.client;
import cn.hutool.core.collection.CollUtil;
import cn.hutool.core.util.StrUtil;
import cn.iocoder.yudao.framework.common.exception.ServiceException;
import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceTtsRequest;
import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceTtsResult;
import cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProperties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import org.springframework.stereotype.Component;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import static cn.iocoder.yudao.framework.common.exception.util.ServiceExceptionUtil.exception;
import static cn.iocoder.yudao.framework.common.exception.util.ServiceExceptionUtil.exception0;
import static cn.iocoder.yudao.module.tik.enmus.ErrorCodeConstants.VOICE_TTS_FAILED;
/**
* CosyVoice client
*/
@Slf4j
@Component
@RequiredArgsConstructor
public class CosyVoiceClient {
private static final MediaType JSON = MediaType.parse("application/json; charset=utf-8");
private final CosyVoiceProperties properties;
private final ObjectMapper objectMapper;
private volatile OkHttpClient httpClient;
/**
* Calls the CosyVoice TTS endpoint
*/
public CosyVoiceTtsResult synthesize(CosyVoiceTtsRequest request) {
if (!properties.isEnabled()) {
throw exception0(VOICE_TTS_FAILED.getCode(), "未配置 CosyVoice API Key");
}
if (request == null || StrUtil.isBlank(request.getText())) {
throw exception0(VOICE_TTS_FAILED.getCode(), "TTS 文本不能为空");
}
try {
String payload = objectMapper.writeValueAsString(buildPayload(request));
Request httpRequest = new Request.Builder()
.url(properties.getTtsUrl())
.addHeader("Authorization", "Bearer " + properties.getApiKey())
.addHeader("Content-Type", "application/json")
.post(RequestBody.create(payload.getBytes(StandardCharsets.UTF_8), JSON))
.build();
try (Response response = getHttpClient().newCall(httpRequest).execute()) {
String body = response.body() != null ? response.body().string() : "";
if (!response.isSuccessful()) {
log.error("[CosyVoice][TTS失败][status={}, body={}]", response.code(), body);
throw buildException(body);
}
return parseTtsResult(body, request);
}
} catch (ServiceException ex) {
throw ex;
} catch (Exception ex) {
log.error("[CosyVoice][TTS异常]", ex);
throw exception(VOICE_TTS_FAILED);
}
}
private Map<String, Object> buildPayload(CosyVoiceTtsRequest request) {
Map<String, Object> payload = new HashMap<>();
String model = StrUtil.blankToDefault(request.getModel(), properties.getDefaultModel());
payload.put("model", model);
Map<String, Object> input = new HashMap<>();
input.put("text", request.getText());
String voiceId = StrUtil.blankToDefault(request.getVoiceId(), properties.getDefaultVoiceId());
if (StrUtil.isNotBlank(voiceId)) {
input.put("voice", voiceId);
}
payload.put("input", input);
Map<String, Object> parameters = new HashMap<>();
int sampleRate = request.getSampleRate() != null ? request.getSampleRate() : properties.getSampleRate();
parameters.put("sample_rate", sampleRate);
String format = StrUtil.blankToDefault(request.getAudioFormat(), properties.getAudioFormat());
parameters.put("format", format);
if (request.getSpeechRate() != null) {
parameters.put("speech_rate", request.getSpeechRate());
}
if (request.getVolume() != null) {
parameters.put("volume", request.getVolume());
}
if (request.isPreview()) {
parameters.put("preview", true);
}
payload.put("parameters", parameters);
return payload;
}
private CosyVoiceTtsResult parseTtsResult(String body, CosyVoiceTtsRequest request) throws Exception {
JsonNode root = objectMapper.readTree(body);
// Error responses contain a code field
if (root.has("code")) {
String message = root.has("message") ? root.get("message").asText() : body;
log.error("[CosyVoice][TTS失败][code={}, message={}]", root.get("code").asText(), message);
throw exception0(VOICE_TTS_FAILED.getCode(), message);
}
JsonNode audioNode = root.path("output").path("audio");
if (!audioNode.isArray() || audioNode.isEmpty()) {
throw exception0(VOICE_TTS_FAILED.getCode(), "CosyVoice 返回的音频为空");
}
JsonNode firstAudio = audioNode.get(0);
String content = firstAudio.path("content").asText();
if (StrUtil.isBlank(content)) {
throw exception0(VOICE_TTS_FAILED.getCode(), "CosyVoice 返回空音频内容");
}
byte[] audioBytes = Base64.getDecoder().decode(content);
CosyVoiceTtsResult result = new CosyVoiceTtsResult();
result.setAudio(audioBytes);
result.setFormat(firstAudio.path("format").asText(StrUtil.blankToDefault(request.getAudioFormat(), properties.getAudioFormat())));
result.setSampleRate(firstAudio.path("sample_rate").asInt(request.getSampleRate() != null ? request.getSampleRate() : properties.getSampleRate()));
result.setRequestId(root.path("request_id").asText());
result.setVoiceId(firstAudio.path("voice").asText(request.getVoiceId()));
return result;
}
private OkHttpClient getHttpClient() {
if (httpClient == null) {
synchronized (this) {
if (httpClient == null) {
Duration connect = defaultDuration(properties.getConnectTimeout(), 10);
Duration read = defaultDuration(properties.getReadTimeout(), 60);
httpClient = new OkHttpClient.Builder()
.connectTimeout(connect.toMillis(), TimeUnit.MILLISECONDS)
.readTimeout(read.toMillis(), TimeUnit.MILLISECONDS)
.build();
}
}
}
return httpClient;
}
private Duration defaultDuration(Duration duration, long seconds) {
return duration == null ? Duration.ofSeconds(seconds) : duration;
}
private ServiceException buildException(String body) {
try {
JsonNode root = objectMapper.readTree(body);
// Prefer the top-level message; fall back to output.message when it is missing or blank
String message = StrUtil.blankToDefault(
root.path("message").asText(null),
root.path("output").path("message").asText(null));
return exception0(VOICE_TTS_FAILED.getCode(), StrUtil.blankToDefault(message, "CosyVoice 调用失败"));
} catch (Exception ignored) {
return exception0(VOICE_TTS_FAILED.getCode(), body);
}
}
}

@@ -0,0 +1,141 @@
package cn.iocoder.yudao.module.tik.voice.client;
import cn.hutool.core.util.StrUtil;
import cn.iocoder.yudao.framework.common.exception.ServiceException;
import cn.iocoder.yudao.module.tik.voice.client.dto.LatentsyncSubmitRequest;
import cn.iocoder.yudao.module.tik.voice.client.dto.LatentsyncSubmitResponse;
import cn.iocoder.yudao.module.tik.voice.config.LatentsyncProperties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import org.springframework.stereotype.Component;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import static cn.iocoder.yudao.framework.common.exception.util.ServiceExceptionUtil.exception;
import static cn.iocoder.yudao.framework.common.exception.util.ServiceExceptionUtil.exception0;
import static cn.iocoder.yudao.module.tik.enmus.ErrorCodeConstants.LATENTSYNC_SUBMIT_FAILED;
/**
* 302AI Latentsync client
*/
@Slf4j
@Component
@RequiredArgsConstructor
public class LatentsyncClient {
private static final MediaType JSON = MediaType.parse("application/json; charset=utf-8");
private final LatentsyncProperties properties;
private final ObjectMapper objectMapper;
private volatile OkHttpClient httpClient;
public LatentsyncSubmitResponse submitTask(LatentsyncSubmitRequest request) {
if (!properties.isEnabled()) {
throw exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), "未配置 Latentsync API Key");
}
validateRequest(request);
Map<String, Object> payload = buildPayload(request);
try {
String body = objectMapper.writeValueAsString(payload);
Request httpRequest = new Request.Builder()
.url(properties.getSubmitUrl())
.addHeader("Authorization", "Bearer " + properties.getApiKey())
.addHeader("Content-Type", "application/json")
.post(RequestBody.create(body.getBytes(StandardCharsets.UTF_8), JSON))
.build();
try (Response response = getHttpClient().newCall(httpRequest).execute()) {
String responseBody = response.body() != null ? response.body().string() : "";
if (!response.isSuccessful()) {
log.error("[Latentsync][submit failed][status={}, body={}]", response.code(), responseBody);
throw buildException(responseBody);
}
LatentsyncSubmitResponse submitResponse =
objectMapper.readValue(responseBody, LatentsyncSubmitResponse.class);
if (StrUtil.isBlank(submitResponse.getRequestId())) {
log.error("[Latentsync][submit failed][response={}]", responseBody);
throw exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), "Latentsync 返回 requestId 为空");
}
return submitResponse;
}
} catch (ServiceException ex) {
throw ex;
} catch (Exception ex) {
log.error("[Latentsync][submit exception]", ex);
throw exception(LATENTSYNC_SUBMIT_FAILED);
}
}
private void validateRequest(LatentsyncSubmitRequest request) {
if (request == null) {
throw exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), "请求体不能为空");
}
if (StrUtil.isBlank(request.getAudioUrl())) {
throw exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), "音频地址不能为空");
}
if (StrUtil.isBlank(request.getVideoUrl())) {
throw exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), "视频地址不能为空");
}
Integer scale = request.getGuidanceScale();
if (scale != null && (scale < 1 || scale > 2)) {
throw exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), "guidanceScale 取值范围 1-2");
}
}
private Map<String, Object> buildPayload(LatentsyncSubmitRequest request) {
Map<String, Object> payload = new HashMap<>();
payload.put("audio_url", request.getAudioUrl());
payload.put("video_url", request.getVideoUrl());
Integer scale = request.getGuidanceScale() != null
? request.getGuidanceScale() : properties.getDefaultGuidanceScale();
payload.put("guidance_scale", scale);
Integer seed = request.getSeed() != null ? request.getSeed() : properties.getDefaultSeed();
payload.put("seed", seed);
return payload;
}
private OkHttpClient getHttpClient() {
if (httpClient == null) {
synchronized (this) {
if (httpClient == null) {
Duration connect = defaultDuration(properties.getConnectTimeout(), 10);
Duration read = defaultDuration(properties.getReadTimeout(), 60);
httpClient = new OkHttpClient.Builder()
.connectTimeout(connect.toMillis(), TimeUnit.MILLISECONDS)
.readTimeout(read.toMillis(), TimeUnit.MILLISECONDS)
.build();
}
}
}
return httpClient;
}
private Duration defaultDuration(Duration duration, long seconds) {
return duration == null ? Duration.ofSeconds(seconds) : duration;
}
private ServiceException buildException(String body) {
try {
JsonNode root = objectMapper.readTree(body);
String message = root.path("message").asText(body);
return exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), message);
} catch (Exception ignored) {
return exception0(LATENTSYNC_SUBMIT_FAILED.getCode(), body);
}
}
}

@@ -0,0 +1,54 @@
package cn.iocoder.yudao.module.tik.voice.client.dto;
import lombok.Builder;
import lombok.Data;
/**
* CosyVoice TTS request
*/
@Data
@Builder
public class CosyVoiceTtsRequest {
/**
* Text to synthesize
*/
private String text;
/**
* Voice ID (optional; defaults to the configured value)
*/
private String voiceId;
/**
* Model (defaults to cosyvoice-v2)
*/
private String model;
/**
* Speech rate
*/
private Float speechRate;
/**
* Volume (optional)
*/
private Float volume;
/**
* Sample rate
*/
private Integer sampleRate;
/**
* Audio format
*/
private String audioFormat;
/**
* Whether the request is only for preview, so the server side can apply rate limiting
*/
private boolean preview;
}

@@ -0,0 +1,37 @@
package cn.iocoder.yudao.module.tik.voice.client.dto;
import lombok.Data;
/**
* CosyVoice TTS response
*/
@Data
public class CosyVoiceTtsResult {
/**
* Request ID
*/
private String requestId;
/**
* Returned audio format
*/
private String format;
/**
* Sample rate
*/
private Integer sampleRate;
/**
* Binary audio content
*/
private byte[] audio;
/**
* The voiceId used for the audio
*/
private String voiceId;
}

@@ -0,0 +1,34 @@
package cn.iocoder.yudao.module.tik.voice.client.dto;
import lombok.Builder;
import lombok.Data;
/**
* Latentsync task submission request
*/
@Data
@Builder
public class LatentsyncSubmitRequest {
/**
* Audio URL (required)
*/
private String audioUrl;
/**
* Video URL (required)
*/
private String videoUrl;
/**
* Lip-sync guidance strength (1-2)
*/
private Integer guidanceScale;
/**
* Random seed
*/
private Integer seed;
}

@@ -0,0 +1,39 @@
package cn.iocoder.yudao.module.tik.voice.client.dto;
import lombok.Data;
import java.util.Map;
/**
* Latentsync task submission response
*/
@Data
public class LatentsyncSubmitResponse {
/**
* Log content (not yet returned by the official API; reserved)
*/
private Object logs;
/**
* Metrics
*/
private Map<String, Object> metrics;
/**
* Queue position
*/
private Integer queuePosition;
/**
* Task ID
*/
private String requestId;
/**
* Current status
*/
private String status;
}

@@ -0,0 +1,74 @@
package cn.iocoder.yudao.module.tik.voice.config;
import cn.hutool.core.util.StrUtil;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
import java.time.Duration;
/**
* CosyVoice configuration
*/
@Data
@Component
@ConfigurationProperties(prefix = "yudao.cosyvoice")
public class CosyVoiceProperties {
/**
* DashScope API Key
*/
private String apiKey;
/**
* Default model
*/
private String defaultModel = "cosyvoice-v2";
/**
* Default voiceId (optional)
*/
private String defaultVoiceId;
/**
* Default sample rate
*/
private Integer sampleRate = 24000;
/**
* Default audio format
*/
private String audioFormat = "wav";
/**
* Default sample text for previews
*/
private String previewText = "您好,欢迎体验专属音色。";
/**
* TTS endpoint URL
*/
private String ttsUrl = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/speech-synthesis";
/**
* Connect timeout
*/
private Duration connectTimeout = Duration.ofSeconds(10);
/**
* Read timeout
*/
private Duration readTimeout = Duration.ofSeconds(60);
/**
* Whether the client is enabled
*/
private boolean enabled = true;
public boolean isEnabled() {
return enabled && StrUtil.isNotBlank(apiKey);
}
}

@@ -0,0 +1,78 @@
package cn.iocoder.yudao.module.tik.voice.config;
import cn.hutool.core.util.StrUtil;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
import java.time.Duration;
/**
* Latentsync API configuration
*/
@Data
@Component
@ConfigurationProperties(prefix = "tik.latentsync")
public class LatentsyncProperties {
/**
* 302AI API key (can be overridden via configuration)
*/
private String apiKey = "ab900d8c94094a90aed3e88cdba785c1";
/**
* Default overseas gateway
*/
private String baseUrl = "https://api.302.ai";
/**
* Default domestic relay gateway
*/
private String domesticBaseUrl = "https://api.302ai.cn";
/**
* Whether to prefer the domestic gateway
*/
private boolean preferDomestic = false;
/**
* Task submission path
*/
private String submitPath = "/302/submit/latentsync";
/**
* Default guidance_scale (1-2)
*/
private Integer defaultGuidanceScale = 1;
/**
* Default random seed
*/
private Integer defaultSeed = 8888;
/**
* Connect timeout
*/
private Duration connectTimeout = Duration.ofSeconds(10);
/**
* Read timeout
*/
private Duration readTimeout = Duration.ofSeconds(60);
/**
* Whether calls are enabled
*/
private boolean enabled = true;
public String getSubmitUrl() {
String base = preferDomestic ? domesticBaseUrl : baseUrl;
return StrUtil.blankToDefault(base, baseUrl) + submitPath;
}
public boolean isEnabled() {
return enabled && StrUtil.isNotBlank(apiKey);
}
}

@@ -0,0 +1,38 @@
package cn.iocoder.yudao.module.tik.voice.controller;
import cn.iocoder.yudao.framework.common.pojo.CommonResult;
import cn.iocoder.yudao.module.tik.voice.service.LatentsyncService;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitRespVO;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.annotation.Resource;
import jakarta.validation.Valid;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import static cn.iocoder.yudao.framework.common.pojo.CommonResult.success;
/**
* User App - Latentsync lip sync
*/
@Tag(name = "用户 App - Latentsync 口型同步")
@RestController
@RequestMapping("/api/tik/latentsync")
@Validated
public class AppTikLatentsyncController {
@Resource
private LatentsyncService latentsyncService;
@PostMapping("/submit")
@Operation(summary = "提交 302AI Latentsync 口型任务")
public CommonResult<AppTikLatentsyncSubmitRespVO> submitTask(@Valid @RequestBody AppTikLatentsyncSubmitReqVO reqVO) {
return success(latentsyncService.submitTask(reqVO));
}
}

@@ -0,0 +1,95 @@
package cn.iocoder.yudao.module.tik.voice.controller;
import cn.iocoder.yudao.framework.common.pojo.CommonResult;
import cn.iocoder.yudao.framework.common.pojo.PageResult;
import cn.iocoder.yudao.module.tik.voice.service.TikUserVoiceService;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceCreateReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoicePageReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceRespVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceUpdateReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoicePreviewReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoicePreviewRespVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoiceTtsReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoiceTtsRespVO;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.Parameter;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.annotation.Resource;
import jakarta.validation.Valid;
import lombok.extern.slf4j.Slf4j;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.*;
import static cn.iocoder.yudao.framework.common.pojo.CommonResult.success;
/**
* User App - Voice management Controller
*
* @author 芋道源码
*/
@Tag(name = "用户 App - 配音管理")
@RestController
@RequestMapping("/api/tik/voice")
@Validated
@Slf4j
public class AppTikUserVoiceController {
@Resource
private TikUserVoiceService voiceService;
@PostMapping("/create")
@Operation(summary = "创建配音")
public CommonResult<Long> createVoice(@Valid @RequestBody AppTikUserVoiceCreateReqVO createReqVO) {
return success(voiceService.createVoice(createReqVO));
}
@PutMapping("/update")
@Operation(summary = "更新配音")
public CommonResult<Boolean> updateVoice(@Valid @RequestBody AppTikUserVoiceUpdateReqVO updateReqVO) {
voiceService.updateVoice(updateReqVO);
return success(true);
}
@DeleteMapping("/delete")
@Operation(summary = "删除配音")
@Parameter(name = "id", description = "配音编号", required = true, example = "1")
public CommonResult<Boolean> deleteVoice(@RequestParam("id") Long id) {
voiceService.deleteVoice(id);
return success(true);
}
@GetMapping("/page")
@Operation(summary = "分页查询配音列表")
public CommonResult<PageResult<AppTikUserVoiceRespVO>> getVoicePage(@Valid AppTikUserVoicePageReqVO pageReqVO) {
return success(voiceService.getVoicePage(pageReqVO));
}
@GetMapping("/get")
@Operation(summary = "获取单个配音")
@Parameter(name = "id", description = "配音编号", required = true, example = "1")
public CommonResult<AppTikUserVoiceRespVO> getVoice(@RequestParam("id") Long id) {
return success(voiceService.getVoice(id));
}
@PostMapping("/transcribe")
@Operation(summary = "手动触发语音识别")
@Parameter(name = "id", description = "配音编号", required = true, example = "1")
public CommonResult<Boolean> transcribeVoice(@RequestParam("id") Long id) {
voiceService.transcribeVoice(id);
return success(true);
}
@PostMapping("/tts")
@Operation(summary = "CosyVoice 文本转语音")
public CommonResult<AppTikVoiceTtsRespVO> synthesizeVoice(@Valid @RequestBody AppTikVoiceTtsReqVO reqVO) {
return success(voiceService.synthesizeVoice(reqVO));
}
@PostMapping("/preview")
@Operation(summary = "我的音色试听")
public CommonResult<AppTikVoicePreviewRespVO> previewVoice(@Valid @RequestBody AppTikVoicePreviewReqVO reqVO) {
return success(voiceService.previewVoice(reqVO));
}
}

@@ -0,0 +1,59 @@
package cn.iocoder.yudao.module.tik.voice.dal.dataobject;
import cn.iocoder.yudao.framework.tenant.core.db.TenantBaseDO;
import com.baomidou.mybatisplus.annotation.KeySequence;
import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.*;
/**
* User voice DO
*
* @author 芋道源码
*/
@TableName("tik_user_voice")
@KeySequence("tik_user_voice_seq") // Primary key sequence for Oracle, PostgreSQL, Kingbase, DB2, and H2. Can be omitted for MySQL and similar databases.
@Data
@EqualsAndHashCode(callSuper = true)
@ToString(callSuper = true)
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class TikUserVoiceDO extends TenantBaseDO {
/**
* Voice ID
*/
@TableId
private Long id;
/**
* User ID
*/
private Long userId;
/**
* Voice name
*/
private String name;
/**
* Audio file ID (references infra_file.id)
*/
private Long fileId;
/**
* Speech recognition text; empty means not yet transcribed, a value means transcribed
*/
private String transcription;
/**
* Language: zh-CN (Simplified Chinese), zh-TW (Traditional Chinese), en-US (English)
*/
private String language;
/**
* Voice type: female or male
*/
private String gender;
/**
* Notes
*/
private String note;
}


@@ -0,0 +1,26 @@
package cn.iocoder.yudao.module.tik.voice.dal.mysql;
import cn.iocoder.yudao.framework.common.pojo.PageResult;
import cn.iocoder.yudao.framework.mybatis.core.mapper.BaseMapperX;
import cn.iocoder.yudao.framework.mybatis.core.query.LambdaQueryWrapperX;
import cn.iocoder.yudao.module.tik.voice.dal.dataobject.TikUserVoiceDO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoicePageReqVO;
import org.apache.ibatis.annotations.Mapper;
/**
* User voice Mapper
*
* @author 芋道源码
*/
@Mapper
public interface TikUserVoiceMapper extends BaseMapperX<TikUserVoiceDO> {
default PageResult<TikUserVoiceDO> selectPage(AppTikUserVoicePageReqVO reqVO) {
return selectPage(reqVO, new LambdaQueryWrapperX<TikUserVoiceDO>()
.eqIfPresent(TikUserVoiceDO::getUserId, reqVO.getUserId())
.likeIfPresent(TikUserVoiceDO::getName, reqVO.getName())
.orderByDesc(TikUserVoiceDO::getId));
}
}
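
The `eqIfPresent` and `likeIfPresent` calls in the mapper above add a condition only when the request carries a value, so one query method can serve both filtered and unfiltered pages. The following is a minimal plain-Java sketch of that idea. `VoiceRow` and `OptionalFilters` are hypothetical names invented for illustration, and this in-memory filter is only an analogy, not how `LambdaQueryWrapperX` builds SQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a voice row; not the project's TikUserVoiceDO.
record VoiceRow(long id, Long userId, String name) {}

// Hypothetical helper that mimics "skip the condition when the value is absent".
final class OptionalFilters {
    private final List<Predicate<VoiceRow>> conditions = new ArrayList<>();

    // Adds an equality condition only when the filter value is non-null.
    OptionalFilters eqUserIdIfPresent(Long userId) {
        if (userId != null) {
            conditions.add(row -> userId.equals(row.userId()));
        }
        return this;
    }

    // Adds a substring condition only when the filter value is non-blank.
    OptionalFilters likeNameIfPresent(String name) {
        if (name != null && !name.isBlank()) {
            conditions.add(row -> row.name() != null && row.name().contains(name));
        }
        return this;
    }

    // Applies every registered condition, then sorts by id in descending order.
    List<VoiceRow> apply(List<VoiceRow> rows) {
        return rows.stream()
                .filter(row -> conditions.stream().allMatch(c -> c.test(row)))
                .sorted((a, b) -> Long.compare(b.id(), a.id()))
                .toList();
    }
}
```

Calling `new OptionalFilters().eqUserIdIfPresent(null).likeNameIfPresent(" ").apply(rows)` registers no condition at all, so every row comes back, newest id first.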


@@ -0,0 +1,20 @@
package cn.iocoder.yudao.module.tik.voice.service;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitRespVO;
/**
* Latentsync lip-sync Service
*/
public interface LatentsyncService {
/**
* Submits a Latentsync task to 302AI
*
* @param reqVO request VO
* @return task response
*/
AppTikLatentsyncSubmitRespVO submitTask(AppTikLatentsyncSubmitReqVO reqVO);
}


@@ -0,0 +1,42 @@
package cn.iocoder.yudao.module.tik.voice.service;
import cn.hutool.core.util.StrUtil;
import cn.iocoder.yudao.module.tik.voice.client.LatentsyncClient;
import cn.iocoder.yudao.module.tik.voice.client.dto.LatentsyncSubmitRequest;
import cn.iocoder.yudao.module.tik.voice.client.dto.LatentsyncSubmitResponse;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitRespVO;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import org.springframework.validation.annotation.Validated;
/**
* Latentsync Service implementation
*/
@Service
@Validated
@RequiredArgsConstructor
public class LatentsyncServiceImpl implements LatentsyncService {
private final LatentsyncClient latentsyncClient;
@Override
public AppTikLatentsyncSubmitRespVO submitTask(@Valid AppTikLatentsyncSubmitReqVO reqVO) {
LatentsyncSubmitRequest request = LatentsyncSubmitRequest.builder()
.audioUrl(StrUtil.trim(reqVO.getAudioUrl()))
.videoUrl(StrUtil.trim(reqVO.getVideoUrl()))
.guidanceScale(reqVO.getGuidanceScale())
.seed(reqVO.getSeed())
.build();
LatentsyncSubmitResponse response = latentsyncClient.submitTask(request);
AppTikLatentsyncSubmitRespVO respVO = new AppTikLatentsyncSubmitRespVO();
respVO.setRequestId(response.getRequestId());
respVO.setStatus(response.getStatus());
respVO.setQueuePosition(response.getQueuePosition());
return respVO;
}
}


@@ -0,0 +1,75 @@
package cn.iocoder.yudao.module.tik.voice.service;
import cn.iocoder.yudao.framework.common.pojo.PageResult;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceCreateReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoicePageReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceRespVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceUpdateReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoicePreviewReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoicePreviewRespVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoiceTtsReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoiceTtsRespVO;
/**
* User voice Service interface
*
* @author 芋道源码
*/
public interface TikUserVoiceService {
/**
* Creates a voice (from an uploaded file, with optional automatic transcription)
*
* @param createReqVO create request VO
* @return voice ID
*/
Long createVoice(AppTikUserVoiceCreateReqVO createReqVO);
/**
* Updates voice information
*
* @param updateReqVO update request VO
*/
void updateVoice(AppTikUserVoiceUpdateReqVO updateReqVO);
/**
* Deletes a voice
*
* @param id voice ID
*/
void deleteVoice(Long id);
/**
* Queries a page of voices
*
* @param pageReqVO page query conditions
* @return page of voices
*/
PageResult<AppTikUserVoiceRespVO> getVoicePage(AppTikUserVoicePageReqVO pageReqVO);
/**
* Gets a single voice
*
* @param id voice ID
* @return voice information
*/
AppTikUserVoiceRespVO getVoice(Long id);
/**
* Manually triggers speech recognition
*
* @param id voice ID
*/
void transcribeVoice(Long id);
/**
* CosyVoice text-to-speech
*/
AppTikVoiceTtsRespVO synthesizeVoice(AppTikVoiceTtsReqVO reqVO);
/**
* Previews one of the user's voices
*/
AppTikVoicePreviewRespVO previewVoice(AppTikVoicePreviewReqVO reqVO);
}
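
The implementation of `synthesizeVoice` and `previewVoice` caches results in Redis under a key that `buildCacheKey` derives by hashing the request parameters. The following is a minimal sketch of that derivation using only the JDK. `TtsCacheKeys` is a hypothetical name, the `"mp3"` and `24000` defaults are assumptions standing in for the values read from `CosyVoiceProperties`, and the file-URL fallback for the voice identifier is left out for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper; the defaults below are assumptions, not the project's configuration.
final class TtsCacheKeys {
    static final String DEFAULT_FORMAT = "mp3";
    static final int DEFAULT_SAMPLE_RATE = 24000;

    private TtsCacheKeys() {}

    // Joins the normalized parameters and hashes them, so the key length stays fixed
    // no matter how long the text is.
    static String build(String prefix, String voiceId, String text,
                        Float speechRate, Float volume, String emotion,
                        String audioFormat, Integer sampleRate) {
        String payload = String.join("|",
                isBlank(voiceId) ? "no-voice" : voiceId,
                String.valueOf(text),
                String.valueOf(speechRate != null ? speechRate : 1.0f),
                String.valueOf(volume != null ? volume : 0.0f),
                isBlank(emotion) ? "neutral" : emotion,
                isBlank(audioFormat) ? DEFAULT_FORMAT : audioFormat,
                String.valueOf(sampleRate != null ? sampleRate : DEFAULT_SAMPLE_RATE));
        return prefix + sha256Hex(payload);
    }

    // Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of the value.
    private static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(value.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

The sketch substitutes typed defaults before formatting, so a request that omits a parameter and one that passes the default explicitly map to the same key. Whether that is desirable depends on how the TTS backend treats omitted values.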


@@ -0,0 +1,864 @@
package cn.iocoder.yudao.module.tik.voice.service;
import cn.hutool.core.collection.CollUtil;
import cn.hutool.core.util.StrUtil;
import cn.hutool.http.HttpUtil;
import cn.hutool.json.JSONArray;
import cn.hutool.json.JSONObject;
import cn.hutool.json.JSONUtil;
import cn.iocoder.yudao.framework.common.pojo.CommonResult;
import cn.iocoder.yudao.framework.common.pojo.PageResult;
import cn.iocoder.yudao.framework.common.util.collection.CollectionUtils;
import cn.iocoder.yudao.framework.common.util.object.BeanUtils;
import cn.iocoder.yudao.framework.security.core.util.SecurityFrameworkUtils;
import cn.iocoder.yudao.module.infra.api.file.FileApi;
import cn.iocoder.yudao.module.infra.dal.dataobject.file.FileDO;
import cn.iocoder.yudao.module.infra.dal.mysql.file.FileMapper;
import cn.iocoder.yudao.module.tik.file.dal.dataobject.TikUserFileDO;
import cn.iocoder.yudao.module.tik.file.dal.mysql.TikUserFileMapper;
import cn.iocoder.yudao.module.tik.file.service.TikUserFileService;
import cn.iocoder.yudao.module.tik.tikhup.service.TikHupService;
import cn.iocoder.yudao.framework.mybatis.core.query.LambdaQueryWrapperX;
import cn.iocoder.yudao.module.tik.voice.client.CosyVoiceClient;
import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceTtsRequest;
import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceTtsResult;
import cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProperties;
import cn.iocoder.yudao.module.tik.voice.dal.dataobject.TikUserVoiceDO;
import cn.iocoder.yudao.module.tik.voice.dal.mysql.TikUserVoiceMapper;
import cn.iocoder.yudao.module.tik.voice.util.ByteArrayMultipartFile;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceCreateReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoicePageReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceRespVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikUserVoiceUpdateReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoicePreviewReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoicePreviewRespVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoiceTtsReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikVoiceTtsRespVO;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.validation.annotation.Validated;
import jakarta.annotation.Resource;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import static cn.iocoder.yudao.framework.common.exception.util.ServiceExceptionUtil.exception;
import static cn.iocoder.yudao.module.tik.enmus.ErrorCodeConstants.*;
/**
* User voice Service implementation
*
* @author 芋道源码
*/
@Service
@Validated
@Slf4j
public class TikUserVoiceServiceImpl implements TikUserVoiceService {
@Resource
private TikUserVoiceMapper voiceMapper;
@Resource
private FileMapper fileMapper;
@Resource
private TikUserFileMapper userFileMapper;
@Resource
private TikUserFileService tikUserFileService;
@Resource
private FileApi fileApi;
@Resource
private TikHupService tikHupService;
@Resource
private CosyVoiceClient cosyVoiceClient;
@Resource
private CosyVoiceProperties cosyVoiceProperties;
@Resource
private StringRedisTemplate stringRedisTemplate;
/** Presigned URL expiration time (1 hour, in seconds) */
private static final int PRESIGN_URL_EXPIRATION_SECONDS = 3600;
private static final String PREVIEW_CACHE_PREFIX = "tik:voice:preview:";
private static final String SYNTH_CACHE_PREFIX = "tik:voice:tts:";
private static final long PREVIEW_CACHE_TTL_SECONDS = 3600;
private static final long SYNTH_CACHE_TTL_SECONDS = 24 * 3600;
@Override
@Transactional(rollbackFor = Exception.class)
public Long createVoice(AppTikUserVoiceCreateReqVO createReqVO) {
Long userId = SecurityFrameworkUtils.getLoginUserId();
// 1. Verify that the file exists and belongs to the "voice" category
FileDO fileDO = fileMapper.selectById(createReqVO.getFileId());
if (fileDO == null) {
throw exception(VOICE_FILE_NOT_EXISTS);
}
// Check that the file category is "voice" (looked up through the tik_user_file table)
TikUserFileDO userFile = userFileMapper.selectOne(new LambdaQueryWrapperX<TikUserFileDO>()
.eq(TikUserFileDO::getFileId, createReqVO.getFileId())
.eq(TikUserFileDO::getFileCategory, "voice")
.eq(TikUserFileDO::getUserId, userId));
if (userFile == null) {
throw exception(VOICE_FILE_NOT_EXISTS, "文件不存在或不属于voice分类");
}
// 2. Check whether the name is already in use
TikUserVoiceDO existingVoice = voiceMapper.selectOne(new LambdaQueryWrapperX<TikUserVoiceDO>()
.eq(TikUserVoiceDO::getUserId, userId)
.eq(TikUserVoiceDO::getName, createReqVO.getName())
.eq(TikUserVoiceDO::getDeleted, false));
if (existingVoice != null) {
throw exception(VOICE_NAME_DUPLICATE);
}
// 3. Create the voice record
TikUserVoiceDO voice = new TikUserVoiceDO()
.setUserId(userId)
.setName(createReqVO.getName())
.setFileId(createReqVO.getFileId())
.setLanguage(StrUtil.blankToDefault(createReqVO.getLanguage(), "zh-CN"))
.setGender(StrUtil.blankToDefault(createReqVO.getGender(), "female"))
.setNote(createReqVO.getNote())
.setTranscription(null); // Initially null, meaning not yet transcribed
voiceMapper.insert(voice);
// 4. If automatic transcription is enabled, run recognition asynchronously
if (Boolean.TRUE.equals(createReqVO.getAutoTranscribe())) {
String fileAccessUrl = fileApi.presignGetUrl(fileDO.getUrl(), PRESIGN_URL_EXPIRATION_SECONDS);
log.info("[createVoice][开启自动识别,配音编号({})文件ID({})预签名URL({})]",
voice.getId(), fileDO.getId(), fileAccessUrl);
asyncTranscribeVoice(voice.getId(), fileAccessUrl);
}
log.info("[createVoice][用户({})创建配音成功,配音编号({})]", userId, voice.getId());
return voice.getId();
}
@Override
@Transactional(rollbackFor = Exception.class)
public void updateVoice(AppTikUserVoiceUpdateReqVO updateReqVO) {
Long userId = SecurityFrameworkUtils.getLoginUserId();
// 1. Verify that the voice exists and belongs to the current user
TikUserVoiceDO voice = voiceMapper.selectById(updateReqVO.getId());
if (voice == null || !voice.getUserId().equals(userId)) {
throw exception(VOICE_NOT_EXISTS);
}
// 2. If the name changes, check whether the new name is already in use
if (StrUtil.isNotBlank(updateReqVO.getName()) && !updateReqVO.getName().equals(voice.getName())) {
TikUserVoiceDO existingVoice = voiceMapper.selectOne(new LambdaQueryWrapperX<TikUserVoiceDO>()
.eq(TikUserVoiceDO::getUserId, userId)
.eq(TikUserVoiceDO::getName, updateReqVO.getName())
.eq(TikUserVoiceDO::getDeleted, false)
.ne(TikUserVoiceDO::getId, updateReqVO.getId()));
if (existingVoice != null) {
throw exception(VOICE_NAME_DUPLICATE);
}
}
// 3. Update the voice information
TikUserVoiceDO updateObj = new TikUserVoiceDO()
.setId(updateReqVO.getId());
if (StrUtil.isNotBlank(updateReqVO.getName())) {
updateObj.setName(updateReqVO.getName());
}
if (StrUtil.isNotBlank(updateReqVO.getLanguage())) {
updateObj.setLanguage(updateReqVO.getLanguage());
}
if (StrUtil.isNotBlank(updateReqVO.getGender())) {
updateObj.setGender(updateReqVO.getGender());
}
if (updateReqVO.getNote() != null) {
updateObj.setNote(updateReqVO.getNote());
}
if (updateReqVO.getTranscription() != null) {
updateObj.setTranscription(updateReqVO.getTranscription());
}
voiceMapper.updateById(updateObj);
log.info("[updateVoice][用户({})更新配音成功,配音编号({})]", userId, updateReqVO.getId());
}
@Override
@Transactional(rollbackFor = Exception.class)
public void deleteVoice(Long id) {
Long userId = SecurityFrameworkUtils.getLoginUserId();
// 1. Verify that the voice exists and belongs to the current user
TikUserVoiceDO voice = voiceMapper.selectById(id);
if (voice == null || !voice.getUserId().equals(userId)) {
throw exception(VOICE_NOT_EXISTS);
}
// 2. Delete the audio file (including the OSS object)
TikUserFileDO userFile = userFileMapper.selectOne(new LambdaQueryWrapperX<TikUserFileDO>()
.eq(TikUserFileDO::getFileId, voice.getFileId())
.eq(TikUserFileDO::getUserId, userId));
if (userFile != null) {
tikUserFileService.deleteFiles(Collections.singletonList(userFile.getId()));
}
// 3. Logically delete the voice record
voiceMapper.deleteById(id);
log.info("[deleteVoice][用户({})删除配音成功,配音编号({})]", userId, id);
}
@Override
public PageResult<AppTikUserVoiceRespVO> getVoicePage(AppTikUserVoicePageReqVO pageReqVO) {
// Automatically fill in the current login user ID
Long userId = SecurityFrameworkUtils.getLoginUserId();
pageReqVO.setUserId(userId);
// Query the voice page
PageResult<TikUserVoiceDO> pageResult = voiceMapper.selectPage(pageReqVO);
// Load the file records in one batch to avoid N+1 queries
Map<Long, FileDO> fileMap = new HashMap<>();
if (CollUtil.isNotEmpty(pageResult.getList())) {
List<Long> fileIds = pageResult.getList().stream()
.map(TikUserVoiceDO::getFileId)
.distinct()
.collect(Collectors.toList());
if (CollUtil.isNotEmpty(fileIds)) {
List<FileDO> files = fileMapper.selectBatchIds(fileIds);
Map<Long, FileDO> tempFileMap = files.stream()
.collect(Collectors.toMap(FileDO::getId, file -> file));
fileMap.putAll(tempFileMap);
}
}
// Convert to VOs and attach the file information
return CollectionUtils.convertPage(pageResult, voice -> {
AppTikUserVoiceRespVO vo = BeanUtils.toBean(voice, AppTikUserVoiceRespVO.class);
// Look up the file URL by file_id and generate a presigned URL
FileDO fileDO = fileMap.get(voice.getFileId());
if (fileDO != null) {
// Generate a presigned URL (valid for 1 hour)
String presignedUrl = fileApi.presignGetUrl(fileDO.getUrl(), PRESIGN_URL_EXPIRATION_SECONDS);
vo.setFileUrl(presignedUrl);
}
return vo;
});
}
@Override
public AppTikUserVoiceRespVO getVoice(Long id) {
Long userId = SecurityFrameworkUtils.getLoginUserId();
// 1. Load the voice
TikUserVoiceDO voice = voiceMapper.selectById(id);
if (voice == null || !voice.getUserId().equals(userId)) {
throw exception(VOICE_NOT_EXISTS);
}
// 2. Convert to a VO and attach the file information
AppTikUserVoiceRespVO vo = BeanUtils.toBean(voice, AppTikUserVoiceRespVO.class);
// Look up the file URL by file_id and generate a presigned URL
FileDO fileDO = fileMapper.selectById(voice.getFileId());
if (fileDO != null) {
// Generate a presigned URL (valid for 1 hour)
String presignedUrl = fileApi.presignGetUrl(fileDO.getUrl(), PRESIGN_URL_EXPIRATION_SECONDS);
vo.setFileUrl(presignedUrl);
}
return vo;
}
@Override
@Transactional(rollbackFor = Exception.class)
public void transcribeVoice(Long id) {
Long userId = SecurityFrameworkUtils.getLoginUserId();
// 1. Verify that the voice exists and belongs to the current user
TikUserVoiceDO voice = voiceMapper.selectById(id);
if (voice == null || !voice.getUserId().equals(userId)) {
throw exception(VOICE_NOT_EXISTS);
}
// 2. Load the file to obtain its URL
FileDO fileDO = fileMapper.selectById(voice.getFileId());
if (fileDO == null) {
throw exception(VOICE_FILE_NOT_EXISTS);
}
// 3. Run recognition asynchronously
String fileAccessUrl = fileApi.presignGetUrl(fileDO.getUrl(), PRESIGN_URL_EXPIRATION_SECONDS);
asyncTranscribeVoice(id, fileAccessUrl);
}
@Override
public AppTikVoiceTtsRespVO synthesizeVoice(AppTikVoiceTtsReqVO reqVO) {
String finalText = determineSynthesisText(
reqVO.getTranscriptionText(),
reqVO.getInputText(),
false);
finalText = appendEmotion(finalText, reqVO.getEmotion());
String cacheKey = buildCacheKey(SYNTH_CACHE_PREFIX,
reqVO.getVoiceId(),
reqVO.getFileUrl(),
finalText,
reqVO.getSpeechRate(),
reqVO.getVolume(),
reqVO.getEmotion(),
reqVO.getAudioFormat(),
reqVO.getSampleRate());
SynthCacheEntry synthCache = getSynthCache(cacheKey);
if (synthCache != null) {
return buildSynthResponseFromCache(reqVO, synthCache);
}
CosyVoiceTtsResult ttsResult = cosyVoiceClient.synthesize(buildTtsRequest(
finalText,
reqVO.getVoiceId(),
reqVO.getModel(),
reqVO.getSpeechRate(),
reqVO.getVolume(),
reqVO.getSampleRate(),
reqVO.getAudioFormat(),
false
));
String format = defaultFormat(ttsResult.getFormat(), reqVO.getAudioFormat());
String voiceId = StrUtil.blankToDefault(reqVO.getVoiceId(), cosyVoiceProperties.getDefaultVoiceId());
ByteArrayMultipartFile multipartFile = new ByteArrayMultipartFile(
"file",
buildFileName(voiceId, format),
resolveContentType(format),
ttsResult.getAudio()
);
Long fileId = tikUserFileService.uploadFile(multipartFile, "audio", null);
AppTikVoiceTtsRespVO respVO = new AppTikVoiceTtsRespVO();
respVO.setFileId(fileId);
respVO.setAudioUrl(tikUserFileService.getAudioPlayUrl(fileId));
respVO.setFormat(format);
respVO.setSampleRate(ttsResult.getSampleRate());
respVO.setRequestId(ttsResult.getRequestId());
respVO.setVoiceId(voiceId);
saveSynthCache(cacheKey, new SynthCacheEntry(
Base64.getEncoder().encodeToString(ttsResult.getAudio()),
format,
ttsResult.getSampleRate(),
ttsResult.getRequestId(),
voiceId
));
return respVO;
}
@Override
public AppTikVoicePreviewRespVO previewVoice(AppTikVoicePreviewReqVO reqVO) {
String finalText = determineSynthesisText(
reqVO.getTranscriptionText(),
reqVO.getInputText(),
true);
finalText = appendEmotion(finalText, reqVO.getEmotion());
String cacheKey = buildCacheKey(PREVIEW_CACHE_PREFIX,
reqVO.getVoiceId(),
reqVO.getFileUrl(),
finalText,
reqVO.getSpeechRate(),
reqVO.getVolume(),
reqVO.getEmotion(),
reqVO.getAudioFormat(),
null);
PreviewCacheEntry previewCache = getPreviewCache(cacheKey);
String voiceId = StrUtil.blankToDefault(reqVO.getVoiceId(), cosyVoiceProperties.getDefaultVoiceId());
if (previewCache != null) {
String cachedUrl = fileApi.presignGetUrl(previewCache.getFileUrl(), PRESIGN_URL_EXPIRATION_SECONDS);
return buildPreviewResp(previewCache, cachedUrl, voiceId);
}
CosyVoiceTtsResult ttsResult = cosyVoiceClient.synthesize(buildTtsRequest(
finalText,
reqVO.getVoiceId(),
reqVO.getModel(),
reqVO.getSpeechRate(),
reqVO.getVolume(),
null,
reqVO.getAudioFormat(),
true
));
String format = defaultFormat(ttsResult.getFormat(), reqVO.getAudioFormat());
voiceId = StrUtil.blankToDefault(reqVO.getVoiceId(), cosyVoiceProperties.getDefaultVoiceId());
String objectName = buildFileName(voiceId, format);
String fileUrl = fileApi.createFile(ttsResult.getAudio(), objectName, "voice/preview", resolveContentType(format));
String presignUrl = fileApi.presignGetUrl(fileUrl, PRESIGN_URL_EXPIRATION_SECONDS);
PreviewCacheEntry entry = new PreviewCacheEntry(fileUrl, format, ttsResult.getSampleRate(), ttsResult.getRequestId());
savePreviewCache(cacheKey, entry);
return buildPreviewResp(entry, presignUrl, voiceId);
}
private CosyVoiceTtsRequest buildTtsRequest(String text,
String voiceId,
String model,
Float speechRate,
Float volume,
Integer sampleRate,
String audioFormat,
boolean preview) {
return CosyVoiceTtsRequest.builder()
.text(text)
.voiceId(voiceId)
.model(model)
.speechRate(speechRate)
.volume(volume)
.sampleRate(sampleRate)
.audioFormat(audioFormat)
.preview(preview)
.build();
}
private String defaultFormat(String responseFormat, String requestFormat) {
return StrUtil.blankToDefault(responseFormat,
StrUtil.blankToDefault(requestFormat, cosyVoiceProperties.getAudioFormat()));
}
private String buildFileName(String voiceId, String format) {
String safeVoice = StrUtil.blankToDefault(voiceId, "voice")
.replaceAll("[^a-zA-Z0-9_-]", "");
return safeVoice + "-" + System.currentTimeMillis() + "." + format;
}
private String resolveContentType(String format) {
if ("wav".equalsIgnoreCase(format)) {
return "audio/wav";
}
if ("mp3".equalsIgnoreCase(format)) {
return "audio/mpeg";
}
if ("flac".equalsIgnoreCase(format)) {
return "audio/flac";
}
return "audio/mpeg";
}
private String determineSynthesisText(String transcriptionText, String inputText, boolean allowFallback) {
StringBuilder builder = new StringBuilder();
if (StrUtil.isNotBlank(transcriptionText)) {
builder.append(transcriptionText.trim());
}
if (StrUtil.isNotBlank(inputText)) {
if (builder.length() > 0) {
builder.append("\n");
}
builder.append(inputText.trim());
}
if (builder.length() > 0) {
return builder.toString();
}
if (allowFallback) {
return cosyVoiceProperties.getPreviewText();
}
throw exception(VOICE_TTS_FAILED, "请提供需要合成的文本内容");
}
private String appendEmotion(String text, String emotion) {
if (StrUtil.isBlank(text)) {
return text;
}
if (StrUtil.isBlank(emotion) || "neutral".equalsIgnoreCase(emotion)) {
return text;
}
String emotionLabel = switch (emotion.toLowerCase()) {
case "happy" -> "高兴";
case "angry" -> "愤怒";
case "sad" -> "悲伤";
case "scared" -> "害怕";
case "disgusted" -> "厌恶";
case "surprised" -> "惊讶";
default -> emotion;
};
return "【情感:" + emotionLabel + "】" + text;
}
private String buildCacheKey(String prefix,
String voiceId,
String fileUrl,
String text,
Float speechRate,
Float volume,
String emotion,
String audioFormat,
Integer sampleRate) {
String identifier = StrUtil.isNotBlank(voiceId)
? voiceId
: StrUtil.blankToDefault(fileUrl, "no-voice");
String payload = StrUtil.join("|",
identifier,
text,
speechRate != null ? speechRate : "1.0",
volume != null ? volume : "0",
StrUtil.blankToDefault(emotion, "neutral"),
StrUtil.blankToDefault(audioFormat, cosyVoiceProperties.getAudioFormat()),
sampleRate != null ? sampleRate : cosyVoiceProperties.getSampleRate());
String hash = cn.hutool.crypto.SecureUtil.sha256(payload);
return prefix + hash;
}
private PreviewCacheEntry getPreviewCache(String key) {
try {
String json = stringRedisTemplate.opsForValue().get(key);
if (StrUtil.isBlank(json)) {
return null;
}
return JSONUtil.toBean(json, PreviewCacheEntry.class);
} catch (Exception ex) {
log.warn("[previewVoice][cache read failed][key={}]", key, ex);
return null;
}
}
private void savePreviewCache(String key, PreviewCacheEntry entry) {
try {
stringRedisTemplate.opsForValue().set(
key,
JSONUtil.toJsonStr(entry),
PREVIEW_CACHE_TTL_SECONDS,
TimeUnit.SECONDS);
} catch (Exception ex) {
log.warn("[previewVoice][cache write failed][key={}]", key, ex);
}
}
private SynthCacheEntry getSynthCache(String key) {
try {
String json = stringRedisTemplate.opsForValue().get(key);
if (StrUtil.isBlank(json)) {
return null;
}
return JSONUtil.toBean(json, SynthCacheEntry.class);
} catch (Exception ex) {
log.warn("[synthesizeVoice][cache read failed][key={}]", key, ex);
return null;
}
}
private void saveSynthCache(String key, SynthCacheEntry entry) {
try {
stringRedisTemplate.opsForValue().set(
key,
JSONUtil.toJsonStr(entry),
SYNTH_CACHE_TTL_SECONDS,
TimeUnit.SECONDS);
} catch (Exception ex) {
log.warn("[synthesizeVoice][cache write failed][key={}]", key, ex);
}
}
private AppTikVoiceTtsRespVO buildSynthResponseFromCache(AppTikVoiceTtsReqVO reqVO, SynthCacheEntry cache) {
byte[] audioBytes = Base64.getDecoder().decode(cache.getAudioBase64());
String format = defaultFormat(cache.getFormat(), reqVO.getAudioFormat());
String voiceId = StrUtil.blankToDefault(reqVO.getVoiceId(), cache.getVoiceId());
ByteArrayMultipartFile multipartFile = new ByteArrayMultipartFile(
"file",
buildFileName(voiceId, format),
resolveContentType(format),
audioBytes
);
Long fileId = tikUserFileService.uploadFile(multipartFile, "audio", null);
AppTikVoiceTtsRespVO respVO = new AppTikVoiceTtsRespVO();
respVO.setFileId(fileId);
respVO.setAudioUrl(tikUserFileService.getAudioPlayUrl(fileId));
respVO.setFormat(format);
respVO.setSampleRate(cache.getSampleRate());
respVO.setRequestId(cache.getRequestId());
respVO.setVoiceId(voiceId);
return respVO;
}
private AppTikVoicePreviewRespVO buildPreviewResp(PreviewCacheEntry entry, String presignUrl, String voiceId) {
AppTikVoicePreviewRespVO respVO = new AppTikVoicePreviewRespVO();
respVO.setAudioUrl(presignUrl);
respVO.setFormat(entry.getFormat());
respVO.setSampleRate(entry.getSampleRate());
respVO.setRequestId(entry.getRequestId());
respVO.setVoiceId(voiceId);
return respVO;
}
// Lombok generates the setters that Hutool's JSONUtil.toBean relies on to repopulate the entry
// from cached JSON; with getters only, the private fields are not restored on a cache hit.
@lombok.Data
@lombok.NoArgsConstructor
@lombok.AllArgsConstructor
private static class PreviewCacheEntry {
private String fileUrl;
private String format;
private Integer sampleRate;
private String requestId;
}
// Lombok generates the setters that Hutool's JSONUtil.toBean relies on to repopulate the entry
// from cached JSON; with getters only, the private fields are not restored on a cache hit.
@lombok.Data
@lombok.NoArgsConstructor
@lombok.AllArgsConstructor
private static class SynthCacheEntry {
private String audioBase64;
private String format;
private Integer sampleRate;
private String requestId;
private String voiceId;
}
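/**
* Runs speech recognition asynchronously.
* <p>
* Note: callers inside this class invoke this method directly, which bypasses the Spring proxy.
* With the default proxy-based setup, {@code @Async} does not apply to those calls and
* recognition runs on the calling thread.
*
* @param voiceId voice ID
* @param fileUrl file URL
*/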
@Async
public void asyncTranscribeVoice(Long voiceId, String fileUrl) {
try {
log.info("[asyncTranscribeVoice][开始识别,配音编号({})文件URL({})]", voiceId, fileUrl);
Object result = tikHupService.videoToCharacters2(Collections.singletonList(fileUrl));
// Parse the recognition result
String transcription = extractTranscription(result);
if (StrUtil.isNotBlank(transcription)) {
// Persist the recognition result
TikUserVoiceDO updateObj = new TikUserVoiceDO()
.setId(voiceId)
.setTranscription(transcription);
voiceMapper.updateById(updateObj);
log.info("[asyncTranscribeVoice][识别成功,配音编号({}),文本长度({})]", voiceId, transcription.length());
} else {
log.warn("[asyncTranscribeVoice][识别结果为空,配音编号({}),返回码({})]",
voiceId, result instanceof CommonResult ? ((CommonResult<?>) result).getCode() : "未知");
}
} catch (Exception e) {
log.error("[asyncTranscribeVoice][识别失败,配音编号({})文件URL({})]", voiceId, fileUrl, e);
}
}
/**
* Extracts the text content from a recognition result.
* Parsing follows the actual return format of TikHupService.videoToCharacters*.
*
* @param result recognition result
* @return text content
*/
private String extractTranscription(Object result) {
if (result == null) {
return null;
}
try {
if (result instanceof CommonResult<?> commonResult) {
if (!commonResult.isSuccess()) {
log.warn("[extractTranscription][识别失败code({})msg({})]",
commonResult.getCode(), commonResult.getMsg());
return null;
}
Object data = commonResult.getData();
if (data == null) {
return null;
}
String parsed = parseTranscriptionText(data);
if (StrUtil.isNotBlank(parsed)) {
return parsed;
}
return data.toString();
}
String parsed = parseTranscriptionText(result);
if (StrUtil.isNotBlank(parsed)) {
return parsed;
}
return result.toString();
} catch (Exception e) {
log.warn("[extractTranscription][解析识别结果失败]", e);
return null;
}
}
private static final List<String> TRANSCRIPTION_TEXT_KEYS =
Arrays.asList("text", "sentence", "result", "content", "transcript", "output_text", "display_text");
private String parseTranscriptionText(Object rawData) {
if (rawData == null) {
return null;
}
String rawString = rawData instanceof String ? (String) rawData : JSONUtil.toJsonStr(rawData);
if (StrUtil.isBlank(rawString)) {
return null;
}
if (!JSONUtil.isTypeJSON(rawString)) {
return rawString;
}
try {
Object json = JSONUtil.parse(rawString);
String localText = extractTextFromJson(json);
if (StrUtil.isNotBlank(localText)) {
return localText;
}
if (json instanceof JSONObject jsonObject) {
JSONArray results = jsonObject.getJSONArray("results");
if (CollUtil.isEmpty(results)) {
return null;
}
Object lastObj = results.get(results.size() - 1);
if (!(lastObj instanceof JSONObject lastResult)) {
return null;
}
String transcriptionUrl = lastResult.getStr("transcription_url");
if (StrUtil.isBlank(transcriptionUrl)) {
return null;
}
StringBuilder builder = new StringBuilder();
appendRemoteTranscription(builder, transcriptionUrl);
return builder.length() > 0 ? builder.toString().trim() : null;
}
} catch (Exception e) {
log.warn("[parseTranscriptionText][解析Paraformer结果失败]", e);
}
return rawString;
}
private void appendRemoteTranscription(StringBuilder builder, String transcriptionUrl) {
if (StrUtil.isBlank(transcriptionUrl)) {
return;
}
String remoteContent = fetchRemoteTranscription(transcriptionUrl);
if (StrUtil.isBlank(remoteContent)) {
return;
}
String remoteText = extractTextFromJson(JSONUtil.parse(remoteContent));
if (StrUtil.isNotBlank(remoteText)) {
appendLine(builder, remoteText);
}
}
private String extractTextFromJson(Object json) {
if (json == null) {
return null;
}
StringBuilder builder = new StringBuilder();
collectTranscriptionText(json, builder);
return builder.length() > 0 ? builder.toString().trim() : null;
}
private String fetchRemoteTranscription(String url) {
try {
String body = HttpUtil.get(url);
if (StrUtil.isNotBlank(body)) {
return body;
}
} catch (Exception e) {
log.warn("[fetchRemoteTranscription][下载转写文本失败url({})]", url, e);
}
return null;
}
private void collectTranscriptionText(Object node, StringBuilder builder) {
if (node == null) {
return;
}
if (node instanceof JSONObject jsonObject) {
for (String key : jsonObject.keySet()) {
Object value = jsonObject.get(key);
if (value == null) {
continue;
}
if (value instanceof CharSequence && TRANSCRIPTION_TEXT_KEYS.contains(key)) {
appendLine(builder, value.toString());
} else if (value instanceof JSONObject || value instanceof JSONArray) {
collectTranscriptionText(value, builder);
}
}
} else if (node instanceof JSONArray jsonArray) {
for (Object item : jsonArray) {
collectTranscriptionText(item, builder);
}
}
}
private void appendLine(StringBuilder builder, String line) {
String normalized = StrUtil.trim(line);
if (StrUtil.isBlank(normalized)) {
return;
}
if (builder.length() > 0) {
builder.append('\n');
}
builder.append(normalized);
}
}
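
`collectTranscriptionText` walks the parsed recognition payload recursively and keeps every string stored under one of the keys in `TRANSCRIPTION_TEXT_KEYS`. The following is a minimal sketch of the same traversal over plain `Map` and `List` values, so it runs without Hutool. `TranscriptTextCollector` is a hypothetical name, and the sample payload shape used with it is an assumption, not the documented response of any recognition service.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the recursive key-based text collection.
final class TranscriptTextCollector {
    private static final Set<String> TEXT_KEYS =
            Set.of("text", "sentence", "result", "content", "transcript", "output_text", "display_text");

    private TranscriptTextCollector() {}

    // Returns the collected lines joined by '\n', or null when nothing matched.
    static String collect(Object node) {
        StringBuilder out = new StringBuilder();
        walk(node, out);
        return out.length() > 0 ? out.toString() : null;
    }

    // Depth-first walk: strings under a known key are appended, containers are descended into.
    private static void walk(Object node, StringBuilder out) {
        if (node instanceof Map<?, ?> map) {
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                Object value = entry.getValue();
                if (value instanceof CharSequence text
                        && entry.getKey() instanceof String key
                        && TEXT_KEYS.contains(key)) {
                    appendLine(out, text.toString());
                } else if (value instanceof Map || value instanceof List) {
                    walk(value, out);
                }
            }
        } else if (node instanceof List<?> list) {
            for (Object item : list) {
                walk(item, out);
            }
        }
    }

    // Appends a trimmed, non-empty line, separating it from earlier lines with '\n'.
    private static void appendLine(StringBuilder out, String line) {
        String normalized = line.trim();
        if (normalized.isEmpty()) {
            return;
        }
        if (out.length() > 0) {
            out.append('\n');
        }
        out.append(normalized);
    }
}
```

One property of this traversal is worth keeping in mind: if a payload repeats the same text at several nesting levels (for example once per transcript and again per sentence), each occurrence is collected, so the result can contain duplicates.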


@@ -0,0 +1,69 @@
package cn.iocoder.yudao.module.tik.voice.util;
import org.springframework.util.FileCopyUtils;
import org.springframework.web.multipart.MultipartFile;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
/**
* In-memory MultipartFile used only for uploads made inside the service
*/
public class ByteArrayMultipartFile implements MultipartFile {
private final String name;
private final String originalFilename;
private final String contentType;
private final byte[] content;
public ByteArrayMultipartFile(String name, String originalFilename, String contentType, byte[] content) {
this.name = name;
this.originalFilename = originalFilename;
this.contentType = contentType;
this.content = content != null ? content : new byte[0];
}
@Override
public String getName() {
return name;
}
@Override
public String getOriginalFilename() {
return originalFilename;
}
@Override
public String getContentType() {
return contentType;
}
@Override
public boolean isEmpty() {
return content.length == 0;
}
@Override
public long getSize() {
return content.length;
}
@Override
public byte[] getBytes() {
return content;
}
@Override
public InputStream getInputStream() {
return new ByteArrayInputStream(content);
}
@Override
public void transferTo(File dest) throws IOException {
FileCopyUtils.copy(content, dest);
}
}


@@ -0,0 +1,37 @@
package cn.iocoder.yudao.module.tik.voice.vo;
import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.Max;
import jakarta.validation.constraints.Min;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import lombok.Data;
/**
* Latentsync submit request VO
*/
@Data
public class AppTikLatentsyncSubmitReqVO {
@Schema(description = "音频 URL需公网可访问", requiredMode = Schema.RequiredMode.REQUIRED,
example = "https://example.com/audio.wav")
@NotBlank(message = "音频地址不能为空")
@Size(max = 1024, message = "音频地址长度不能超过 1024 字符")
private String audioUrl;
@Schema(description = "视频 URL需公网可访问", requiredMode = Schema.RequiredMode.REQUIRED,
example = "https://example.com/video.mp4")
@NotBlank(message = "视频地址不能为空")
@Size(max = 1024, message = "视频地址长度不能超过 1024 字符")
private String videoUrl;
@Schema(description = "guidance_scale范围 1-2默认 1", example = "1")
@Min(value = 1, message = "guidanceScale 不能小于 1")
@Max(value = 2, message = "guidanceScale 不能大于 2")
private Integer guidanceScale;
@Schema(description = "随机种子(默认 8888", example = "8888")
private Integer seed;
}
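
The schema descriptions above name a default of 1 for `guidanceScale` (range 1-2) and 8888 for `seed`, but this VO leaves both fields nullable. A minimal sketch of how those defaults could be resolved before calling the client follows. The class `LatentsyncDefaultsSketch` and its methods are hypothetical; this section does not show where the real service applies defaults.

```java
final class LatentsyncDefaultsSketch {

    // Values taken from the @Schema descriptions on the request VO.
    static final int DEFAULT_GUIDANCE_SCALE = 1;
    static final int DEFAULT_SEED = 8888;

    // Returns the requested scale, or the default when the field was omitted.
    static int resolveGuidanceScale(Integer requested) {
        if (requested == null) {
            return DEFAULT_GUIDANCE_SCALE;
        }
        // Mirrors the @Min(1) / @Max(2) constraints for callers that bypass bean validation.
        if (requested < 1 || requested > 2) {
            throw new IllegalArgumentException("guidanceScale must be between 1 and 2");
        }
        return requested;
    }

    // Returns the requested seed, or the default when the field was omitted.
    static int resolveSeed(Integer requested) {
        return requested != null ? requested : DEFAULT_SEED;
    }
}
```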


@@ -0,0 +1,22 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import lombok.Data;

/**
 * Latentsync submit response VO
 */
@Data
public class AppTikLatentsyncSubmitRespVO {

    @Schema(description = "Latentsync task ID", example = "8eed0b9b-6103-4357-a57b-9f135a8c3276")
    private String requestId;

    @Schema(description = "Official status, e.g. IN_QUEUE, PROCESSING, SUCCEEDED", example = "IN_QUEUE")
    private String status;

    @Schema(description = "Current queue position", example = "0")
    private Integer queuePosition;

}


@@ -0,0 +1,38 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.NotNull;
import lombok.Data;

/**
 * User App - Create Voice Request VO
 *
 * @author 芋道源码
 */
@Schema(description = "User App - Create Voice Request VO")
@Data
public class AppTikUserVoiceCreateReqVO {

    @Schema(description = "Voice name", requiredMode = Schema.RequiredMode.REQUIRED, example = "My voice")
    @NotBlank(message = "Voice name must not be blank")
    private String name;

    @Schema(description = "Audio file ID (references infra_file.id)", requiredMode = Schema.RequiredMode.REQUIRED, example = "1")
    @NotNull(message = "Audio file ID must not be null")
    private Long fileId;

    @Schema(description = "Whether to transcribe automatically", example = "false")
    private Boolean autoTranscribe;

    @Schema(description = "Language: zh-CN (Simplified Chinese), zh-TW (Traditional Chinese), en-US (English)", example = "zh-CN")
    private String language;

    @Schema(description = "Voice type: female (female voice), male (male voice)", example = "female")
    private String gender;

    @Schema(description = "Note", example = "This is a test voice")
    private String note;

}


@@ -0,0 +1,23 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import cn.iocoder.yudao.framework.common.pojo.PageParam;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.Data;

/**
 * User App - User Voice Page Request VO
 *
 * @author 芋道源码
 */
@Schema(description = "User App - User Voice Page Request VO")
@Data
public class AppTikUserVoicePageReqVO extends PageParam {

    @Schema(description = "User ID (filled in automatically; clients do not pass it)")
    private Long userId;

    @Schema(description = "Voice name (fuzzy match)", example = "My voice")
    private String name;

}


@@ -0,0 +1,48 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import lombok.Data;

import java.time.LocalDateTime;

/**
 * User App - User Voice Response VO
 *
 * @author 芋道源码
 */
@Schema(description = "User App - User Voice Response VO")
@Data
public class AppTikUserVoiceRespVO {

    @Schema(description = "Voice ID", requiredMode = Schema.RequiredMode.REQUIRED, example = "1")
    private Long id;

    @Schema(description = "Voice name", requiredMode = Schema.RequiredMode.REQUIRED, example = "My voice")
    private String name;

    @Schema(description = "Audio file ID (references infra_file.id)", requiredMode = Schema.RequiredMode.REQUIRED, example = "1")
    private Long fileId;

    @Schema(description = "File access URL (looked up via file_id)")
    private String fileUrl;

    @Schema(description = "Speech recognition result", example = "This is the recognized text")
    private String transcription;

    @Schema(description = "Language: zh-CN (Simplified Chinese), zh-TW (Traditional Chinese), en-US (English)", example = "zh-CN")
    private String language;

    @Schema(description = "Voice type: female (female voice), male (male voice)", example = "female")
    private String gender;

    @Schema(description = "Note", example = "This is a test voice")
    private String note;

    @Schema(description = "Creation time", requiredMode = Schema.RequiredMode.REQUIRED)
    private LocalDateTime createTime;

    @Schema(description = "Update time", requiredMode = Schema.RequiredMode.REQUIRED)
    private LocalDateTime updateTime;

}


@@ -0,0 +1,36 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.NotNull;
import lombok.Data;

/**
 * User App - Update Voice Request VO
 *
 * @author 芋道源码
 */
@Schema(description = "User App - Update Voice Request VO")
@Data
public class AppTikUserVoiceUpdateReqVO {

    @Schema(description = "Voice ID", requiredMode = Schema.RequiredMode.REQUIRED, example = "1")
    @NotNull(message = "Voice ID must not be null")
    private Long id;

    @Schema(description = "Voice name", example = "My voice")
    private String name;

    @Schema(description = "Language: zh-CN (Simplified Chinese), zh-TW (Traditional Chinese), en-US (English)", example = "zh-CN")
    private String language;

    @Schema(description = "Voice type: female (female voice), male (male voice)", example = "female")
    private String gender;

    @Schema(description = "Note", example = "This is a test voice")
    private String note;

    @Schema(description = "Transcription", example = "Recognized text; can be edited manually")
    private String transcription;

}


@@ -0,0 +1,43 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.Size;
import lombok.Data;

/**
 * Preview request for the user's own voice
 */
@Data
public class AppTikVoicePreviewReqVO {

    @Schema(description = "Input text")
    @Size(max = 4000, message = "Input text must not exceed 4000 characters")
    private String inputText;

    @Schema(description = "Transcription text, used for concatenation")
    @Size(max = 4000, message = "Transcription text must not exceed 4000 characters")
    private String transcriptionText;

    @Schema(description = "Voice ID (CosyVoice voiceId)")
    private String voiceId;

    @Schema(description = "OSS URL of the voice's source audio (required when voiceId is absent)")
    private String fileUrl;

    @Schema(description = "Model name, default cosyvoice-v2")
    private String model;

    @Schema(description = "Speech rate", example = "1.0")
    private Float speechRate;

    @Schema(description = "Volume", example = "0")
    private Float volume;

    @Schema(description = "Emotion", example = "neutral")
    private String emotion;

    @Schema(description = "Audio format, default wav")
    private String audioFormat;

}


@@ -0,0 +1,26 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import lombok.Data;

@Data
@Schema(description = "Voice preview response")
public class AppTikVoicePreviewRespVO {

    @Schema(description = "Audio playback URL (presigned URL)")
    private String audioUrl;

    @Schema(description = "Audio format", example = "wav")
    private String format;

    @Schema(description = "Sample rate", example = "24000")
    private Integer sampleRate;

    @Schema(description = "CosyVoice request ID")
    private String requestId;

    @Schema(description = "Voice ID used")
    private String voiceId;

}


@@ -0,0 +1,46 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.Size;
import lombok.Data;

/**
 * Text-to-speech request VO
 */
@Data
public class AppTikVoiceTtsReqVO {

    @Schema(description = "Input text")
    @Size(max = 4000, message = "Input text must not exceed 4000 characters")
    private String inputText;

    @Schema(description = "Transcription text, used for concatenation")
    @Size(max = 4000, message = "Transcription text must not exceed 4000 characters")
    private String transcriptionText;

    @Schema(description = "Voice ID (CosyVoice voiceId)", example = "cosyvoice-v2-myvoice-xxx")
    private String voiceId;

    @Schema(description = "OSS URL of the voice's source audio (required when voiceId is absent)")
    private String fileUrl;

    @Schema(description = "Model name, default cosyvoice-v2", example = "cosyvoice-v3")
    private String model;

    @Schema(description = "Speech rate, default 1.0", example = "1.0")
    private Float speechRate;

    @Schema(description = "Emotion", example = "happy")
    private String emotion;

    @Schema(description = "Volume adjustment, range [-10, 10]", example = "0")
    private Float volume;

    @Schema(description = "Target sample rate, default 24000")
    private Integer sampleRate;

    @Schema(description = "Audio format, default wav; mp3 is also supported")
    private String audioFormat;

}
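
The TTS request above documents several rules only in its schema descriptions: `fileUrl` is required when `voiceId` is absent, `volume` lies in [-10, 10], and `model`, `sampleRate` and `audioFormat` have defaults. None of these is enforced by annotations on the VO. The following is a minimal sketch of such checks in plain Java. The class `TtsRequestSketch` is hypothetical, and two choices are assumptions not stated in the source: clamping an out-of-range volume instead of rejecting it, and treating a missing volume as 0.

```java
final class TtsRequestSketch {

    // Defaults taken from the @Schema descriptions on AppTikVoiceTtsReqVO.
    static final String DEFAULT_MODEL = "cosyvoice-v2";
    static final int DEFAULT_SAMPLE_RATE = 24000;
    static final String DEFAULT_AUDIO_FORMAT = "wav";

    static boolean hasText(String value) {
        return value != null && !value.isBlank();
    }

    // A request needs either an existing voiceId or a source audio URL.
    static void requireVoiceSource(String voiceId, String fileUrl) {
        if (!hasText(voiceId) && !hasText(fileUrl)) {
            throw new IllegalArgumentException("fileUrl is required when voiceId is absent");
        }
    }

    // Assumption: clamp into [-10, 10] and treat a missing volume as 0.
    static float clampVolume(Float volume) {
        if (volume == null) {
            return 0f;
        }
        return Math.max(-10f, Math.min(10f, volume));
    }

    static String resolveModel(String model) {
        return hasText(model) ? model : DEFAULT_MODEL;
    }

    static int resolveSampleRate(Integer sampleRate) {
        return sampleRate != null ? sampleRate : DEFAULT_SAMPLE_RATE;
    }

    static String resolveAudioFormat(String audioFormat) {
        return hasText(audioFormat) ? audioFormat : DEFAULT_AUDIO_FORMAT;
    }
}
```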


@@ -0,0 +1,29 @@
package cn.iocoder.yudao.module.tik.voice.vo;

import io.swagger.v3.oas.annotations.media.Schema;
import lombok.Data;

@Data
@Schema(description = "CosyVoice text-to-speech response")
public class AppTikVoiceTtsRespVO {

    @Schema(description = "User file ID", example = "1024")
    private Long fileId;

    @Schema(description = "Audio playback URL (presigned URL)")
    private String audioUrl;

    @Schema(description = "Audio format", example = "mp3")
    private String format;

    @Schema(description = "Sample rate", example = "24000")
    private Integer sampleRate;

    @Schema(description = "CosyVoice request ID")
    private String requestId;

    @Schema(description = "Voice ID used")
    private String voiceId;

}


@@ -0,0 +1,62 @@
package cn.iocoder.yudao.module.tik.voice.service;

import cn.iocoder.yudao.module.tik.voice.client.LatentsyncClient;
import cn.iocoder.yudao.module.tik.voice.client.dto.LatentsyncSubmitRequest;
import cn.iocoder.yudao.module.tik.voice.client.dto.LatentsyncSubmitResponse;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitReqVO;
import cn.iocoder.yudao.module.tik.voice.vo.AppTikLatentsyncSubmitRespVO;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.ArgumentCaptor;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;

import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

@ExtendWith(MockitoExtension.class)
class LatentsyncServiceImplTest {

    @Mock
    private LatentsyncClient latentsyncClient;

    private LatentsyncServiceImpl latentsyncService;

    @BeforeEach
    void setUp() {
        latentsyncService = new LatentsyncServiceImpl(latentsyncClient);
    }

    @Test
    void submitTask_success() {
        // Arrange: a fully populated request and a canned client response
        AppTikLatentsyncSubmitReqVO reqVO = new AppTikLatentsyncSubmitReqVO();
        reqVO.setAudioUrl("https://cdn.example.com/audio.wav");
        reqVO.setVideoUrl("https://cdn.example.com/video.mp4");
        reqVO.setGuidanceScale(2);
        reqVO.setSeed(999);

        LatentsyncSubmitResponse clientResp = new LatentsyncSubmitResponse();
        clientResp.setRequestId("task-123");
        clientResp.setStatus("IN_QUEUE");
        clientResp.setQueuePosition(0);
        when(latentsyncClient.submitTask(any())).thenReturn(clientResp);

        // Act
        AppTikLatentsyncSubmitRespVO respVO = latentsyncService.submitTask(reqVO);

        // Assert: the response is mapped through unchanged
        assertThat(respVO.getRequestId()).isEqualTo("task-123");
        assertThat(respVO.getStatus()).isEqualTo("IN_QUEUE");
        assertThat(respVO.getQueuePosition()).isZero();

        // Assert: the client received the values from the request VO
        ArgumentCaptor<LatentsyncSubmitRequest> captor = ArgumentCaptor.forClass(LatentsyncSubmitRequest.class);
        verify(latentsyncClient).submitTask(captor.capture());
        LatentsyncSubmitRequest submitRequest = captor.getValue();
        assertThat(submitRequest.getAudioUrl()).isEqualTo(reqVO.getAudioUrl());
        assertThat(submitRequest.getVideoUrl()).isEqualTo(reqVO.getVideoUrl());
        assertThat(submitRequest.getGuidanceScale()).isEqualTo(reqVO.getGuidanceScale());
        assertThat(submitRequest.getSeed()).isEqualTo(reqVO.getSeed());
    }

}