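diff --git a/openspec/changes/add-siliconflow-voice-provider/design.md b/openspec/changes/add-siliconflow-voice-provider/design.md
new file mode 100644
index 0000000000..8b32bc4b20
--- /dev/null
+++ b/openspec/changes/add-siliconflow-voice-provider/design.md
@@ -0,0 +1,179 @@
+# Technical Design: SiliconFlow Voice Provider
+
+## Context
+
+SiliconFlow (硅基流动) is a platform offering a range of AI services, including speech synthesis and voice cloning. This design integrates it into the existing multi-provider architecture as a new voice provider.
+
+**Constraints**:
+- Must be compatible with the existing `VoiceCloneProvider` interface
+- Must not affect the existing CosyVoice provider
+- Must adapt to the differences in the SiliconFlow API
+
+**Key API differences**:
+1. **Voice cloning**: SiliconFlow requires the reference audio to be uploaded first and returns a `uri` that serves as the voice ID
+2. **TTS synthesis**: uses the `/v1/audio/speech` endpoint and returns binary audio data
+3. **Authentication**: the API key is sent as a Bearer token
+4. **Model**: uses the `IndexTeam/IndexTTS-2` model
+
+## Goals / Non-Goals
+
+### Goals
+- Implement `SiliconFlowProvider` with full support for voice cloning and TTS
+- Support the SiliconFlow `IndexTeam/IndexTTS-2` model
+- Provide complete configuration support with an independent on/off switch
+- Handle the API differences and expose a unified service interface
+
+### Non-Goals
+- No speech-to-text (STT) support (another service already covers it)
+- No changes to the existing `VoiceCloneProvider` interface definition
+- No changes to the frontend API contract
+
+## Decisions
+
+### 1. Implementation class structure
+
+**Architecture**:
+```
+SiliconFlowProvider (implements VoiceCloneProvider)
+├── SiliconFlowApi (HTTP client)
+├── SiliconFlowProviderConfig (configuration class)
+└── DTO classes (request/response adaptation)
+```
+
+**Why**:
+- Follows the existing `CosyVoiceProvider` pattern
+- Separates the API call logic, which eases testing and maintenance
+- Dedicated DTOs absorb the SiliconFlow API differences
+
+### 2. Voice cloning flow adaptation
+
+**SiliconFlow voice cloning API**:
+- Endpoint: `POST /v1/uploads/audio/voice`
+- Request parameters: `model`, `customName`, `text`, `audio` (base64)
+- Response: `{"uri": "speech:customName:xxx:xxx"}`
+
+**Adaptation strategy**:
+1. Download the audio at the unified request's `audioUrl` and convert it to base64
+2. Use `prefix` as `customName`
+3. Use the transcript of the audio at `audioUrl` as the `text` parameter
+4. Store the returned `uri` as `voiceId`
+
+**Code example**:
+```java
+@Override
+public VoiceCloneResult cloneVoice(VoiceCloneRequest request) {
+    // 1. Download the audio file
+    byte[] audioData = downloadAudio(request.getAudioUrl());
+    String base64Audio = Base64.getEncoder().encodeToString(audioData);
+
+    // 2. Build the SiliconFlow request
+    SiliconFlowVoiceUploadRequest sfRequest = new SiliconFlowVoiceUploadRequest();
+    sfRequest.setModel("IndexTeam/IndexTTS-2");
+    sfRequest.setCustomName(request.getPrefix());
+    sfRequest.setText(getTranscriptionText(request.getAudioUrl()));
+    sfRequest.setAudio("data:audio/mpeg;base64," + base64Audio);
+
+    // 3. Call the API
+    SiliconFlowVoiceUploadResponse sfResponse = siliconFlowApi.uploadVoice(sfRequest);
+
+    // 4. Adapt the result
+    VoiceCloneResult result = new VoiceCloneResult();
+    result.setVoiceId(sfResponse.getUri());
+    return result;
+}
+```
+
+### 3. TTS synthesis flow adaptation
+
+**SiliconFlow TTS API**:
+- Endpoint: `POST /v1/audio/speech`
+- Request parameters: `model`, `input`, `voice`, `speed`, `sample_rate`, `response_format`
+- Response: binary audio data
+
+**Adaptation strategy**:
+1. Use `voiceId` (in uri format) as the `voice` parameter
+2. Support speech-rate adjustment (`speed`)
+3. Convert the binary response to Base64 before returning it
+
+### 4. Configuration design
+
+**Configuration structure**:
+```yaml
+yudao:
+  voice:
+    providers:
+      siliconflow:
+        enabled: false
+        api-key: ${SILICONFLOW_API_KEY}
+        base-url: https://api.siliconflow.cn
+        default-model: IndexTeam/IndexTTS-2
+        audio-format: mp3
+        sample-rate: 24000
+        connect-timeout: 10s
+        read-timeout: 180s
+```
+
+**Configuration class design**:
+```java
+@Data
+@EqualsAndHashCode(callSuper = true)
+public class SiliconFlowProviderConfig extends VoiceProviderProperties.ProviderConfig {
+    private String baseUrl = "https://api.siliconflow.cn";
+    private String defaultModel = "IndexTeam/IndexTTS-2";
+    private String audioFormat = "mp3";
+    private Integer sampleRate = 24000;
+    private Duration connectTimeout = Duration.ofSeconds(10);
+    private Duration readTimeout = Duration.ofSeconds(180);
+}
+```
+
+### 5. Error handling strategy
+
+- Log the details when an API call fails
+- Convert failures uniformly into the `VOICE_TTS_FAILED` business exception
+- Do not expose SiliconFlow technical details to upper layers
+- Support retries (for network errors)
+
+## Risks / Trade-offs
+
+| Risk | Mitigation |
+|------|------------|
+| SiliconFlow API changes | Encapsulated in a dedicated API client, so updates stay local |
+| Voice cloning requires a transcript | A transcription step already exists when a voice is created; reuse that text |
+| Audio download adds latency | Consider making the download configurable, or processing asynchronously |
+| Base64 encoding increases memory usage | Limit the audio file size (a 50MB limit already exists) |
+
+## Migration Plan
+
+### Phase 1: Backend implementation
+1. Create the `SiliconFlowApi` HTTP client
+2. Create the `SiliconFlowProviderConfig` configuration class
+3. Create the SiliconFlow-specific DTO classes
+4. Implement `SiliconFlowProvider`
+5. Update the `application.yaml` configuration
+
+### Phase 2: Testing and verification
+1. Unit test: `SiliconFlowApi` calls
+2. Unit test: `SiliconFlowProvider` adaptation logic
+3. Integration test: full voice cloning flow
+4. Integration test: full TTS synthesis flow
+
+### Phase 3: Frontend support (groundwork already in place)
+1. Verify that `voiceConfig.js` already supports the `siliconflow` type
+2. Verify that API requests already support the `providerType` parameter
+
+### Rollback plan
+- Disable SiliconFlow by setting `enabled: false`
+- Delete the `SiliconFlowProvider` related code
+- Restore the `application.yaml` configuration
+
+## Open Questions
+
+1. **Q**: Is a transcript mandatory for voice cloning?
+   **A**: The SiliconFlow API requires the `text` parameter; use the transcript produced when the voice was created
+
+2. **Q**: Do other SiliconFlow models need to be supported?
+   **A**: Only `IndexTeam/IndexTTS-2` is supported for now; more can be added later
+
+3. **Q**: How should an audio download failure be handled?
+   **A**: Throw a business exception prompting the user to check the audio URL
diff --git a/openspec/changes/add-siliconflow-voice-provider/proposal.md b/openspec/changes/add-siliconflow-voice-provider/proposal.md
new file mode 100644
index 0000000000..e548a50e19
--- /dev/null
+++ b/openspec/changes/add-siliconflow-voice-provider/proposal.md
@@ -0,0 +1,36 @@
+# Change: Add SiliconFlow Voice Provider
+
+## Why
+
+The voice cloning feature has already been refactored to the strategy pattern and supports a multi-provider architecture. The CosyVoice provider is implemented and in use. To offer more choice and reduce dependence on a single provider, SiliconFlow (硅基流动) needs to be added as a new voice provider, supporting voice cloning and TTS synthesis with the IndexTeam/IndexTTS-2 model.
+
+## What Changes
+
+- **ADDED** `SiliconFlowProvider` implementation class, implementing the `VoiceCloneProvider` interface
+- **ADDED** `SiliconFlowProviderConfig` configuration class
+- **ADDED** `SiliconFlowApi` API client class
+- **ADDED** SiliconFlow-specific DTO classes
+- **MODIFIED** `VoiceProviderProperties` to support SiliconFlow configuration
+- **MODIFIED** `application.yaml` to add the SiliconFlow configuration entries
+
+## Impact
+
+- **Affected specs**:
+  - `voice-clone` - extended to support the new voice provider
+- **Affected code**:
+  - Add `yudao-module-tik/.../voice/client/SiliconFlowProvider.java`
+  - Add `yudao-module-tik/.../voice/client/SiliconFlowApi.java`
+  - Add `yudao-module-tik/.../voice/config/SiliconFlowProviderConfig.java`
+  - Add `yudao-module-tik/.../voice/client/dto/SiliconFlow*.java` (DTO classes)
+  - Update `yudao-server/src/main/resources/application.yaml`
+
+## Dependencies
+
+- Depends on the completed multi-provider refactoring (`VoiceCloneProvider` interface and factory pattern)
+- A SiliconFlow API key must be supplied in the configuration
+
+## Migration
+
+- No data migration is needed; the feature is purely additive
+- The existing CosyVoice provider is unaffected
+- SiliconFlow is disabled by default and must be enabled through configuration
diff --git a/openspec/changes/add-siliconflow-voice-provider/specs/voice-clone/spec.md b/openspec/changes/add-siliconflow-voice-provider/specs/voice-clone/spec.md
new file mode 100644
index 0000000000..187c0f7932
--- /dev/null
+++ b/openspec/changes/add-siliconflow-voice-provider/specs/voice-clone/spec.md
@@ -0,0 +1,117 @@
+## ADDED Requirements
+
+### Requirement: SiliconFlow Voice Cloning Support
+
+The system MUST support SiliconFlow (硅基流动) as a voice cloning provider, allowing users to upload reference audio and generate a reusable voice ID.
+
+#### Scenario: Voice cloned successfully with SiliconFlow
+
+- **GIVEN** the user has uploaded a reference audio file and obtained its file URL
+- **AND** the reference audio has been successfully transcribed to text
+- **AND** the SiliconFlow provider is enabled and configured with a valid API key
+- **WHEN** the user submits a voice cloning request through the API with `providerType` set to `siliconflow`
+- **THEN** the system shall download the reference audio file and convert it to base64
+- **AND** call the SiliconFlow `/v1/uploads/audio/voice` API with the `IndexTeam/IndexTTS-2` model
+- **AND** store the returned `uri` (in the form `speech:customName:xxx:xxx`) as `voiceId`
+- **AND** return a success response containing the generated `voiceId`
+
+#### Scenario: SiliconFlow API call fails
+
+- **GIVEN** the SiliconFlow provider is enabled
+- **WHEN** a network error or authentication failure occurs while calling the SiliconFlow API
+- **THEN** the system shall log the error in detail
+- **AND** return the unified error code `VOICE_TTS_FAILED` without exposing underlying technical details
+
+#### Scenario: SiliconFlow provider not configured
+
+- **GIVEN** the SiliconFlow provider is not enabled or its API key is not configured
+- **WHEN** the user attempts voice cloning with SiliconFlow
+- **THEN** the system shall return a friendly error message asking the administrator to configure SiliconFlow first
+
+---
+
+### Requirement: SiliconFlow Text-to-Speech Support
+
+The system MUST support text-to-speech synthesis through SiliconFlow, allowing users to synthesize speech with a cloned voice ID or the system default voice.
+
+#### Scenario: Synthesize speech with a cloned voice
+
+- **GIVEN** the user has successfully cloned a voice through SiliconFlow and obtained a `voiceId`
+- **AND** the SiliconFlow provider is enabled
+- **WHEN** the user submits a TTS request with `providerType` set to `siliconflow` and a valid `voiceId`
+- **THEN** the system shall call the SiliconFlow `/v1/audio/speech` API
+- **AND** use the `IndexTeam/IndexTTS-2` model and the specified `voiceId`
+- **AND** convert the returned binary audio data to base64
+- **AND** return a response containing the audio data, format, and sample rate
+
+#### Scenario: Synthesize speech with the default voice
+
+- **GIVEN** the SiliconFlow provider is enabled
+- **AND** the user has not specified a `voiceId`
+- **WHEN** the user submits a TTS request with `providerType` set to `siliconflow`
+- **THEN** the system shall synthesize with the default voice provided by SiliconFlow
+- **AND** return the synthesis result
+
+#### Scenario: TTS synthesis parameter support
+
+- **GIVEN** the SiliconFlow provider is enabled
+- **WHEN** the user submits a TTS request containing optional parameters (speech rate, sample rate, audio format)
+- **THEN** the system shall adapt these parameters to the SiliconFlow API format
+- **AND** the supported parameters include `speed` (0.25-4.0), `sample_rate`, and `response_format` (mp3/wav/pcm)
+
+---
+
+### Requirement: Dynamic Provider Switching
+
+The system SHALL support selecting the voice provider per request, without restarting the service.
+
+#### Scenario: Switch provider through providerType
+
+- **GIVEN** the system is configured with both the CosyVoice and SiliconFlow providers
+- **AND** the default provider is `cosyvoice`
+- **WHEN** the user sets `providerType` to `siliconflow` in the API request
+- **THEN** the system shall handle the request with `SiliconFlowProvider`
+- **AND** other requests that use the default provider are unaffected
+
+#### Scenario: Unsupported provider type
+
+- **GIVEN** the system is configured with only the CosyVoice and SiliconFlow providers
+- **WHEN** the user sets `providerType` to a value that does not exist (such as `other`)
+- **THEN** the system shall return the error message "不支持的语音克隆供应商: other"
+
+---
+
+### Requirement: SiliconFlow Configuration Management
+
+The system MUST support managing the SiliconFlow provider's enabled state and connection parameters through the configuration file.
+
+#### Scenario: Configure the SiliconFlow provider
+
+- **GIVEN** the administrator wants to enable SiliconFlow in the system
+- **WHEN** the administrator configures the following parameters in `application.yaml`:
+  ```yaml
+  yudao:
+    voice:
+      providers:
+        siliconflow:
+          enabled: true
+          api-key: sk-xxxxx
+          base-url: https://api.siliconflow.cn
+          default-model: IndexTeam/IndexTTS-2
+  ```
+- **THEN** the system shall register `SiliconFlowProvider` at startup
+- **AND** users can use the provider by setting `providerType` to `siliconflow`
+
+#### Scenario: Disable the SiliconFlow provider
+
+- **GIVEN** the SiliconFlow provider is configured
+- **WHEN** the administrator sets `enabled` to `false` or removes the configuration
+- **THEN** the system shall not register `SiliconFlowProvider` at startup
+- **AND** user requests for the SiliconFlow service shall return an error message
+
+#### Scenario: Backward compatibility with legacy configuration
+
+- **GIVEN** the system has been upgraded from an older version and `yudao.cosyvoice.*` configuration exists
+- **WHEN** the system detects the legacy configuration at startup
+- **THEN** the system shall automatically migrate the legacy configuration to the `yudao.voice.providers.cosyvoice.*` structure
+- **AND** prefer the new configuration, with the legacy configuration as a fallback
diff --git a/openspec/changes/add-siliconflow-voice-provider/tasks.md b/openspec/changes/add-siliconflow-voice-provider/tasks.md
new file mode 100644
index 0000000000..be6e0aeb14
--- /dev/null
+++ b/openspec/changes/add-siliconflow-voice-provider/tasks.md
@@ -0,0 +1,66 @@
+# Implementation Tasks
+
+## 1. Configuration class
+- [ ] 1.1 Create the `SiliconFlowProviderConfig` configuration class
+  - Extend `VoiceProviderProperties.ProviderConfig`
+  - Add SiliconFlow-specific configuration fields
+  - Add default values and the configuration prefix
+
+## 2. API client
+- [ ] 2.1 Create the `SiliconFlowApi` HTTP client
+  - Implement `uploadVoice()` - upload reference audio
+  - Implement `synthesize()` - text-to-speech
+  - Implement `transcribe()` - speech-to-text (optional)
+  - Configure RestTemplate/WebClient
+  - Add request/response logging
+
+## 3. DTO classes
+- [ ] 3.1 Create `SiliconFlowVoiceUploadRequest` - reference audio upload request
+- [ ] 3.2 Create `SiliconFlowVoiceUploadResponse` - reference audio upload response
+- [ ] 3.3 Create `SiliconFlowTtsRequest` - text-to-speech request
+- [ ] 3.4 Create `SiliconFlowTtsResponse` - text-to-speech response (binary handling)
+
+## 4. Provider implementation class
+- [ ] 4.1 Create the `SiliconFlowProvider` implementation class
+  - Implement the `VoiceCloneProvider` interface
+  - Implement `cloneVoice()`
+  - Implement `synthesize()`
+  - Implement `supports()`
+  - Implement `getProviderType()`
+  - Add the `@Component` annotation to register it as a Spring Bean
+
+## 5. Configuration file update
+- [ ] 5.1 Update `application.yaml`
+  - Add the `yudao.voice.providers.siliconflow` configuration section
+  - Configure the API key, base URL, model, and other parameters
+  - Default to `enabled: false`
+
+## 6. Testing and verification
+- [ ] 6.1 Write unit tests
+  - Test `SiliconFlowApi` HTTP calls
+  - Test the `SiliconFlowProvider` adaptation logic
+  - Mock SiliconFlow API responses
+- [ ] 6.2 Integration tests
+  - Test the full voice cloning flow
+  - Test the full TTS synthesis flow
+  - Test provider switching
+- [ ] 6.3 Verify frontend compatibility
+  - Verify that `voiceConfig.js` supports the `siliconflow` type
+  - Verify that API requests support the `providerType` parameter
+
+## 7. Documentation and cleanup
+- [ ] 7.1 Update the related documentation
+  - Add SiliconFlow configuration notes
+  - Add SiliconFlow usage examples
+- [ ] 7.2 Code review and cleanup
+  - Check coding conventions
+  - Remove debugging code
+  - Make sure log levels are correct
+
+---
+
+**Total**: 20 tasks
+
+**Estimated effort**: 2-3 days
+
+**Dependency**: the multi-provider architecture refactoring is complete
diff --git a/openspec/changes/refactor-voice-provider/design.md b/openspec/changes/refactor-voice-provider/design.md
deleted file mode 100644
index b0f7a487a2..0000000000
--- a/openspec/changes/refactor-voice-provider/design.md
+++ /dev/null
@@ -1,133 +0,0 @@
-# Technical Design: Voice Clone Provider Refactoring
-
-## Context
-
-The voice cloning feature currently depends directly on the Alibaba Cloud CosyVoice SDK and API. The Service layer calls `CosyVoiceClient` directly, which leads to:
-
-1. **Tight coupling**: switching to or adding another provider is not easy
-2. **Hard to test**: external dependencies are difficult to mock
-3. **Poor extensibility**: adding a new provider requires changing the Service layer
-
-## Goals / Non-Goals
-
-### Goals
-- Decouple the Service layer from concrete provider implementations
-- Support multiple coexisting providers and dynamic switching
-- Keep existing functionality fully compatible
-- Lay the groundwork for adding SiliconFlow IndexTTS-2
-
-### Non-Goals
-- No changes to existing API behavior
-- No changes to the database schema
-- No changes to frontend interaction
-
-## Decisions
-
-### 1. Adopt the strategy pattern + factory pattern
-
-**Why**:
-- Strategy pattern: define a unified interface that each provider implements independently
-- Factory pattern: obtain Provider instances dynamically based on configuration
-- Follows the open/closed principle; extensions do not require modifying existing code
-
-**Architecture**:
-```
-VoiceCloneProvider (interface)
-├── CosyVoiceProvider (impl) - Alibaba Cloud CosyVoice (DashScope)
-├── SiliconFlowProvider (impl) - Phase 2: SiliconFlow IndexTTS-2
-└── VoiceCloneProviderFactory
-```
-
-**Notes**:
-- `CosyVoiceProvider` corresponds to the Alibaba Cloud DashScope voice service
-- Default model: `cosyvoice-v3-flash`
-- To extend, add a new Provider implementation
-
-### 2. Unified DTO design
-
-**Why**: hide the API differences between providers
-
-```java
-// Unified request
-VoiceCloneRequest {
-    String audioUrl;      // audio URL
-    String prefix;        // voice prefix
-    String targetModel;   // target model
-}
-
-// Unified response
-VoiceCloneResult {
-    String voiceId;       // generated voice ID
-    String requestId;     // request ID
-}
-```
-
-### 3. Configuration structure design
-
-**New configuration structure**:
-```yaml
-yudao:
-  voice:
-    # Default provider
-    default-provider: cosyvoice
-
-    # Provider configuration
-    providers:
-      cosyvoice:  # Alibaba Cloud CosyVoice
-        enabled: true
-        api-key: ${DASHSCOPE_API_KEY}
-        default-model: cosyvoice-v3-flash
-        # ... other settings
-
-      siliconflow:  # added in Phase 2
-        enabled: false
-        api-key: ${SILICONFLOW_API_KEY}
-        base-url: https://api.siliconflow.cn
-        default-model: indextts-2
-```
-
-**Backward compatibility**:
-- Read the legacy `yudao.cosyvoice.*` configuration and merge it into the new structure
-- Prefer the new configuration, with the legacy configuration as a fallback
-
-### 4. Error handling strategy
-
-- Log the details when a Provider call fails
-- Return the unified business exception `VOICE_TTS_FAILED`
-- Do not expose the underlying provider's technical details
-
-## Risks / Trade-offs
-
-| Risk | Mitigation |
-|------|------------|
-| Breaking existing functionality | Test thoroughly and keep the DTOs compatible |
-| Complex configuration migration | Support automatic mapping of the legacy configuration |
-| Performance overhead | The factory caches Provider instances |
-
-## Migration Plan
-
-### Phase 1: CosyVoice refactoring
-1. Create the interface and factory
-2. Refactor CosyVoice into a Provider implementation
-3. Update the Service layer to use the interface
-4. Test and verify
-
-### Phase 2: Add SiliconFlow
-1. Implement SiliconFlowProvider
-2. Add configuration support
-3. Integration tests
-
-### Rollback plan
-- Keep support for the original configuration
-- Gate the new logic behind a feature flag
-
-## Open Questions
-
-1. **Q**: Is switching providers dynamically at runtime required?
-   **A**: Not initially; switching through configuration is enough
-
-2. **Q**: Is a Provider health check needed?
-   **A**: To be considered in Phase 2
-
-3. **Q**: How should DTO field differences be handled?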
-   **A**: Use the common fields and put extension fields in `Map extensions`
diff --git a/openspec/changes/refactor-voice-provider/proposal.md b/openspec/changes/refactor-voice-provider/proposal.md
deleted file mode 100644
index 80d671ecd8..0000000000
--- a/openspec/changes/refactor-voice-provider/proposal.md
+++ /dev/null
@@ -1,35 +0,0 @@
-# Change: Refactor Voice Clone Provider
-
-## Why
-
-The voice cloning feature currently depends directly on the Alibaba Cloud CosyVoice implementation; the code is tightly coupled and hard to extend. Adding a new provider (such as SiliconFlow IndexTTS-2) requires modifying Service-layer code, which violates the open/closed principle.
-
-**Note**: CosyVoice is Alibaba Cloud's speech synthesis service (on the DashScope platform) and supports voice cloning and TTS. The current code uses the `cosyvoice-v3-flash` model.
-
-## What Changes
-
-- **ADDED** Introduce the strategy pattern and define the unified `VoiceCloneProvider` interface
-- **ADDED** Create the factory class `VoiceCloneProviderFactory` to manage multiple providers
-- **MODIFIED** Convert the existing `CosyVoiceClient` into `CosyVoiceProvider`
-- **MODIFIED** Update `TikUserVoiceServiceImpl` to use the Provider interface
-- **ADDED** Add configuration classes that support multi-provider configuration and switching
-- **BREAKING** Configuration keys move from `yudao.cosyvoice` to `yudao.voice.providers`
-
-## Impact
-
-- **Affected specs**:
-  - `voice-clone` (new capability spec)
-- **Affected code**:
-  - `TikUserVoiceServiceImpl.java` - Service layer switches to an injected Provider
-  - `CosyVoiceClient.java` → `CosyVoiceProvider.java` - renamed and implements the interface
-  - `CosyVoiceProperties.java` → `VoiceProviderProperties.java` - configuration structure reorganized
-  - Add `VoiceCloneProvider.java` - unified interface definition
-  - Add `VoiceCloneProviderFactory.java` - factory class
-  - Add `SiliconFlowProvider.java` - SiliconFlow implementation (Phase 2)
-
-## Migration
-
-- Existing configuration is migrated automatically: `yudao.cosyvoice.*` → `yudao.voice.providers.cosyvoice.*`
-- The default provider remains `cosyvoice`
-- Default behavior is unchanged and backward compatible
-- The provider can be switched through configuration: `yudao.voice.default-provider`
diff --git a/openspec/changes/refactor-voice-provider/specs/voice-clone/spec.md b/openspec/changes/refactor-voice-provider/specs/voice-clone/spec.md
deleted file mode 100644
index 2d9245e7b7..0000000000
--- a/openspec/changes/refactor-voice-provider/specs/voice-clone/spec.md
+++ /dev/null
@@ -1,132 +0,0 @@
-# Voice Clone Capability Specification
-
-## ADDED Requirements
-
-### Requirement: Provider Abstraction Layer
-The system SHALL provide a unified provider abstraction layer for voice cloning services, supporting multiple vendors through a common interface.
-
-#### Scenario: Get provider by type
-- **GIVEN** the system is configured with multiple voice clone providers
-- **WHEN** requesting a provider by type
-- **THEN** the system SHALL return the corresponding provider instance
-- **AND** the provider SHALL implement the `VoiceCloneProvider` interface
-
-#### Scenario: Provider not found
-- **GIVEN** the system is configured with a default provider
-- **WHEN** requesting a non-existent provider type
-- **THEN** the system SHALL fallback to the default provider
-- **AND** log a warning message
-
-### Requirement: Voice Cloning
-The system SHALL support voice cloning through the provider interface, accepting an audio file URL and returning a unique voice ID.
-
-#### Scenario: Successful voice cloning with CosyVoice
-- **GIVEN** a valid CosyVoice provider is configured
-- **WHEN** submitting a voice clone request with audio URL
-- **THEN** the system SHALL return a voice ID
-- **AND** the voice ID SHALL be usable for subsequent TTS synthesis
-
-#### Scenario: Voice cloning failure
-- **GIVEN** the provider API is unavailable or returns an error
-- **WHEN** submitting a voice clone request
-- **THEN** the system SHALL throw a `VOICE_TTS_FAILED` exception
-- **AND** log the error details for debugging
-
-### Requirement: Text-to-Speech Synthesis
-The system SHALL support TTS synthesis through cloned voices or system voices, accepting text input and returning audio data.
-
-#### Scenario: TTS with cloned voice
-- **GIVEN** a valid voice ID from a previous clone operation
-- **WHEN** submitting a TTS request with text and voice ID
-- **THEN** the system SHALL return audio data in the specified format
-- **AND** the audio SHALL match the cloned voice characteristics
-
-#### Scenario: TTS with system voice
-- **GIVEN** a system voice ID is configured
-- **WHEN** submitting a TTS request with text and system voice ID
-- **THEN** the system SHALL return audio data using the system voice
-- **AND** the audio SHALL match the system voice characteristics
-
-#### Scenario: TTS with reference audio (file URL)
-- **GIVEN** a reference audio URL and transcription text
-- **WHEN** submitting a TTS request with file URL
-- **THEN** the system SHALL perform on-the-fly voice cloning
-- **AND** return audio data matching the reference voice
-
-### Requirement: Configuration Management
-The system SHALL support multi-provider configuration through a unified configuration structure.
-
-#### Scenario: Configure multiple providers
-- **GIVEN** the application configuration file
-- **WHEN** configuring multiple voice providers
-- **THEN** each provider SHALL have independent `enabled` flag
-- **AND** the system SHALL only use enabled providers
-
-#### Scenario: Default provider selection
-- **GIVEN** the configuration specifies a `default-provider`
-- **WHEN** no provider is explicitly specified
-- **THEN** the system SHALL use the default provider
-- **AND** fallback to `cosyvoice` if default is not configured
-
-#### Scenario: Backward compatibility
-- **GIVEN** existing configuration using `yudao.cosyvoice.*`
-- **WHEN** the system starts
-- **THEN** the system SHALL automatically migrate to new config structure
-- **AND** existing functionality SHALL remain unchanged
-
-### Requirement: Provider Factory
-The system SHALL provide a factory component for managing provider instances and resolving providers by type.
-
-#### Scenario: Factory resolves provider
-- **GIVEN** the factory is initialized with provider configurations
-- **WHEN** calling `factory.getProvider("cosyvoice")`
-- **THEN** the factory SHALL return the CosyVoiceProvider instance
-- **AND** cache the instance for subsequent requests
-
-#### Scenario: Factory returns default
-- **GIVEN** the factory is configured with default provider
-- **WHEN** calling `factory.getProvider(null)`
-- **THEN** the factory SHALL return the default provider instance
-
-## MODIFIED Requirements
-
-### Requirement: Voice Creation Flow
-The voice creation process SHALL use the provider abstraction layer instead of directly calling CosyVoice client.
-
-#### Scenario: Create voice with CosyVoice
-- **GIVEN** a user uploads a voice audio file
-- **WHEN** creating a voice configuration through the API
-- **THEN** the system SHALL:
-  1. Validate the file exists and belongs to voice category
-  2. Call `provider.cloneVoice()` with the audio URL
-  3. Store the returned `voiceId` in the database
-  4. Return success response with voice configuration ID
-
-#### Scenario: Create voice with transcription
-- **GIVEN** a voice configuration is created without transcription
-- **WHEN** the user triggers transcription
-- **THEN** the system SHALL:
-  1. Fetch the audio file URL
-  2. Call the transcription service
-  3. Store the transcription text
-  4. Update the voice configuration
-
-### Requirement: Voice Preview
-The voice preview functionality SHALL work with both cloned voices (voiceId) and reference audio (file URL).
-
-#### Scenario: Preview cloned voice
-- **GIVEN** a voice configuration with a valid `voiceId`
-- **WHEN** requesting a preview with custom text
-- **THEN** the system SHALL call `provider.synthesize()` with the voiceId
-- **AND** return audio data in Base64 format
-
-#### Scenario: Preview with reference audio
-- **GIVEN** a voice configuration without `voiceId` but with audio file
-- **WHEN** requesting a preview
-- **THEN** the system SHALL call `provider.synthesize()` with the file URL
-- **AND** use the stored transcription as reference text
-- **AND** return audio data in Base64 format
-
-## REMOVED Requirements
-
-None. This change is additive and refactoring only.
diff --git a/openspec/changes/refactor-voice-provider/tasks.md b/openspec/changes/refactor-voice-provider/tasks.md
deleted file mode 100644
index 5e25ca2f74..0000000000
--- a/openspec/changes/refactor-voice-provider/tasks.md
+++ /dev/null
@@ -1,53 +0,0 @@
-# Implementation Tasks
-
-## 1. Interface and base structure
-- [ ] 1.1 Create the `VoiceCloneProvider` interface
-  - Define `cloneVoice(VoiceCloneRequest)`
-  - Define `synthesize(VoiceTtsRequest)`
-  - Define `supports(String providerType)`
-- [ ] 1.2 Create the unified DTO classes
-  - `VoiceCloneRequest` - voice cloning request
-  - `VoiceCloneResult` - voice cloning response
-  - `VoiceTtsRequest` - speech synthesis request
-  - `VoiceTtsResult` - speech synthesis response
-- [ ] 1.3 Create the `VoiceCloneProviderFactory` factory class
-  - Obtain Provider instances based on configuration
-  - Support dynamic provider switching
-
-## 2. CosyVoice refactoring (keep existing functionality)
-- [ ] 2.1 Rename `CosyVoiceClient` → `CosyVoiceProvider`
-- [ ] 2.2 `CosyVoiceProvider` implements the `VoiceCloneProvider` interface
-- [ ] 2.3 Adapt the existing DTOs to the new unified DTOs
-- [ ] 2.4 Keep the existing DashScope SDK call logic unchanged
-
-## 3. Configuration refactoring
-- [ ] 3.1 Create the `VoiceProviderProperties` configuration class
-  - Support the multi-provider configuration structure
-  - Add the `default-provider` configuration entry
-- [ ] 3.2 Create `CosyVoiceProviderConfig` (nested configuration)
-- [ ] 3.3 Stay backward compatible: support reading the legacy `yudao.cosyvoice.*` configuration
-
-## 4. Service layer changes
-- [ ] 4.1 Modify `TikUserVoiceServiceImpl`
-  - Inject `VoiceCloneProvider` instead of `CosyVoiceClient`
-  - Use the factory to obtain the Provider instance
-- [ ] 4.2 Update method calls
-  - `createVoice()` - use `provider.cloneVoice()`
-  - `synthesizeVoice()` - use `provider.synthesize()`
-  - `previewVoice()` - use `provider.synthesize()`
-
-## 5. Testing and verification
-- [ ] 5.1 Unit test: CosyVoiceProvider
-- [ ] 5.2 Unit test: VoiceCloneProviderFactory
-- [ ] 5.3 Integration test: TikUserVoiceServiceImpl
-- [ ] 5.4 Verify that existing functionality works
-
-## 6. Documentation and configuration migration
-- [ ] 6.1 Update the `application.yaml` configuration example
-- [ ] 6.2 Add a configuration migration guide
-
----
-
-**Total**: 20 tasks
-
-**Estimated effort**: 2-3 days
diff --git a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceClient.java b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceClient.java
index 133b4a9321..b8af69d005 100644
--- a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceClient.java
+++ b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceClient.java
@@ -7,7 +7,7 @@ import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceCloneRequest;
 import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceCloneResult;
 import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceTtsRequest;
 import cn.iocoder.yudao.module.tik.voice.client.dto.CosyVoiceTtsResult;
-import cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProperties;
+import cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProviderConfig;
 import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
 import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
 import com.alibaba.dashscope.audio.ttsv2.enrollment.Voice;
@@ -45,7 +45,7 @@ public class CosyVoiceClient {
     private static final MediaType JSON = MediaType.parse("application/json; charset=utf-8");
-    private final CosyVoiceProperties properties;
+    private final CosyVoiceProviderConfig config;
     private final ObjectMapper objectMapper;
     private volatile OkHttpClient httpClient;
@@ -54,7 +54,7 @@ public class CosyVoiceClient {
      * Call the CosyVoice TTS endpoint
      */
     public CosyVoiceTtsResult synthesize(CosyVoiceTtsRequest request) {
-        if (!properties.isEnabled()) {
+        if (!config.isEnabled()) {
             throw exception0(VOICE_TTS_FAILED.getCode(), "未配置 CosyVoice API Key");
         }
         if (request == null || StrUtil.isBlank(request.getText())) {
@@ -69,15 +69,15 @@ public class CosyVoiceClient {
             log.info("[CosyVoice][开始TTS][voiceId={}, textLength={}, model={}, speechRate={}, instruction={}]",
                     request.getVoiceId(),
                     request.getText().length(),
-                    StrUtil.blankToDefault(request.getModel(), properties.getDefaultModel()),
+                    StrUtil.blankToDefault(request.getModel(), config.getDefaultModel()),
                     request.getSpeechRate(),
                     request.getInstruction());
             // Build the parameters with the DashScope SDK (strictly following the docs)
             // Note: speechRate and volume must be converted to int
             SpeechSynthesisParam param = SpeechSynthesisParam.builder()
-                    .apiKey(properties.getApiKey())
-                    .model(StrUtil.blankToDefault(request.getModel(), properties.getDefaultModel()))
+                    .apiKey(config.getApiKey())
+                    .model(StrUtil.blankToDefault(request.getModel(), config.getDefaultModel()))
                     .voice(request.getVoiceId())
                     .speechRate(request.getSpeechRate() != null ? request.getSpeechRate().intValue() : 1)
                     .volume(request.getVolume() != null ? request.getVolume().intValue() : 0)
@@ -108,8 +108,8 @@ public class CosyVoiceClient {
             // Build the result
             CosyVoiceTtsResult result = new CosyVoiceTtsResult();
             result.setAudio(audioBytes);
-            result.setFormat(request.getAudioFormat() != null ? request.getAudioFormat() : properties.getAudioFormat());
-            result.setSampleRate(request.getSampleRate() != null ? request.getSampleRate() : properties.getSampleRate());
+            result.setFormat(request.getAudioFormat() != null ? request.getAudioFormat() : config.getAudioFormat());
+            result.setSampleRate(request.getSampleRate() != null ? request.getSampleRate() : config.getSampleRate());
             result.setRequestId(synthesizer.getLastRequestId());
             result.setVoiceId(request.getVoiceId());
@@ -138,8 +138,8 @@ public class CosyVoiceClient {
     private CosyVoiceTtsResult synthesizeViaHttp(CosyVoiceTtsRequest request) throws Exception {
         String payload = objectMapper.writeValueAsString(buildPayload(request));
         Request httpRequest = new Request.Builder()
-                .url(properties.getTtsUrl())
-                .addHeader("Authorization", "Bearer " + properties.getApiKey())
+                .url(config.getTtsUrl())
+                .addHeader("Authorization", "Bearer " + config.getApiKey())
                 .addHeader("Content-Type", "application/json")
                 .post(RequestBody.create(payload.getBytes(StandardCharsets.UTF_8), JSON))
                 .build();
@@ -158,7 +158,7 @@ public class CosyVoiceClient {
      * Call the CosyVoice voice cloning endpoint (voice enrollment)
      */
     public CosyVoiceCloneResult cloneVoice(CosyVoiceCloneRequest request) {
-        if (!properties.isEnabled()) {
+        if (!config.isEnabled()) {
             throw exception0(VOICE_TTS_FAILED.getCode(), "未配置 CosyVoice API Key");
         }
         if (request == null || StrUtil.isBlank(request.getUrl())) {
@@ -176,7 +176,7 @@ public class CosyVoiceClient {
                     request.getTargetModel(), request.getPrefix(), request.getUrl());
             // Create the cloned voice with the DashScope SDK
-            VoiceEnrollmentService service = new VoiceEnrollmentService(properties.getApiKey());
+            VoiceEnrollmentService service = new VoiceEnrollmentService(config.getApiKey());
             Voice voice = service.createVoice(request.getTargetModel(), request.getPrefix(), request.getUrl());
             log.info("[CosyVoice][语音复刻成功][Request ID: {}, Voice ID: {}]",
@@ -199,7 +199,7 @@ public class CosyVoiceClient {
     private Map buildPayload(CosyVoiceTtsRequest request) {
         Map payload = new HashMap<>();
-        String model = StrUtil.blankToDefault(request.getModel(), properties.getDefaultModel());
+        String model = StrUtil.blankToDefault(request.getModel(), config.getDefaultModel());
         payload.put("model", model);
         Map input = new HashMap<>();
@@ -218,7 +218,7 @@ public class CosyVoiceClient {
             }
         } else {
             // Use a system voice
-            String voiceId = StrUtil.blankToDefault(request.getVoiceId(), properties.getDefaultVoiceId());
+            String voiceId = StrUtil.blankToDefault(request.getVoiceId(), config.getDefaultVoiceId());
             if (StrUtil.isNotBlank(voiceId)) {
                 input.put("voice", voiceId);
                 log.info("[CosyVoice][使用系统音色][voice={}]", voiceId);
@@ -229,11 +229,11 @@ public class CosyVoiceClient {
         payload.put("input", input);
         Map parameters = new HashMap<>();
-        int sampleRate = request.getSampleRate() != null ? request.getSampleRate() : properties.getSampleRate();
+        int sampleRate = request.getSampleRate() != null ? request.getSampleRate() : config.getSampleRate();
         parameters.put("sample_rate", sampleRate);
         // Per the official docs, always use the lowercase format name
-        String format = StrUtil.blankToDefault(request.getAudioFormat(), properties.getAudioFormat()).toLowerCase();
+        String format = StrUtil.blankToDefault(request.getAudioFormat(), config.getAudioFormat()).toLowerCase();
         parameters.put("format", format);
         if (request.getSpeechRate() != null) {
@@ -280,8 +280,8 @@ public class CosyVoiceClient {
         byte[] audioBytes = Base64.getDecoder().decode(content);
         CosyVoiceTtsResult result = new CosyVoiceTtsResult();
         result.setAudio(audioBytes);
-        result.setFormat(firstAudio.path("format").asText(StrUtil.blankToDefault(request.getAudioFormat(), properties.getAudioFormat())));
-        result.setSampleRate(firstAudio.path("sample_rate").asInt(request.getSampleRate() != null ? request.getSampleRate() : properties.getSampleRate()));
+        result.setFormat(firstAudio.path("format").asText(StrUtil.blankToDefault(request.getAudioFormat(), config.getAudioFormat())));
+        result.setSampleRate(firstAudio.path("sample_rate").asInt(request.getSampleRate() != null ? request.getSampleRate() : config.getSampleRate()));
         result.setRequestId(root.path("request_id").asText());
         result.setVoiceId(firstAudio.path("voice").asText(request.getVoiceId()));
         return result;
@@ -291,8 +291,8 @@ public class CosyVoiceClient {
         if (httpClient == null) {
             synchronized (this) {
                 if (httpClient == null) {
-                    java.time.Duration connect = defaultDuration(properties.getConnectTimeout(), 10);
-                    java.time.Duration read = defaultDuration(properties.getReadTimeout(), 60);
+                    java.time.Duration connect = defaultDuration(config.getConnectTimeout(), 10);
+                    java.time.Duration read = defaultDuration(config.getReadTimeout(), 60);
                     httpClient = new OkHttpClient.Builder()
                             .connectTimeout(connect.toMillis(), TimeUnit.MILLISECONDS)
                             .readTimeout(read.toMillis(), TimeUnit.MILLISECONDS)
diff --git a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceProvider.java b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceProvider.java
index 329c39f27f..4797cba7ff 100644
--- a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceProvider.java
+++ b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/CosyVoiceProvider.java
@@ -1,11 +1,9 @@
 package cn.iocoder.yudao.module.tik.voice.client;
-import cn.hutool.core.util.StrUtil;
 import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceCloneRequest;
 import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceCloneResult;
 import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceTtsRequest;
 import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceTtsResult;
-import cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProperties;
 import cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProviderConfig;
 import cn.iocoder.yudao.module.tik.voice.config.VoiceProviderProperties;
 import lombok.RequiredArgsConstructor;
@@ -26,56 +24,19 @@ import org.springframework.stereotype.Component;
 public class CosyVoiceProvider implements VoiceCloneProvider {
     private final CosyVoiceClient cosyVoiceClient;
-
-    /**
-     * New configuration (multi-provider support)
-     */
     private final VoiceProviderProperties voiceProviderProperties;
-    /**
-     * Legacy configuration (backward compatibility)
-     */
-    private final CosyVoiceProperties cosyVoiceProperties;
-
     /**
      * Get the CosyVoice configuration
-     * Prefer the new configuration; fall back to the legacy one if it is absent (backward compatibility)
      */
     private CosyVoiceProviderConfig getConfig() {
-        // Try the new configuration first
         var baseConfig = voiceProviderProperties.getProviderConfig("cosyvoice");
-        if (baseConfig instanceof CosyVoiceProviderConfig cosyConfig) {
-            return cosyConfig;
+        if (baseConfig instanceof CosyVoiceProviderConfig config) {
+            return config;
         }
-
-        // Fall back to the legacy configuration (backward compatibility)
-        if (cosyVoiceProperties != null && cosyVoiceProperties.isEnabled()) {
-            return migrateFromLegacyConfig(cosyVoiceProperties);
-        }
-
-        // Return an empty configuration
         return new CosyVoiceProviderConfig();
     }
-    /**
-     * Migrate from the legacy configuration to the new configuration format
-     */
-    private CosyVoiceProviderConfig migrateFromLegacyConfig(CosyVoiceProperties legacy) {
-        var config = new CosyVoiceProviderConfig();
-        config.setEnabled(true);
-        config.setApiKey(legacy.getApiKey());
-        config.setDefaultModel(legacy.getDefaultModel());
-        config.setDefaultVoiceId(legacy.getDefaultVoiceId());
-        config.setSampleRate(legacy.getSampleRate());
-        config.setAudioFormat(legacy.getAudioFormat());
-        config.setPreviewText(legacy.getPreviewText());
-        config.setTtsUrl(legacy.getTtsUrl());
-        config.setVoiceEnrollmentUrl(legacy.getVoiceEnrollmentUrl());
-        config.setConnectTimeout(legacy.getConnectTimeout());
-        config.setReadTimeout(legacy.getReadTimeout());
-        return config;
-    }
-
     @Override
     public VoiceCloneResult cloneVoice(VoiceCloneRequest request) {
         log.info("[CosyVoiceProvider][语音克隆][audioUrl={}, model={}]",
diff --git a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/SiliconFlowApi.java b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/SiliconFlowApi.java
deleted file mode 100644
index c3f7997e9c..0000000000
--- a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/SiliconFlowApi.java
+++ /dev/null
@@ -1,123 +0,0 @@
-package cn.iocoder.yudao.module.tik.voice.client;
-
-import cn.hutool.core.util.StrUtil;
-import cn.hutool.http.HttpRequest;
-import cn.hutool.http.HttpResponse;
-import cn.hutool.json.JSONUtil;
-import cn.iocoder.yudao.module.tik.voice.client.dto.SiliconFlowTtsRequest;
-import cn.iocoder.yudao.module.tik.voice.client.dto.SiliconFlowVoiceUploadRequest;
-import cn.iocoder.yudao.module.tik.voice.client.dto.SiliconFlowVoiceUploadResponse;
-import cn.iocoder.yudao.module.tik.voice.config.SiliconFlowProviderConfig;
-import lombok.RequiredArgsConstructor;
-import lombok.extern.slf4j.Slf4j;
-import org.springframework.http.MediaType;
-import org.springframework.stereotype.Component;
-
-/**
- * SiliconFlow API client
- *
- * <p>Provides HTTP access to the SiliconFlow voice services.
- *
- * @author 芋道源码
- */
-@Slf4j
-@Component
-@RequiredArgsConstructor
-public class SiliconFlowApi {
-
-    private final SiliconFlowProviderConfig config;
-
-    /**
-     * Upload reference audio (voice cloning)
-     *
-     * @param request upload request
-     * @return upload response containing the voice URI
-     */
-    public SiliconFlowVoiceUploadResponse uploadVoice(SiliconFlowVoiceUploadRequest request) {
-        String url = config.getBaseUrl() + config.getVoiceUploadUrl();
-
-        log.info("[SiliconFlowApi][上传参考音频][url={}, model={}, customName={}]",
-                url, request.getModel(), request.getCustomName());
-
-        try {
-            String requestBody = JSONUtil.toJsonStr(request);
-            log.debug("[SiliconFlowApi][请求体]{}", requestBody);
-
-            HttpResponse response = HttpRequest.post(url)
-                    .header("Authorization", "Bearer " + config.getApiKey())
-                    .header("Content-Type", MediaType.APPLICATION_JSON_VALUE)
-                    .body(requestBody)
-                    .timeout((int) config.getConnectTimeout().toMillis())
-                    .execute();
-
-            String responseBody = response.body();
-            log.debug("[SiliconFlowApi][响应体]{}", responseBody);
-
-            if (!response.isOk()) {
-                log.error("[SiliconFlowApi][上传失败][code={}, body={}]",
-                        response.getStatus(), responseBody);
-                throw new RuntimeException("硅基流动上传参考音频失败: " + responseBody);
-            }
-
-            SiliconFlowVoiceUploadResponse result = JSONUtil.toBean(responseBody,
-                    SiliconFlowVoiceUploadResponse.class);
-
-            if (StrUtil.isBlank(result.getUri())) {
-                throw new RuntimeException("硅基流动上传参考音频失败: 响应中缺少 uri");
-            }
-
-            log.info("[SiliconFlowApi][上传成功][uri={}]", result.getUri());
-            return result;
-
-        } catch (Exception e) {
-            log.error("[SiliconFlowApi][上传异常]", e);
-            throw new RuntimeException("硅基流动上传参考音频异常: " + e.getMessage(), e);
-        }
-    }
-
-    /**
-     * Text-to-speech
-     *
-     * @param request TTS request
-     * @return audio data (base64 encoded)
-     */
-    public String synthesize(SiliconFlowTtsRequest request) {
-        String url = config.getBaseUrl() + config.getTtsUrl();
-
-        log.info("[SiliconFlowApi][文本转语音][url={}, model={}, inputLength={}]",
-                url, request.getModel(),
-                request.getInput() != null ? 
request.getInput().length() : 0); - - try { - String requestBody = JSONUtil.toJsonStr(request); - log.debug("[SiliconFlowApi][请求体]{}", requestBody); - - HttpResponse response = HttpRequest.post(url) - .header("Authorization", "Bearer " + config.getApiKey()) - .header("Content-Type", MediaType.APPLICATION_JSON_VALUE) - .body(requestBody) - .timeout((int) config.getReadTimeout().toMillis()) - .execute(); - - if (!response.isOk()) { - String errorBody = response.body(); - log.error("[SiliconFlowApi][合成失败][code={}, body={}]", - response.getStatus(), errorBody); - throw new RuntimeException("硅基流动文本转语音失败: " + errorBody); - } - - // 硅基流动直接返回二进制音频数据 - byte[] audioBytes = response.bodyBytes(); - String base64Audio = java.util.Base64.getEncoder().encodeToString(audioBytes); - - log.info("[SiliconFlowApi][合成成功][format={}, size={}]", - request.getResponseFormat(), audioBytes.length); - return base64Audio; - - } catch (Exception e) { - log.error("[SiliconFlowApi][合成异常]", e); - throw new RuntimeException("硅基流动文本转语音异常: " + e.getMessage(), e); - } - } - -} diff --git a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/SiliconFlowProvider.java b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/SiliconFlowProvider.java index 2bf804b8bf..1d4535298e 100644 --- a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/SiliconFlowProvider.java +++ b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/client/SiliconFlowProvider.java @@ -1,6 +1,9 @@ package cn.iocoder.yudao.module.tik.voice.client; import cn.hutool.core.util.StrUtil; +import cn.hutool.http.HttpRequest; +import cn.hutool.http.HttpResponse; +import cn.hutool.json.JSONUtil; import cn.iocoder.yudao.module.tik.voice.client.dto.SiliconFlowTtsRequest; import cn.iocoder.yudao.module.tik.voice.client.dto.SiliconFlowVoiceUploadRequest; import cn.iocoder.yudao.module.tik.voice.client.dto.SiliconFlowVoiceUploadResponse; @@ -9,9 +12,9 @@ import 
cn.iocoder.yudao.module.tik.voice.client.dto.VoiceCloneResult; import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceTtsRequest; import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceTtsResult; import cn.iocoder.yudao.module.tik.voice.config.SiliconFlowProviderConfig; -import cn.iocoder.yudao.module.tik.voice.config.VoiceProviderProperties; import lombok.RequiredArgsConstructor; import lombok.extern.slf4j.Slf4j; +import org.springframework.http.MediaType; import org.springframework.stereotype.Component; import java.io.ByteArrayOutputStream; @@ -23,7 +26,6 @@ import java.util.Base64; * 硅基流动 Provider 实现 * *

硅基流动语音服务的 Provider 实现。 - * 内部委托给 {@link SiliconFlowApi} 进行实际的API调用。 * * @author 芋道源码 */ @@ -35,29 +37,17 @@ public class SiliconFlowProvider implements VoiceCloneProvider { private static final String PROVIDER_TYPE = "siliconflow"; private static final String AUDIO_MIME_TYPE = "data:audio/mpeg;base64,"; - private final SiliconFlowApi siliconFlowApi; - private final VoiceProviderProperties voiceProviderProperties; - - /** - * 获取硅基流动配置 - */ - private SiliconFlowProviderConfig getConfig() { - var baseConfig = voiceProviderProperties.getProviderConfig("siliconflow"); - if (baseConfig instanceof SiliconFlowProviderConfig config) { - return config; - } - - // 返回默认配置 - return new SiliconFlowProviderConfig(); - } + private final SiliconFlowProviderConfig config; @Override public VoiceCloneResult cloneVoice(VoiceCloneRequest request) { + if (!config.isAvailable()) { + throw new RuntimeException("硅基流动供应商未配置或已禁用"); + } + log.info("[SiliconFlowProvider][语音克隆][audioUrl={}, model={}]", request.getAudioUrl(), request.getModel()); - SiliconFlowProviderConfig config = getConfig(); - try { byte[] audioData = downloadAudio(request.getAudioUrl()); String base64Audio = Base64.getEncoder().encodeToString(audioData); @@ -68,7 +58,33 @@ public class SiliconFlowProvider implements VoiceCloneProvider { sfRequest.setText(getOrDefault(request.getTranscriptionText(), config.getPreviewText())); sfRequest.setAudio(AUDIO_MIME_TYPE + base64Audio); - SiliconFlowVoiceUploadResponse sfResponse = siliconFlowApi.uploadVoice(sfRequest); + // 调用上传参考音频 API + String url = config.getBaseUrl() + config.getVoiceUploadUrl(); + String requestBody = JSONUtil.toJsonStr(sfRequest); + log.debug("[SiliconFlowProvider][请求体]{}", requestBody); + + HttpResponse response = HttpRequest.post(url) + .header("Authorization", "Bearer " + config.getApiKey()) + .header("Content-Type", MediaType.APPLICATION_JSON_VALUE) + .body(requestBody) + .timeout((int) config.getConnectTimeout().toMillis()) + .execute(); + + String 
responseBody = response.body(); + log.debug("[SiliconFlowProvider][响应体]{}", responseBody); + + if (!response.isOk()) { + log.error("[SiliconFlowProvider][上传失败][code={}, body={}]", + response.getStatus(), responseBody); + throw new RuntimeException("硅基流动上传参考音频失败: " + responseBody); + } + + SiliconFlowVoiceUploadResponse sfResponse = JSONUtil.toBean(responseBody, + SiliconFlowVoiceUploadResponse.class); + + if (StrUtil.isBlank(sfResponse.getUri())) { + throw new RuntimeException("硅基流动上传参考音频失败: 响应中缺少 uri"); + } VoiceCloneResult result = new VoiceCloneResult(); result.setVoiceId(sfResponse.getUri()); @@ -89,13 +105,15 @@ public class SiliconFlowProvider implements VoiceCloneProvider { @Override public VoiceTtsResult synthesize(VoiceTtsRequest request) { + if (!config.isAvailable()) { + throw new RuntimeException("硅基流动供应商未配置或已禁用"); + } + log.info("[SiliconFlowProvider][语音合成][voiceId={}, textLength={}, model={}]", request.getVoiceId(), request.getText() != null ? request.getText().length() : 0, request.getModel()); - SiliconFlowProviderConfig config = getConfig(); - try { SiliconFlowTtsRequest sfRequest = SiliconFlowTtsRequest.builder() .model(getOrDefault(request.getModel(), config.getDefaultModel())) @@ -106,16 +124,37 @@ public class SiliconFlowProvider implements VoiceCloneProvider { .responseFormat(getOrDefault(request.getAudioFormat(), config.getAudioFormat())) .build(); - String base64Audio = siliconFlowApi.synthesize(sfRequest); + // 调用文本转语音 API + String url = config.getBaseUrl() + config.getTtsUrl(); + String requestBody = JSONUtil.toJsonStr(sfRequest); + log.debug("[SiliconFlowProvider][请求体]{}", requestBody); + + HttpResponse response = HttpRequest.post(url) + .header("Authorization", "Bearer " + config.getApiKey()) + .header("Content-Type", MediaType.APPLICATION_JSON_VALUE) + .body(requestBody) + .timeout((int) config.getReadTimeout().toMillis()) + .execute(); + + if (!response.isOk()) { + String errorBody = response.body(); + 
log.error("[SiliconFlowProvider][合成失败][code={}, body={}]", + response.getStatus(), errorBody); + throw new RuntimeException("硅基流动文本转语音失败: " + errorBody); + } + + // 硅基流动直接返回二进制音频数据 + byte[] audioBytes = response.bodyBytes(); + String base64Audio = Base64.getEncoder().encodeToString(audioBytes); VoiceTtsResult result = new VoiceTtsResult(); - result.setAudio(base64Audio); + result.setAudio(Base64.getDecoder().decode(base64Audio)); result.setFormat(sfRequest.getResponseFormat()); result.setSampleRate(sfRequest.getSampleRate()); result.setVoiceId(request.getVoiceId()); log.info("[SiliconFlowProvider][语音合成成功][format={}, audioSize={}]", - result.getFormat(), base64Audio != null ? base64Audio.length() : 0); + result.getFormat(), result.getAudio() != null ? result.getAudio().length : 0); return result; } catch (Exception e) { diff --git a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/config/CosyVoiceProperties.java b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/config/CosyVoiceProperties.java deleted file mode 100644 index f795a15ad2..0000000000 --- a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/config/CosyVoiceProperties.java +++ /dev/null @@ -1,79 +0,0 @@ -package cn.iocoder.yudao.module.tik.voice.config; - -import cn.hutool.core.util.StrUtil; -import lombok.Data; -import org.springframework.boot.context.properties.ConfigurationProperties; -import org.springframework.stereotype.Component; - -import java.time.Duration; - -/** - * CosyVoice 配置 - */ -@Data -@Component -@ConfigurationProperties(prefix = "yudao.cosyvoice") -public class CosyVoiceProperties { - - /** - * DashScope API Key - */ - private String apiKey; - - /** - * 默认模型 - */ - private String defaultModel = "cosyvoice-v3-flash"; - - /** - * 默认 voiceId(可选) - */ - private String defaultVoiceId; - - /** - * 默认采样率 - */ - private Integer sampleRate = 24000; - - /** - * 默认音频格式 - */ - private String audioFormat = "mp3"; - - /** - * 试听默认示例文本 - */ - private 
String previewText = "您好,欢迎体验专属音色。"; - - /** - * TTS 接口地址 - */ - private String ttsUrl = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/speech-synthesis"; - - /** - * 语音复刻接口地址(声音注册) - */ - private String voiceEnrollmentUrl = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/voice-enrollment"; - - /** - * 连接超时时间 - */ - private Duration connectTimeout = Duration.ofSeconds(10); - - /** - * 读取超时时间(改为3分钟,提升语音合成成功率) - */ - private Duration readTimeout = Duration.ofSeconds(180); - - /** - * 是否启用 - */ - private boolean enabled = true; - - public boolean isEnabled() { - return enabled && StrUtil.isNotBlank(apiKey); - } - -} - - diff --git a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/config/SiliconFlowProviderConfig.java b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/config/SiliconFlowProviderConfig.java index ac81e29c83..cf072f9bf8 100644 --- a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/config/SiliconFlowProviderConfig.java +++ b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/config/SiliconFlowProviderConfig.java @@ -2,6 +2,8 @@ package cn.iocoder.yudao.module.tik.voice.config; import lombok.Data; import lombok.EqualsAndHashCode; +import org.springframework.boot.context.properties.ConfigurationProperties; +import org.springframework.stereotype.Component; import java.time.Duration; @@ -14,6 +16,8 @@ import java.time.Duration; */ @Data @EqualsAndHashCode(callSuper = true) +@Component +@ConfigurationProperties(prefix = "yudao.voice.siliconflow") public class SiliconFlowProviderConfig extends VoiceProviderProperties.ProviderConfig { /** @@ -61,4 +65,11 @@ public class SiliconFlowProviderConfig extends VoiceProviderProperties.ProviderC */ private Duration readTimeout = Duration.ofSeconds(180); + /** + * 检查是否可用(有 API Key 即可用) + */ + public boolean isAvailable() { + return isEnabled() && getApiKey() != null && !getApiKey().isEmpty(); + } + } diff --git 
a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/service/TikUserVoiceServiceImpl.java b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/service/TikUserVoiceServiceImpl.java index 43c7a35a7e..03bd1f5a0e 100644 --- a/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/service/TikUserVoiceServiceImpl.java +++ b/yudao-module-tik/src/main/java/cn/iocoder/yudao/module/tik/voice/service/TikUserVoiceServiceImpl.java @@ -25,7 +25,6 @@ import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceCloneRequest; import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceCloneResult; import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceTtsRequest; import cn.iocoder.yudao.module.tik.voice.client.dto.VoiceTtsResult; -import cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProperties; import cn.iocoder.yudao.module.tik.voice.config.VoiceProviderProperties; import cn.iocoder.yudao.module.tik.voice.dal.dataobject.TikUserVoiceDO; import cn.iocoder.yudao.module.tik.voice.dal.mysql.TikUserVoiceMapper; @@ -88,9 +87,6 @@ public class TikUserVoiceServiceImpl implements TikUserVoiceService { @Resource private VoiceCloneProviderFactory voiceProviderFactory; - @Resource - private CosyVoiceProperties cosyVoiceProperties; - @Resource private VoiceProviderProperties voiceProviderProperties; @@ -649,8 +645,7 @@ public class TikUserVoiceServiceImpl implements TikUserVoiceService { } /** - * 获取 CosyVoice 配置(统一入口) - * 优先使用新配置,回退到旧配置 + * 获取 CosyVoice 配置 */ private cn.iocoder.yudao.module.tik.voice.config.CosyVoiceProviderConfig getCosyVoiceConfig() { if (voiceProviderProperties != null) { @@ -664,31 +659,23 @@ public class TikUserVoiceServiceImpl implements TikUserVoiceService { /** * 获取默认音频格式 - * 优先使用新配置,回退到旧配置 */ private String getDefaultFormat() { var config = getCosyVoiceConfig(); if (config != null) { return config.getAudioFormat(); } - if (cosyVoiceProperties != null) { - return cosyVoiceProperties.getAudioFormat(); - } return "mp3"; } /** * 
获取默认采样率 - * 优先使用新配置,回退到旧配置 */ private Integer getDefaultSampleRate() { var config = getCosyVoiceConfig(); if (config != null) { return config.getSampleRate(); } - if (cosyVoiceProperties != null) { - return cosyVoiceProperties.getSampleRate(); - } return 24000; } @@ -1173,31 +1160,23 @@ public class TikUserVoiceServiceImpl implements TikUserVoiceService { /** * 获取默认音色ID - * 优先使用新配置,回退到旧配置 */ private String getDefaultVoiceId() { var config = getCosyVoiceConfig(); if (config != null) { return config.getDefaultVoiceId(); } - if (cosyVoiceProperties != null) { - return cosyVoiceProperties.getDefaultVoiceId(); - } return null; } /** * 获取试听文本 - * 优先使用新配置,回退到旧配置 */ private String getPreviewText() { var config = getCosyVoiceConfig(); if (config != null) { return config.getPreviewText(); } - if (cosyVoiceProperties != null) { - return cosyVoiceProperties.getPreviewText(); - } return "您好,欢迎体验专属音色。"; } diff --git a/yudao-server/src/main/resources/application.yaml b/yudao-server/src/main/resources/application.yaml index 12f5a41874..a134437bd5 100644 --- a/yudao-server/src/main/resources/application.yaml +++ b/yudao-server/src/main/resources/application.yaml @@ -213,31 +213,25 @@ spring: sse-endpoint: /sse yudao: - cosyvoice: - enabled: true - api-key: sk-10c746f8cb8640738f8d6b71af699003 - default-model: cosyvoice-v3-flash - sample-rate: 24000 - audio-format: mp3 - preview-text: 您好,欢迎体验专属音色 voice: default-provider: cosyvoice - providers: - cosyvoice: - enabled: true - api-key: sk-10c746f8cb8640738f8d6b71af699003 - default-model: cosyvoice-v3-flash - sample-rate: 24000 - audio-format: mp3 - preview-text: 您好,欢迎体验专属音色 - siliconflow: - enabled: false - api-key: sk-kcvifijrafkzxsmnxbgxspnxdvjiaawcbyoiqhmfobykynpx - base-url: https://api.siliconflow.cn - default-model: IndexTeam/IndexTTS-2 - sample-rate: 24000 - audio-format: mp3 - preview-text: 您好,欢迎体验专属音色 + cosyvoice: + enabled: true + api-key: sk-10c746f8cb8640738f8d6b71af699003 + default-model: cosyvoice-v3-flash + sample-rate: 
24000 + audio-format: mp3 + preview-text: 您好,欢迎体验专属音色 + tts-url: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/speech-synthesis + voice-enrollment-url: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/voice-enrollment + siliconflow: + enabled: false + api-key: ${SILICONFLOW_API_KEY:} + base-url: https://api.siliconflow.cn + default-model: IndexTeam/IndexTTS-2 + sample-rate: 24000 + audio-format: mp3 + preview-text: 您好,欢迎体验专属音色 ai: gemini: # 谷歌 Gemini enable: true