feat: feature optimization

2026-01-27 01:39:08 +08:00
parent bf12e70339
commit 24f66c8e81
24 changed files with 1570 additions and 133 deletions
--- a/openspec/changes/refactor-voice-provider/design.md
+++ b/openspec/changes/refactor-voice-provider/design.md
@@ -0,0 +1,133 @@
+# Technical Design: Voice Clone Provider Refactoring
+
+## Context
+
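+The voice cloning feature currently depends directly on the Alibaba Cloud CosyVoice SDK and API. The Service layer calls `CosyVoiceClient` directly, which causes:
+
+1. **Tight coupling**: switching to or adding another vendor is not easy
+2. **Hard to test**: the external dependency is difficult to mock
+3. **Poor extensibility**: adding a new vendor requires modifying the Service layer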
+
+## Goals / Non-Goals
+
+### Goals
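+- Decouple the Service layer from vendor-specific implementations
+- Support multiple vendors side by side, with dynamic switching between them
+- Keep existing functionality fully compatible
+- Lay the groundwork for adding SiliconFlow IndexTTS-2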
+
+### Non-Goals
+- No changes to existing API behavior
+- No changes to the database schema
+- No changes to frontend interaction
+
+## Decisions
+
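+### 1. Use the Strategy pattern plus the Factory pattern
+
+**Why**:
+- Strategy pattern: define one unified interface that each vendor implements independently
+- Factory pattern: obtain the Provider instance dynamically based on configuration
+- Follows the open/closed principle: extending does not require modifying existing code
+
+**Architecture**:
+```
+VoiceCloneProvider (interface)
+├── CosyVoiceProvider (impl) - Alibaba Cloud CosyVoice (DashScope)
+├── SiliconFlowProvider (impl) - Phase 2: SiliconFlow IndexTTS-2
+└── VoiceCloneProviderFactory
+```
+
+**Notes**:
+- `CosyVoiceProvider` corresponds to the voice service on Alibaba Cloud DashScope
+- Default model: `cosyvoice-v3-flash`
+- To extend, add a new Provider implementation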
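The Strategy plus Factory arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the interface is reduced to two methods with simplified `String` signatures, both provider classes are stand-ins that make no network calls, and the fallback-with-warning behavior follows the "Provider not found" scenario in the spec.

```java
import java.util.List;
import java.util.logging.Logger;

// Strategy interface. supports() and cloneVoice() are named in the task list;
// the String-based cloneVoice signature is a simplification for illustration.
interface VoiceCloneProvider {
    boolean supports(String providerType);
    String cloneVoice(String audioUrl); // returns a voice ID
}

// Stand-in implementation: no real DashScope call is made here.
final class CosyVoiceProvider implements VoiceCloneProvider {
    @Override public boolean supports(String providerType) { return "cosyvoice".equals(providerType); }
    @Override public String cloneVoice(String audioUrl) { return "cosyvoice-demo-voice"; }
}

// Stand-in implementation: no real SiliconFlow call is made here.
final class SiliconFlowProvider implements VoiceCloneProvider {
    @Override public boolean supports(String providerType) { return "siliconflow".equals(providerType); }
    @Override public String cloneVoice(String audioUrl) { return "siliconflow-demo-voice"; }
}

// Factory that resolves a provider by type and falls back to the default.
final class VoiceCloneProviderFactory {
    private static final Logger LOG = Logger.getLogger(VoiceCloneProviderFactory.class.getName());
    private final List<VoiceCloneProvider> providers;
    private final String defaultType;

    VoiceCloneProviderFactory(List<VoiceCloneProvider> providers, String defaultType) {
        this.providers = providers;
        this.defaultType = defaultType;
    }

    VoiceCloneProvider getProvider(String providerType) {
        String wanted = (providerType == null) ? defaultType : providerType;
        VoiceCloneProvider found = find(wanted);
        if (found != null) {
            return found;
        }
        // Unknown type: warn and fall back to the default provider.
        LOG.warning("Unknown voice provider '" + wanted + "', falling back to '" + defaultType + "'");
        VoiceCloneProvider fallback = find(defaultType);
        if (fallback == null) {
            throw new IllegalStateException("Default provider not registered: " + defaultType);
        }
        return fallback;
    }

    private VoiceCloneProvider find(String type) {
        for (VoiceCloneProvider p : providers) {
            if (p.supports(type)) {
                return p;
            }
        }
        return null;
    }
}
```

With this shape, adding a vendor means adding one class and registering it in the list; the factory and its callers stay unchanged.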
+
+### 2. Unified DTO design
+
+**Why**: hide the API differences between vendors
+
+```java
+// Unified request
+class VoiceCloneRequest {
+    String audioUrl;        // audio URL
+    String prefix;          // voice prefix
+    String targetModel;     // target model
+}
+
+// Unified response
+class VoiceCloneResult {
+    String voiceId;         // generated voice ID
+    String requestId;       // request ID
+}
+```
+
+### 3. Configuration structure
+
+**New configuration structure**:
+```yaml
+yudao:
+  voice:
+    # Default provider
+    default-provider: cosyvoice
+
+    # Provider configuration
+    providers:
+      cosyvoice:  # Alibaba Cloud CosyVoice
+        enabled: true
+        api-key: ${DASHSCOPE_API_KEY}
+        default-model: cosyvoice-v3-flash
+        # ... other settings
+
+      siliconflow:  # added in Phase 2
+        enabled: false
+        api-key: ${SILICONFLOW_API_KEY}
+        base-url: https://api.siliconflow.cn
+        default-model: indextts-2
+```
+
+**Backward compatibility**:
+- Read the legacy `yudao.cosyvoice.*` configuration and merge it into the new structure
+- Prefer the new configuration; the legacy configuration serves as the fallback
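The "new key first, legacy key as fallback" rule can be illustrated with a small lookup helper. This is only a sketch under stated assumptions: `VoiceConfigResolver` is a hypothetical name, and a flat `Map<String, String>` stands in for whatever property source the application actually uses.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: resolves a CosyVoice setting, preferring the new key.
final class VoiceConfigResolver {
    static final String NEW_PREFIX = "yudao.voice.providers.cosyvoice.";
    static final String OLD_PREFIX = "yudao.cosyvoice.";

    private VoiceConfigResolver() {
    }

    // Returns the value under the new prefix, else the legacy prefix, else empty.
    static Optional<String> resolve(Map<String, String> props, String key) {
        String value = props.get(NEW_PREFIX + key);
        if (value == null) {
            value = props.get(OLD_PREFIX + key);
        }
        return Optional.ofNullable(value);
    }
}
```

For example, a deployment that only sets `yudao.cosyvoice.api-key` would still resolve `api-key`, while one that sets both keys would get the value under `yudao.voice.providers.cosyvoice.api-key`.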
+
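+### 4. Error handling strategy
+
+- When a Provider call fails, log the details
+- Return the unified business exception `VOICE_TTS_FAILED`
+- Do not expose the underlying vendor's technical details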
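One way to read the three rules above is as a single wrapper around every provider call. The sketch below assumes hypothetical names (`VoiceServiceException`, `ProviderCalls.guarded`) and uses `java.util.logging` for brevity; the project's real exception type and logging framework are not shown in this change. Only the error code `VOICE_TTS_FAILED` comes from the design.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical unified business exception carrying an error code.
final class VoiceServiceException extends RuntimeException {
    final String code;

    VoiceServiceException(String code, String message) {
        super(message);
        this.code = code;
    }
}

// Hypothetical wrapper applied around each provider call.
final class ProviderCalls {
    private static final Logger LOG = Logger.getLogger(ProviderCalls.class.getName());

    private ProviderCalls() {
    }

    static <T> T guarded(String providerType, Supplier<T> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            // Full vendor details go to the log only.
            LOG.log(Level.SEVERE, "Voice provider call failed: provider=" + providerType, e);
            // The caller sees a generic message without the vendor's error text or cause.
            throw new VoiceServiceException("VOICE_TTS_FAILED", "Voice synthesis failed");
        }
    }
}
```

The vendor exception is deliberately not attached as the cause in this sketch, so a serialized error cannot leak vendor details; the log entry remains the place to debug from.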
+
+## Risks / Trade-offs
+
+| Risk | Mitigation |
+|------|------------|
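+| Breaking existing functionality | Thorough testing; keep the DTOs compatible |
+| Complex configuration migration | Automatically map the legacy configuration |
+| Performance overhead | Factory caches Provider instances |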
+
+## Migration Plan
+
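+### Phase 1: CosyVoice refactoring
+1. Create the interface and the factory
+2. Refactor CosyVoice into a Provider implementation
+3. Update the Service layer to use the interface
+4. Test and verify
+
+### Phase 2: Add SiliconFlow
+1. Implement SiliconFlowProvider
+2. Add configuration support
+3. Integration tests
+
+### Rollback plan
+- Keep support for the original configuration
+- Gate the new logic behind a feature flag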
+
+## Open Questions
+
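+1. **Q**: Should switching vendors dynamically at runtime be supported?
+   **A**: Not initially; switching through configuration is enough
+
+2. **Q**: Is a Provider health check needed?
+   **A**: To be considered in Phase 2
+
+3. **Q**: How should differences in DTO fields be handled?
+   **A**: Use the common fields; put extension fields in `Map<String, Object> extensions`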
--- a/openspec/changes/refactor-voice-provider/proposal.md
+++ b/openspec/changes/refactor-voice-provider/proposal.md
@@ -0,0 +1,35 @@
+# Change: Refactor Voice Clone Provider
+
+## Why
+
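+The voice cloning feature currently depends directly on the Alibaba Cloud CosyVoice implementation; the code is tightly coupled and hard to extend. Adding a new vendor (such as SiliconFlow IndexTTS-2) requires modifying Service-layer code, which violates the open/closed principle.
+
+**Note**: CosyVoice is Alibaba Cloud's speech synthesis service (on the DashScope platform) and supports voice cloning and TTS. The current code uses the `cosyvoice-v3-flash` model.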
+
+## What Changes
+
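+- **ADDED** Introduce the Strategy pattern and define the unified `VoiceCloneProvider` interface
+- **ADDED** Create the factory class `VoiceCloneProviderFactory` to manage multiple vendors
+- **MODIFIED** Convert the existing `CosyVoiceClient` into `CosyVoiceProvider`
+- **MODIFIED** Update `TikUserVoiceServiceImpl` to use the Provider interface
+- **ADDED** Add configuration classes that support multi-vendor configuration and switching
+- **BREAKING** Configuration keys move from `yudao.cosyvoice` to `yudao.voice.providers`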
+
+## Impact
+
+- **Affected specs**:
+  - `voice-clone` (new capability spec)
+- **Affected code**:
+  - `TikUserVoiceServiceImpl.java` - Service layer switches to an injected Provider
+  - `CosyVoiceClient.java` → `CosyVoiceProvider.java` - renamed and implements the interface
+  - `CosyVoiceProperties.java` → `VoiceProviderProperties.java` - configuration structure reorganized
+  - New `VoiceCloneProvider.java` - unified interface definition
+  - New `VoiceCloneProviderFactory.java` - factory class
+  - New `SiliconFlowProvider.java` - SiliconFlow implementation (Phase 2)
+
+## Migration
+
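+- Existing configuration is migrated automatically: `yudao.cosyvoice.*` → `yudao.voice.providers.cosyvoice.*`
+- The default provider remains `cosyvoice`
+- Default behavior is unchanged and backward compatible
+- The provider can be switched through configuration: `yudao.voice.default-provider`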
--- a/openspec/changes/refactor-voice-provider/specs/voice-clone/spec.md
+++ b/openspec/changes/refactor-voice-provider/specs/voice-clone/spec.md
@@ -0,0 +1,132 @@
+# Voice Clone Capability Specification
+
+## ADDED Requirements
+
+### Requirement: Provider Abstraction Layer
+The system SHALL provide a unified provider abstraction layer for voice cloning services, supporting multiple vendors through a common interface.
+
+#### Scenario: Get provider by type
+- **GIVEN** the system is configured with multiple voice clone providers
+- **WHEN** requesting a provider by type
+- **THEN** the system SHALL return the corresponding provider instance
+- **AND** the provider SHALL implement the `VoiceCloneProvider` interface
+
+#### Scenario: Provider not found
+- **GIVEN** the system is configured with a default provider
+- **WHEN** requesting a non-existent provider type
+- **THEN** the system SHALL fall back to the default provider
+- **AND** log a warning message
+
+### Requirement: Voice Cloning
+The system SHALL support voice cloning through the provider interface, accepting an audio file URL and returning a unique voice ID.
+
+#### Scenario: Successful voice cloning with CosyVoice
+- **GIVEN** a valid CosyVoice provider is configured
+- **WHEN** submitting a voice clone request with audio URL
+- **THEN** the system SHALL return a voice ID
+- **AND** the voice ID SHALL be usable for subsequent TTS synthesis
+
+#### Scenario: Voice cloning failure
+- **GIVEN** the provider API is unavailable or returns an error
+- **WHEN** submitting a voice clone request
+- **THEN** the system SHALL throw a `VOICE_TTS_FAILED` exception
+- **AND** log the error details for debugging
+
+### Requirement: Text-to-Speech Synthesis
+The system SHALL support TTS synthesis through cloned voices or system voices, accepting text input and returning audio data.
+
+#### Scenario: TTS with cloned voice
+- **GIVEN** a valid voice ID from a previous clone operation
+- **WHEN** submitting a TTS request with text and voice ID
+- **THEN** the system SHALL return audio data in the specified format
+- **AND** the audio SHALL match the cloned voice characteristics
+
+#### Scenario: TTS with system voice
+- **GIVEN** a system voice ID is configured
+- **WHEN** submitting a TTS request with text and system voice ID
+- **THEN** the system SHALL return audio data using the system voice
+- **AND** the audio SHALL match the system voice characteristics
+
+#### Scenario: TTS with reference audio (file URL)
+- **GIVEN** a reference audio URL and transcription text
+- **WHEN** submitting a TTS request with file URL
+- **THEN** the system SHALL perform on-the-fly voice cloning
+- **AND** return audio data matching the reference voice
+
+### Requirement: Configuration Management
+The system SHALL support multi-provider configuration through a unified configuration structure.
+
+#### Scenario: Configure multiple providers
+- **GIVEN** the application configuration file
+- **WHEN** configuring multiple voice providers
+- **THEN** each provider SHALL have an independent `enabled` flag
+- **AND** the system SHALL only use enabled providers
+
+#### Scenario: Default provider selection
+- **GIVEN** the configuration specifies a `default-provider`
+- **WHEN** no provider is explicitly specified
+- **THEN** the system SHALL use the default provider
+- **AND** fall back to `cosyvoice` if no default is configured
+
+#### Scenario: Backward compatibility
+- **GIVEN** existing configuration using `yudao.cosyvoice.*`
+- **WHEN** the system starts
+- **THEN** the system SHALL automatically migrate to the new configuration structure
+- **AND** existing functionality SHALL remain unchanged
+
+### Requirement: Provider Factory
+The system SHALL provide a factory component for managing provider instances and resolving providers by type.
+
+#### Scenario: Factory resolves provider
+- **GIVEN** the factory is initialized with provider configurations
+- **WHEN** calling `factory.getProvider("cosyvoice")`
+- **THEN** the factory SHALL return the CosyVoiceProvider instance
+- **AND** cache the instance for subsequent requests
+
+#### Scenario: Factory returns default
+- **GIVEN** the factory is configured with a default provider
+- **WHEN** calling `factory.getProvider(null)`
+- **THEN** the factory SHALL return the default provider instance
+
+## MODIFIED Requirements
+
+### Requirement: Voice Creation Flow
+The voice creation process SHALL use the provider abstraction layer instead of calling the CosyVoice client directly.
+
+#### Scenario: Create voice with CosyVoice
+- **GIVEN** a user uploads a voice audio file
+- **WHEN** creating a voice configuration through the API
+- **THEN** the system SHALL:
+  1. Validate that the file exists and belongs to the voice category
+  2. Call `provider.cloneVoice()` with the audio URL
+  3. Store the returned `voiceId` in the database
+  4. Return a success response with the voice configuration ID
+
+#### Scenario: Create voice with transcription
+- **GIVEN** a voice configuration is created without transcription
+- **WHEN** the user triggers transcription
+- **THEN** the system SHALL:
+  1. Fetch the audio file URL
+  2. Call the transcription service
+  3. Store the transcription text
+  4. Update the voice configuration
+
+### Requirement: Voice Preview
+The voice preview functionality SHALL work with both cloned voices (voiceId) and reference audio (file URL).
+
+#### Scenario: Preview cloned voice
+- **GIVEN** a voice configuration with a valid `voiceId`
+- **WHEN** requesting a preview with custom text
+- **THEN** the system SHALL call `provider.synthesize()` with the voiceId
+- **AND** return audio data in Base64 format
+
+#### Scenario: Preview with reference audio
+- **GIVEN** a voice configuration without `voiceId` but with audio file
+- **WHEN** requesting a preview
+- **THEN** the system SHALL call `provider.synthesize()` with the file URL
+- **AND** use the stored transcription as reference text
+- **AND** return audio data in Base64 format
+
+## REMOVED Requirements
+
+None. This change is additive and refactoring only.
--- a/openspec/changes/refactor-voice-provider/tasks.md
+++ b/openspec/changes/refactor-voice-provider/tasks.md
@@ -0,0 +1,53 @@
+# Implementation Tasks
+
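+## 1. Interface and base structure
+- [ ] 1.1 Create the `VoiceCloneProvider` interface
+  - Define the `cloneVoice(VoiceCloneRequest)` method
+  - Define the `synthesize(VoiceTtsRequest)` method
+  - Define the `supports(String providerType)` method
+- [ ] 1.2 Create the unified DTO classes
+  - `VoiceCloneRequest` - voice clone request
+  - `VoiceCloneResult` - voice clone response
+  - `VoiceTtsRequest` - speech synthesis request
+  - `VoiceTtsResult` - speech synthesis response
+- [ ] 1.3 Create the `VoiceCloneProviderFactory` factory class
+  - Obtain the Provider instance based on configuration
+  - Support switching vendors dynamically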
+
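+## 2. CosyVoice refactoring (keep existing functionality)
+- [ ] 2.1 Rename `CosyVoiceClient` → `CosyVoiceProvider`
+- [ ] 2.2 Make `CosyVoiceProvider` implement the `VoiceCloneProvider` interface
+- [ ] 2.3 Adapt the existing DTOs to the new unified DTOs
+- [ ] 2.4 Keep the existing DashScope SDK call logic unchanged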
+
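+## 3. Configuration refactoring
+- [ ] 3.1 Create the `VoiceProviderProperties` configuration class
+  - Support the multi-vendor configuration structure
+  - Add the `default-provider` configuration key
+- [ ] 3.2 Create `CosyVoiceProviderConfig` (nested configuration)
+- [ ] 3.3 Stay backward compatible: support reading the legacy `yudao.cosyvoice.*` configuration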
+
+## 4. Service layer changes
+- [ ] 4.1 Modify `TikUserVoiceServiceImpl`
+  - Inject `VoiceCloneProvider` instead of `CosyVoiceClient`
+  - Use the factory to obtain the Provider instance
+- [ ] 4.2 Update method calls
+  - `createVoice()` - use `provider.cloneVoice()`
+  - `synthesizeVoice()` - use `provider.synthesize()`
+  - `previewVoice()` - use `provider.synthesize()`
+
+## 5. Testing and verification
+- [ ] 5.1 Unit tests: CosyVoiceProvider
+- [ ] 5.2 Unit tests: VoiceCloneProviderFactory
+- [ ] 5.3 Integration tests: TikUserVoiceServiceImpl
+- [ ] 5.4 Verify that existing functionality works
+
+## 6. Documentation and configuration migration
+- [ ] 6.1 Update the `application.yaml` configuration example
+- [ ] 6.2 Add configuration migration notes
+
+---
+
+**Total**: 20 tasks
+
+**Estimated effort**: 2-3 days