feat: feature optimization
133
openspec/changes/refactor-voice-provider/design.md
Normal file
@@ -0,0 +1,133 @@
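# Technical Design: Voice Clone Provider Refactoring

## Context

The voice cloning feature currently depends directly on the Alibaba Cloud CosyVoice SDK and API. The Service layer calls `CosyVoiceClient` directly, which causes three problems:

1. **Tight coupling**: switching to or adding another provider is not easy
2. **Hard to test**: external dependencies are difficult to mock
3. **Poor extensibility**: adding a new provider requires modifying the Service layer
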
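## Goals / Non-Goals

### Goals
- Decouple the Service layer from concrete provider implementations
- Support multiple coexisting providers and dynamic switching between them
- Remain fully compatible with existing functionality
- Lay the groundwork for adding SiliconFlow IndexTTS-2

### Non-Goals
- No changes to existing API behavior
- No changes to the database schema
- No changes to frontend interaction
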
## Decisions

### 1. Use the Strategy pattern plus the Factory pattern

**Why**:
- Strategy pattern: define a unified interface that each provider implements independently
- Factory pattern: obtain Provider instances dynamically based on configuration
- Follows the open/closed principle: extending the system requires no changes to existing code

**Architecture**:
```
VoiceCloneProvider (interface)
├── CosyVoiceProvider (impl) - Alibaba Cloud CosyVoice (DashScope)
├── SiliconFlowProvider (impl) - Phase 2: SiliconFlow IndexTTS-2
└── VoiceCloneProviderFactory
```

**Notes**:
- `CosyVoiceProvider` corresponds to the voice service on Alibaba Cloud DashScope
- Default model: `cosyvoice-v3-flash`
- To extend, add a new Provider implementation

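The structure above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the method names `cloneVoice` and `supports` come from the task list, the DTOs are trimmed to one field each, the provider bodies are placeholders instead of real SDK calls, and `ProviderSelectionDemo` is a hypothetical helper.

```java
import java.util.List;

// Trimmed-down DTOs for illustration; the real ones carry more fields.
class VoiceCloneRequest {
    final String audioUrl;
    VoiceCloneRequest(String audioUrl) { this.audioUrl = audioUrl; }
}

class VoiceCloneResult {
    final String voiceId;
    VoiceCloneResult(String voiceId) { this.voiceId = voiceId; }
}

// Strategy interface: every vendor implements the same contract.
interface VoiceCloneProvider {
    VoiceCloneResult cloneVoice(VoiceCloneRequest request);
    boolean supports(String providerType);
}

// Placeholder body; a real implementation would call the DashScope SDK here.
class CosyVoiceProvider implements VoiceCloneProvider {
    @Override
    public VoiceCloneResult cloneVoice(VoiceCloneRequest request) {
        return new VoiceCloneResult("cosyvoice-stub-id");
    }
    @Override
    public boolean supports(String providerType) {
        return "cosyvoice".equals(providerType);
    }
}

// Placeholder for the phase-two provider.
class SiliconFlowProvider implements VoiceCloneProvider {
    @Override
    public VoiceCloneResult cloneVoice(VoiceCloneRequest request) {
        return new VoiceCloneResult("siliconflow-stub-id");
    }
    @Override
    public boolean supports(String providerType) {
        return "siliconflow".equals(providerType);
    }
}

// Hypothetical helper: pick the first strategy that claims the requested type.
class ProviderSelectionDemo {
    static VoiceCloneProvider select(List<VoiceCloneProvider> providers, String providerType) {
        for (VoiceCloneProvider provider : providers) {
            if (provider.supports(providerType)) {
                return provider;
            }
        }
        throw new IllegalArgumentException("No provider supports type: " + providerType);
    }
}
```

Adding a third vendor under this shape means writing one more class that implements `VoiceCloneProvider`; the selection loop and the callers stay untouched, which is the open/closed property the decision relies on.
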
### 2. Unified DTO design

**Why**: Hide the API differences between providers

```java
// Unified request
class VoiceCloneRequest {
    String audioUrl;    // audio URL
    String prefix;      // voice prefix
    String targetModel; // target model
}

// Unified response
class VoiceCloneResult {
    String voiceId;   // generated voice ID
    String requestId; // request ID
}
```

### 3. Configuration structure design

**New configuration structure**:
```yaml
yudao:
  voice:
    # Default provider
    default-provider: cosyvoice

    # Provider configuration
    providers:
      cosyvoice:              # Alibaba Cloud CosyVoice
        enabled: true
        api-key: ${DASHSCOPE_API_KEY}
        default-model: cosyvoice-v3-flash
        # ... other settings

      siliconflow:            # added in phase 2
        enabled: false
        api-key: ${SILICONFLOW_API_KEY}
        base-url: https://api.siliconflow.cn
        default-model: indextts-2
```

**Backward compatibility**:
- Read the legacy `yudao.cosyvoice.*` configuration and merge it into the new structure
- Prefer the new configuration; use the legacy one as a fallback

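The "new configuration first, legacy configuration as fallback" rule could look roughly like this. It is a sketch under assumptions: properties are modeled as a flat `Map` rather than Spring's binding, `ConfigFallbackSketch` is a hypothetical name, and it assumes the legacy keys use the same field names (such as `api-key`) as the new ones.

```java
import java.util.Map;

class ConfigFallbackSketch {
    static final String NEW_PREFIX = "yudao.voice.providers.cosyvoice.";
    static final String OLD_PREFIX = "yudao.cosyvoice.";

    // Returns the value under the new prefix if present,
    // otherwise the value under the legacy prefix, otherwise null.
    static String resolve(Map<String, String> props, String field) {
        String value = props.get(NEW_PREFIX + field);
        if (value != null) {
            return value;
        }
        return props.get(OLD_PREFIX + field);
    }
}
```

With only `yudao.cosyvoice.api-key` set, `resolve(props, "api-key")` returns the legacy value; once `yudao.voice.providers.cosyvoice.api-key` is also set, the new value wins.
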
### 4. Error handling strategy

- When a Provider call fails, log the details
- Return the unified business exception `VOICE_TTS_FAILED`
- Do not expose the underlying provider's technical details

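One way to apply the three rules above is a small wrapper around every provider call. The sketch below is illustrative only: `VoiceServiceException` and `ErrorHandlingSketch` are hypothetical names (the project presumably has its own business exception type), and `java.util.logging` stands in for whatever logging framework is actually used.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical unified business exception carrying an error code.
class VoiceServiceException extends RuntimeException {
    final String code;
    VoiceServiceException(String code, String message) {
        super(message);
        this.code = code;
    }
}

class ErrorHandlingSketch {
    private static final Logger LOG = Logger.getLogger(ErrorHandlingSketch.class.getName());

    static <T> T callProvider(String providerType, Supplier<T> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            // Full vendor detail goes to the log only.
            LOG.log(Level.SEVERE, "Voice provider call failed: " + providerType, e);
            // Callers get a generic message; the vendor exception is deliberately not attached.
            throw new VoiceServiceException("VOICE_TTS_FAILED", "Voice service request failed");
        }
    }
}
```

Whether the original exception should be attached as a cause is a design choice; leaving it off, as here, keeps vendor details out of anything that might be serialized back to the client.
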
## Risks / Trade-offs

| Risk | Mitigation |
|------|------------|
| Breaking existing functionality | Thorough testing; keep DTOs compatible |
| Complex configuration migration | Automatically map the legacy configuration |
| Performance overhead | The factory caches Provider instances |

## Migration Plan

### Phase 1: CosyVoice refactoring
1. Create the interface and factory
2. Refactor CosyVoice into a Provider implementation
3. Update the Service layer to use the interface
4. Test and verify

### Phase 2: Add SiliconFlow
1. Implement SiliconFlowProvider
2. Add configuration support
3. Integration testing

### Rollback plan
- Keep support for the original configuration
- Control the new logic with a feature flag

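## Open Questions

1. **Q**: Do we need to support switching providers dynamically at runtime?
   **A**: Not initially; switching through configuration is enough

2. **Q**: Do we need Provider health checks?
   **A**: To be considered in phase 2

3. **Q**: How should differences between DTO fields be handled?
   **A**: Use the common fields, and put provider-specific fields in `Map<String, Object> extensions`
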
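The answer to question 3 could be realized along these lines. This is a sketch, not the final DTO: the three common fields come from the unified DTO design, while `withExtension`, the getters, and the `referenceText` key in the usage note are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class VoiceCloneRequest {
    // Common fields shared by all providers.
    private final String audioUrl;
    private final String prefix;
    private final String targetModel;
    // Provider-specific fields that have no common equivalent.
    private final Map<String, Object> extensions = new HashMap<>();

    VoiceCloneRequest(String audioUrl, String prefix, String targetModel) {
        this.audioUrl = audioUrl;
        this.prefix = prefix;
        this.targetModel = targetModel;
    }

    // Fluent setter so callers can attach vendor-specific values.
    VoiceCloneRequest withExtension(String key, Object value) {
        extensions.put(key, value);
        return this;
    }

    String getAudioUrl() { return audioUrl; }
    String getPrefix() { return prefix; }
    String getTargetModel() { return targetModel; }

    // Read-only view: providers read extensions but do not mutate the request.
    Map<String, Object> getExtensions() {
        return Collections.unmodifiableMap(extensions);
    }
}
```

A caller might write `new VoiceCloneRequest(url, "demo", "indextts-2").withExtension("referenceText", text)`; a provider that does not understand a key simply ignores it.
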
35
openspec/changes/refactor-voice-provider/proposal.md
Normal file
@@ -0,0 +1,35 @@
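# Change: Refactor Voice Clone Provider

## Why

The voice cloning feature currently depends directly on the Alibaba Cloud CosyVoice implementation, so the code is tightly coupled and hard to extend. Adding a new provider (such as SiliconFlow IndexTTS-2) requires modifying Service-layer code, which violates the open/closed principle.

**Note**: CosyVoice is Alibaba Cloud's speech synthesis service (on the DashScope platform) and supports voice cloning and TTS. The current code uses the `cosyvoice-v3-flash` model.
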
## What Changes

- **ADDED** Introduce the Strategy pattern and define a unified `VoiceCloneProvider` interface
- **ADDED** Create the factory class `VoiceCloneProviderFactory` to manage multiple providers
- **MODIFIED** Convert the existing `CosyVoiceClient` into `CosyVoiceProvider`
- **MODIFIED** Update `TikUserVoiceServiceImpl` to use the Provider interface
- **ADDED** Add configuration classes that support multi-provider configuration and switching
- **BREAKING** Configuration keys move from `yudao.cosyvoice` to `yudao.voice.providers`

## Impact

- **Affected specs**:
  - `voice-clone` (new capability spec)
- **Affected code**:
  - `TikUserVoiceServiceImpl.java` - Service layer switches to an injected Provider
  - `CosyVoiceClient.java` → `CosyVoiceProvider.java` - renamed, and implements the interface
  - `CosyVoiceProperties.java` → `VoiceProviderProperties.java` - configuration structure reorganized
  - New `VoiceCloneProvider.java` - unified interface definition
  - New `VoiceCloneProviderFactory.java` - factory class
  - New `SiliconFlowProvider.java` - SiliconFlow implementation (phase 2)

## Migration

- Existing configuration is migrated automatically: `yudao.cosyvoice.*` → `yudao.voice.providers.cosyvoice.*`
- The default provider remains `cosyvoice`
- Default behavior is unchanged and backward compatible
- Providers can be switched through configuration: `yudao.voice.default-provider`

@@ -0,0 +1,132 @@
# Voice Clone Capability Specification

## ADDED Requirements

### Requirement: Provider Abstraction Layer
The system SHALL provide a unified provider abstraction layer for voice cloning services, supporting multiple vendors through a common interface.

#### Scenario: Get provider by type
- **GIVEN** the system is configured with multiple voice clone providers
- **WHEN** requesting a provider by type
- **THEN** the system SHALL return the corresponding provider instance
- **AND** the provider SHALL implement the `VoiceCloneProvider` interface

#### Scenario: Provider not found
- **GIVEN** the system is configured with a default provider
- **WHEN** requesting a non-existent provider type
- **THEN** the system SHALL fall back to the default provider
- **AND** log a warning message

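The "provider not found" scenario could be sketched as below. The names are assumptions for illustration: `FallbackLookupSketch` is hypothetical, the interface is reduced to a single `name()` method, and `java.util.logging` stands in for the project's logger.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Reduced to one method for this sketch; the real interface has cloneVoice/synthesize.
interface VoiceCloneProvider {
    String name();
}

class FallbackLookupSketch {
    private static final Logger LOG = Logger.getLogger(FallbackLookupSketch.class.getName());

    private final Map<String, VoiceCloneProvider> providers;
    private final String defaultType;

    FallbackLookupSketch(Map<String, VoiceCloneProvider> providers, String defaultType) {
        if (!providers.containsKey(defaultType)) {
            throw new IllegalArgumentException("Default provider is not registered: " + defaultType);
        }
        // Copy into a HashMap so lookups with a null key are safe.
        this.providers = new HashMap<>(providers);
        this.defaultType = defaultType;
    }

    VoiceCloneProvider getProvider(String type) {
        VoiceCloneProvider provider = providers.get(type);
        if (provider != null) {
            return provider;
        }
        // Unknown type: warn and fall back instead of failing the request.
        LOG.warning("Unknown voice provider '" + type + "', falling back to '" + defaultType + "'");
        return providers.get(defaultType);
    }
}
```

Validating the default in the constructor means the fallback path can never return `null`, so a misconfigured default fails at startup instead of at request time.
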
### Requirement: Voice Cloning
The system SHALL support voice cloning through the provider interface, accepting an audio file URL and returning a unique voice ID.

#### Scenario: Successful voice cloning with CosyVoice
- **GIVEN** a valid CosyVoice provider is configured
- **WHEN** submitting a voice clone request with an audio URL
- **THEN** the system SHALL return a voice ID
- **AND** the voice ID SHALL be usable for subsequent TTS synthesis

#### Scenario: Voice cloning failure
- **GIVEN** the provider API is unavailable or returns an error
- **WHEN** submitting a voice clone request
- **THEN** the system SHALL throw a `VOICE_TTS_FAILED` exception
- **AND** log the error details for debugging

### Requirement: Text-to-Speech Synthesis
The system SHALL support TTS synthesis through cloned voices or system voices, accepting text input and returning audio data.

#### Scenario: TTS with cloned voice
- **GIVEN** a valid voice ID from a previous clone operation
- **WHEN** submitting a TTS request with text and voice ID
- **THEN** the system SHALL return audio data in the specified format
- **AND** the audio SHALL match the cloned voice characteristics

#### Scenario: TTS with system voice
- **GIVEN** a system voice ID is configured
- **WHEN** submitting a TTS request with text and system voice ID
- **THEN** the system SHALL return audio data using the system voice
- **AND** the audio SHALL match the system voice characteristics

#### Scenario: TTS with reference audio (file URL)
- **GIVEN** a reference audio URL and transcription text
- **WHEN** submitting a TTS request with file URL
- **THEN** the system SHALL perform on-the-fly voice cloning
- **AND** return audio data matching the reference voice

### Requirement: Configuration Management
The system SHALL support multi-provider configuration through a unified configuration structure.

#### Scenario: Configure multiple providers
- **GIVEN** the application configuration file
- **WHEN** configuring multiple voice providers
- **THEN** each provider SHALL have an independent `enabled` flag
- **AND** the system SHALL only use enabled providers

#### Scenario: Default provider selection
- **GIVEN** the configuration specifies a `default-provider`
- **WHEN** no provider is explicitly specified
- **THEN** the system SHALL use the default provider
- **AND** fall back to `cosyvoice` if no default is configured

#### Scenario: Backward compatibility
- **GIVEN** existing configuration using `yudao.cosyvoice.*`
- **WHEN** the system starts
- **THEN** the system SHALL automatically migrate to the new config structure
- **AND** existing functionality SHALL remain unchanged

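The selection rules in the first two scenarios could be combined as sketched below. `DefaultProviderSketch` is a hypothetical name, and one behavior is an assumption the spec does not state: when the resolved provider is disabled, this sketch throws instead of silently picking another provider.

```java
import java.util.Map;

class DefaultProviderSketch {
    static final String BUILT_IN_DEFAULT = "cosyvoice";

    // requested: provider named by the caller, or null
    // configuredDefault: value of default-provider, or null if unset
    // enabledFlags: provider name -> value of its enabled flag
    static String resolveProvider(String requested,
                                  String configuredDefault,
                                  Map<String, Boolean> enabledFlags) {
        String candidate;
        if (requested != null) {
            candidate = requested;
        } else if (configuredDefault != null) {
            candidate = configuredDefault;
        } else {
            candidate = BUILT_IN_DEFAULT;
        }
        // Only enabled providers may be used (missing entries count as disabled).
        if (!Boolean.TRUE.equals(enabledFlags.get(candidate))) {
            throw new IllegalStateException("Voice provider is not enabled: " + candidate);
        }
        return candidate;
    }
}
```
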
### Requirement: Provider Factory
The system SHALL provide a factory component for managing provider instances and resolving providers by type.

#### Scenario: Factory resolves provider
- **GIVEN** the factory is initialized with provider configurations
- **WHEN** calling `factory.getProvider("cosyvoice")`
- **THEN** the factory SHALL return the CosyVoiceProvider instance
- **AND** cache the instance for subsequent requests

#### Scenario: Factory returns default
- **GIVEN** the factory is configured with a default provider
- **WHEN** calling `factory.getProvider(null)`
- **THEN** the factory SHALL return the default provider instance

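A factory that satisfies both scenarios might look like the following sketch. It assumes providers are created lazily from `Supplier`s and cached in a `ConcurrentHashMap`; in a Spring application the providers would more likely be singleton beans collected at startup, so treat the constructor shape as an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Marker-style stand-ins; the real interface declares cloneVoice/synthesize.
interface VoiceCloneProvider { }

class CosyVoiceProvider implements VoiceCloneProvider { }

class VoiceCloneProviderFactory {
    private final Map<String, Supplier<VoiceCloneProvider>> creators;
    private final Map<String, VoiceCloneProvider> cache = new ConcurrentHashMap<>();
    private final String defaultType;

    VoiceCloneProviderFactory(Map<String, Supplier<VoiceCloneProvider>> creators,
                              String defaultType) {
        if (!creators.containsKey(defaultType)) {
            throw new IllegalArgumentException("Default provider is not registered: " + defaultType);
        }
        this.creators = new HashMap<>(creators);
        this.defaultType = defaultType;
    }

    VoiceCloneProvider getProvider(String type) {
        // null or unregistered types resolve to the default provider.
        String key = (type != null && creators.containsKey(type)) ? type : defaultType;
        // Create on first use, then reuse the cached instance.
        return cache.computeIfAbsent(key, k -> creators.get(k).get());
    }
}
```

Because `computeIfAbsent` on `ConcurrentHashMap` is atomic per key, two threads asking for the same type at the same time still end up sharing one instance.
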
## MODIFIED Requirements

### Requirement: Voice Creation Flow
The voice creation process SHALL use the provider abstraction layer instead of directly calling the CosyVoice client.

#### Scenario: Create voice with CosyVoice
- **GIVEN** a user uploads a voice audio file
- **WHEN** creating a voice configuration through the API
- **THEN** the system SHALL:
  1. Validate the file exists and belongs to the voice category
  2. Call `provider.cloneVoice()` with the audio URL
  3. Store the returned `voiceId` in the database
  4. Return a success response with the voice configuration ID

#### Scenario: Create voice with transcription
- **GIVEN** a voice configuration is created without transcription
- **WHEN** the user triggers transcription
- **THEN** the system SHALL:
  1. Fetch the audio file URL
  2. Call the transcription service
  3. Store the transcription text
  4. Update the voice configuration

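The four steps of the "Create voice with CosyVoice" scenario can be traced in a small sketch. Everything here is a stand-in: in-memory maps replace the file and voice-configuration tables, `cloneVoice` takes a plain URL and returns the voice ID instead of using the DTOs, and `VoiceCreationSketch` is a hypothetical class, not `TikUserVoiceServiceImpl`.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified contract: takes the audio URL, returns the voice ID.
interface VoiceCloneProvider {
    String cloneVoice(String audioUrl);
}

class VoiceCreationSketch {
    // In-memory stand-ins for database tables.
    final Map<Long, String[]> files = new HashMap<>();          // fileId -> {category, url}
    final Map<Long, String> voiceIdByConfig = new HashMap<>();  // configId -> voiceId

    private final VoiceCloneProvider provider;
    private long nextConfigId = 1;

    VoiceCreationSketch(VoiceCloneProvider provider) {
        this.provider = provider;
    }

    long createVoice(long fileId) {
        // 1. Validate the file exists and belongs to the voice category.
        String[] file = files.get(fileId);
        if (file == null || !"voice".equals(file[0])) {
            throw new IllegalArgumentException("File missing or not in the voice category: " + fileId);
        }
        // 2. Clone the voice through the provider abstraction.
        String voiceId = provider.cloneVoice(file[1]);
        // 3. Store the returned voiceId.
        long configId = nextConfigId++;
        voiceIdByConfig.put(configId, voiceId);
        // 4. Return the voice configuration ID.
        return configId;
    }
}
```

Since the service only sees the `VoiceCloneProvider` interface, a test can pass a lambda as the provider, which addresses the mocking difficulty noted in the design context.
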
### Requirement: Voice Preview
The voice preview functionality SHALL work with both cloned voices (voiceId) and reference audio (file URL).

#### Scenario: Preview cloned voice
- **GIVEN** a voice configuration with a valid `voiceId`
- **WHEN** requesting a preview with custom text
- **THEN** the system SHALL call `provider.synthesize()` with the voiceId
- **AND** return audio data in Base64 format

#### Scenario: Preview with reference audio
- **GIVEN** a voice configuration without `voiceId` but with an audio file
- **WHEN** requesting a preview
- **THEN** the system SHALL call `provider.synthesize()` with the file URL
- **AND** use the stored transcription as reference text
- **AND** return audio data in Base64 format

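The branching between the two preview scenarios could be written as in this sketch. The field names on `VoiceTtsRequest` (`text`, `voiceId`, `fileUrl`, `referenceText`) and the `byte[]` return type of `synthesize` are assumptions; the spec only fixes the method name and the Base64 output.

```java
import java.util.Base64;

// Assumed request shape; the real DTO may differ.
class VoiceTtsRequest {
    String text;
    String voiceId;
    String fileUrl;
    String referenceText;
}

interface VoiceCloneProvider {
    byte[] synthesize(VoiceTtsRequest request);
}

class VoicePreviewSketch {
    static String preview(VoiceCloneProvider provider, String text,
                          String voiceId, String fileUrl, String transcription) {
        VoiceTtsRequest request = new VoiceTtsRequest();
        request.text = text;
        if (voiceId != null && !voiceId.isEmpty()) {
            // Cloned voice available: synthesize by voice ID.
            request.voiceId = voiceId;
        } else if (fileUrl != null) {
            // No voice ID: use the reference audio plus its stored transcription.
            request.fileUrl = fileUrl;
            request.referenceText = transcription;
        } else {
            throw new IllegalArgumentException("Either voiceId or fileUrl is required");
        }
        // Return the audio bytes as a Base64 string.
        return Base64.getEncoder().encodeToString(provider.synthesize(request));
    }
}
```
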
## REMOVED Requirements

None. This change is additive and refactoring only.

53
openspec/changes/refactor-voice-provider/tasks.md
Normal file
@@ -0,0 +1,53 @@
# Implementation Tasks

## 1. Interface and base structure
- [ ] 1.1 Create the `VoiceCloneProvider` interface
  - Define the `cloneVoice(VoiceCloneRequest)` method
  - Define the `synthesize(VoiceTtsRequest)` method
  - Define the `supports(String providerType)` method
- [ ] 1.2 Create the unified DTO classes
  - `VoiceCloneRequest` - voice clone request
  - `VoiceCloneResult` - voice clone response
  - `VoiceTtsRequest` - speech synthesis request
  - `VoiceTtsResult` - speech synthesis response
- [ ] 1.3 Create the `VoiceCloneProviderFactory` factory class
  - Obtain Provider instances based on configuration
  - Support dynamic provider switching

## 2. CosyVoice refactoring (preserve existing functionality)
- [ ] 2.1 Rename `CosyVoiceClient` → `CosyVoiceProvider`
- [ ] 2.2 Make `CosyVoiceProvider` implement the `VoiceCloneProvider` interface
- [ ] 2.3 Adapt the existing DTOs to the new unified DTOs
- [ ] 2.4 Keep the existing DashScope SDK call logic unchanged

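## 3. Configuration refactoring
- [ ] 3.1 Create the `VoiceProviderProperties` configuration class
  - Support the multi-provider configuration structure
  - Add the `default-provider` configuration key
- [ ] 3.2 Create `CosyVoiceProviderConfig` (nested configuration)
- [ ] 3.3 Stay backward compatible: support reading the legacy `yudao.cosyvoice.*` configuration
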
## 4. Service layer changes
- [ ] 4.1 Modify `TikUserVoiceServiceImpl`
  - Inject `VoiceCloneProvider` instead of `CosyVoiceClient`
  - Use the factory to obtain Provider instances
- [ ] 4.2 Update method calls
  - `createVoice()` - use `provider.cloneVoice()`
  - `synthesizeVoice()` - use `provider.synthesize()`
  - `previewVoice()` - use `provider.synthesize()`

## 5. Testing and verification
- [ ] 5.1 Unit tests: CosyVoiceProvider
- [ ] 5.2 Unit tests: VoiceCloneProviderFactory
- [ ] 5.3 Integration tests: TikUserVoiceServiceImpl
- [ ] 5.4 Verify that existing functionality still works

## 6. Documentation and configuration migration
- [ ] 6.1 Update the `application.yaml` configuration example
- [ ] 6.2 Add a configuration migration guide

---

**Total**: 18 tasks

**Estimated effort**: 2-3 days