feat: feature optimization
@@ -0,0 +1,132 @@
# Voice Clone Capability Specification

## ADDED Requirements

### Requirement: Provider Abstraction Layer

The system SHALL provide a unified provider abstraction layer for voice cloning services, supporting multiple vendors through a common interface.

#### Scenario: Get provider by type

- **GIVEN** the system is configured with multiple voice clone providers
- **WHEN** requesting a provider by type
- **THEN** the system SHALL return the corresponding provider instance
- **AND** the provider SHALL implement the `VoiceCloneProvider` interface

#### Scenario: Provider not found

- **GIVEN** the system is configured with a default provider
- **WHEN** requesting a non-existent provider type
- **THEN** the system SHALL fall back to the default provider
- **AND** log a warning message
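
The two scenarios above can be sketched as a small lookup with a logged fallback. This is a minimal illustration, not the project's implementation: only the `VoiceCloneProvider` name comes from the spec, while `getType()`, `ProviderRegistry`, and `resolve()` are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Common contract for vendor adapters; the getType() method is an assumption. */
interface VoiceCloneProvider {
    /** Vendor key used for lookup, e.g. "cosyvoice". */
    String getType();
}

/** Hypothetical registry that resolves providers by type and falls back to a default. */
class ProviderRegistry {
    private static final Logger LOG = Logger.getLogger(ProviderRegistry.class.getName());

    private final Map<String, VoiceCloneProvider> providers = new HashMap<>();
    private final String defaultType;

    ProviderRegistry(String defaultType) {
        this.defaultType = defaultType;
    }

    void register(VoiceCloneProvider provider) {
        providers.put(provider.getType(), provider);
    }

    /** Returns the provider for the type, or the default provider (with a warning) if unknown. */
    VoiceCloneProvider resolve(String type) {
        VoiceCloneProvider provider = providers.get(type);
        if (provider == null) {
            LOG.warning("Voice clone provider '" + type
                    + "' not found, falling back to '" + defaultType + "'");
            provider = providers.get(defaultType);
        }
        return provider;
    }
}
```

Callers never see the vendor class directly; they only hold a `VoiceCloneProvider`, which is what lets an unknown type degrade to the default instead of failing.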

### Requirement: Voice Cloning

The system SHALL support voice cloning through the provider interface, accepting an audio file URL and returning a unique voice ID.

#### Scenario: Successful voice cloning with CosyVoice

- **GIVEN** a valid CosyVoice provider is configured
- **WHEN** submitting a voice clone request with an audio URL
- **THEN** the system SHALL return a voice ID
- **AND** the voice ID SHALL be usable for subsequent TTS synthesis

#### Scenario: Voice cloning failure

- **GIVEN** the provider API is unavailable or returns an error
- **WHEN** submitting a voice clone request
- **THEN** the system SHALL throw a `VOICE_TTS_FAILED` exception
- **AND** log the error details for debugging
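
One way to read the failure scenario is that the calling service wraps any provider error, logs it, and rethrows with the `VOICE_TTS_FAILED` code. The sketch below assumes a `cloneVoice(String)` signature and invents `VoiceServiceException` and `VoiceCloneService` purely for illustration; the project's real exception type is not shown in the spec.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical exception carrying an error code such as VOICE_TTS_FAILED. */
class VoiceServiceException extends RuntimeException {
    private final String code;

    VoiceServiceException(String code, String message, Throwable cause) {
        super(message, cause);
        this.code = code;
    }

    String getCode() {
        return code;
    }
}

interface VoiceCloneProvider {
    /** Assumed signature: clones a voice from the audio URL and returns the voice ID. */
    String cloneVoice(String audioUrl) throws Exception;
}

class VoiceCloneService {
    private static final Logger LOG = Logger.getLogger(VoiceCloneService.class.getName());

    private final VoiceCloneProvider provider;

    VoiceCloneService(VoiceCloneProvider provider) {
        this.provider = provider;
    }

    /** Delegates to the provider; on any failure, logs the details and rethrows with the error code. */
    String cloneVoice(String audioUrl) {
        try {
            return provider.cloneVoice(audioUrl);
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "Voice clone failed for audio URL " + audioUrl, e);
            throw new VoiceServiceException("VOICE_TTS_FAILED", "Voice clone request failed", e);
        }
    }
}
```

Keeping the original exception as the cause preserves the vendor's error details for debugging while callers only need to handle one error code.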

### Requirement: Text-to-Speech Synthesis

The system SHALL support TTS synthesis through cloned voices or system voices, accepting text input and returning audio data.

#### Scenario: TTS with cloned voice

- **GIVEN** a valid voice ID from a previous clone operation
- **WHEN** submitting a TTS request with text and voice ID
- **THEN** the system SHALL return audio data in the specified format
- **AND** the audio SHALL match the cloned voice characteristics

#### Scenario: TTS with system voice

- **GIVEN** a system voice ID is configured
- **WHEN** submitting a TTS request with text and system voice ID
- **THEN** the system SHALL return audio data using the system voice
- **AND** the audio SHALL match the system voice characteristics

#### Scenario: TTS with reference audio (file URL)

- **GIVEN** a reference audio URL and transcription text
- **WHEN** submitting a TTS request with a file URL
- **THEN** the system SHALL perform on-the-fly voice cloning
- **AND** return audio data matching the reference voice
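
The three scenarios reduce to two input shapes: a voice ID (cloned or system) or a reference audio URL with its transcription. A hedged sketch of how a request might be classified follows; `TtsRequest`, `TtsMode`, and `TtsModeResolver` are hypothetical names, and giving the voice ID precedence when both inputs are present is an assumption the spec does not state.

```java
/** Hypothetical request object; field names are illustrative only. */
class TtsRequest {
    final String text;
    final String voiceId;        // cloned or system voice ID, may be null
    final String fileUrl;        // reference audio URL, may be null
    final String referenceText;  // transcription of the reference audio, may be null

    TtsRequest(String text, String voiceId, String fileUrl, String referenceText) {
        this.text = text;
        this.voiceId = voiceId;
        this.fileUrl = fileUrl;
        this.referenceText = referenceText;
    }
}

enum TtsMode { VOICE_ID, REFERENCE_AUDIO }

class TtsModeResolver {
    /** Decides which synthesis path a request takes, rejecting incomplete requests. */
    static TtsMode resolve(TtsRequest request) {
        if (isBlank(request.text)) {
            throw new IllegalArgumentException("text is required");
        }
        if (!isBlank(request.voiceId)) {
            return TtsMode.VOICE_ID; // assumption: a voice ID wins over a file URL
        }
        if (!isBlank(request.fileUrl) && !isBlank(request.referenceText)) {
            return TtsMode.REFERENCE_AUDIO;
        }
        throw new IllegalArgumentException(
                "either voiceId, or fileUrl together with referenceText, is required");
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```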

### Requirement: Configuration Management

The system SHALL support multi-provider configuration through a unified configuration structure.

#### Scenario: Configure multiple providers

- **GIVEN** the application configuration file
- **WHEN** configuring multiple voice providers
- **THEN** each provider SHALL have an independent `enabled` flag
- **AND** the system SHALL only use enabled providers

#### Scenario: Default provider selection

- **GIVEN** the configuration specifies a `default-provider`
- **WHEN** no provider is explicitly specified
- **THEN** the system SHALL use the default provider
- **AND** fall back to `cosyvoice` if no default is configured

#### Scenario: Backward compatibility

- **GIVEN** existing configuration using `yudao.cosyvoice.*`
- **WHEN** the system starts
- **THEN** the system SHALL automatically migrate to the new configuration structure
- **AND** existing functionality SHALL remain unchanged
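
The configuration rules can be sketched over a flat property map. Only `yudao.cosyvoice.*`, `default-provider`, `enabled`, and the `cosyvoice` fallback come from the spec; the new key prefix `yudao.voice.providers.` and the key `yudao.voice.default-provider` are assumptions made up for this example, as is the `VoiceConfigResolver` class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class VoiceConfigResolver {
    static final String LEGACY_PREFIX = "yudao.cosyvoice.";
    // Assumed location of the new structure; the spec does not name the new keys.
    static final String PROVIDERS_PREFIX = "yudao.voice.providers.";
    static final String DEFAULT_PROVIDER_KEY = "yudao.voice.default-provider";
    static final String FALLBACK_PROVIDER = "cosyvoice";

    /** Copies legacy yudao.cosyvoice.* keys to the new prefix without overriding new-style keys. */
    static Map<String, String> migrateLegacyKeys(Map<String, String> props) {
        Map<String, String> result = new LinkedHashMap<>(props);
        for (Map.Entry<String, String> entry : props.entrySet()) {
            if (entry.getKey().startsWith(LEGACY_PREFIX)) {
                String suffix = entry.getKey().substring(LEGACY_PREFIX.length());
                result.putIfAbsent(PROVIDERS_PREFIX + "cosyvoice." + suffix, entry.getValue());
            }
        }
        return result;
    }

    /** Returns the configured default provider, or "cosyvoice" when none is set. */
    static String defaultProvider(Map<String, String> props) {
        String configured = props.get(DEFAULT_PROVIDER_KEY);
        return (configured == null || configured.trim().isEmpty())
                ? FALLBACK_PROVIDER
                : configured.trim();
    }

    /** A provider counts as enabled only if its own enabled flag is "true". */
    static boolean isEnabled(Map<String, String> props, String provider) {
        return Boolean.parseBoolean(props.get(PROVIDERS_PREFIX + provider + ".enabled"));
    }
}
```

Using `putIfAbsent` during migration means an explicitly set new-style key always wins over its legacy counterpart, which keeps existing deployments working while letting new configuration take over gradually.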

### Requirement: Provider Factory

The system SHALL provide a factory component for managing provider instances and resolving providers by type.

#### Scenario: Factory resolves provider

- **GIVEN** the factory is initialized with provider configurations
- **WHEN** calling `factory.getProvider("cosyvoice")`
- **THEN** the factory SHALL return the `CosyVoiceProvider` instance
- **AND** cache the instance for subsequent requests

#### Scenario: Factory returns default

- **GIVEN** the factory is configured with a default provider
- **WHEN** calling `factory.getProvider(null)`
- **THEN** the factory SHALL return the default provider instance
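
A minimal sketch of the caching behaviour, assuming providers are created lazily from registered suppliers. `getProvider` and `CosyVoiceProvider` are named in the spec; `VoiceCloneProviderFactory`, `register`, and `getType` are hypothetical, and the real factory may well be wired by a dependency injection container instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

interface VoiceCloneProvider {
    String getType(); // assumed accessor, not taken from the spec
}

class CosyVoiceProvider implements VoiceCloneProvider {
    @Override
    public String getType() {
        return "cosyvoice";
    }
}

class VoiceCloneProviderFactory {
    private final Map<String, Supplier<VoiceCloneProvider>> creators = new ConcurrentHashMap<>();
    private final Map<String, VoiceCloneProvider> cache = new ConcurrentHashMap<>();
    private final String defaultType;

    VoiceCloneProviderFactory(String defaultType) {
        this.defaultType = defaultType;
    }

    void register(String type, Supplier<VoiceCloneProvider> creator) {
        creators.put(type, creator);
    }

    /** Resolves a provider by type; null or unknown types resolve to the default provider. */
    VoiceCloneProvider getProvider(String type) {
        String key = (type == null || !creators.containsKey(type)) ? defaultType : type;
        // computeIfAbsent creates the instance once and returns the cached one afterwards.
        return cache.computeIfAbsent(key, k -> {
            Supplier<VoiceCloneProvider> creator = creators.get(k);
            if (creator == null) {
                throw new IllegalStateException("No provider registered for '" + k + "'");
            }
            return creator.get();
        });
    }
}
```

The explicit `type == null` check comes first because `ConcurrentHashMap` rejects null keys.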

## MODIFIED Requirements

### Requirement: Voice Creation Flow

The voice creation process SHALL use the provider abstraction layer instead of calling the CosyVoice client directly.

#### Scenario: Create voice with CosyVoice

- **GIVEN** a user uploads a voice audio file
- **WHEN** creating a voice configuration through the API
- **THEN** the system SHALL:
  1. Validate that the file exists and belongs to the voice category
  2. Call `provider.cloneVoice()` with the audio URL
  3. Store the returned `voiceId` in the database
  4. Return a success response with the voice configuration ID
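
The four steps above can be sketched as a single service method. In-memory maps stand in for the file service and the database, and `VoiceFile`, `VoiceCreationService`, and `voiceIdOf` are hypothetical names; only `cloneVoice()` and `voiceId` appear in the spec, and the `cloneVoice(String)` signature is assumed.

```java
import java.util.HashMap;
import java.util.Map;

interface VoiceCloneProvider {
    /** Assumed signature: returns the vendor voice ID for the given audio URL. */
    String cloneVoice(String audioUrl);
}

/** Hypothetical file record; the real file model is not described in the spec. */
class VoiceFile {
    final String url;
    final String category;

    VoiceFile(String url, String category) {
        this.url = url;
        this.category = category;
    }
}

class VoiceCreationService {
    private final Map<Long, VoiceFile> files;                             // stands in for the file service
    private final Map<Long, String> voiceIdsByConfigId = new HashMap<>(); // stands in for the database
    private final VoiceCloneProvider provider;
    private long nextConfigId = 1;

    VoiceCreationService(Map<Long, VoiceFile> files, VoiceCloneProvider provider) {
        this.files = files;
        this.provider = provider;
    }

    long createVoice(long fileId) {
        // 1. Validate that the file exists and belongs to the voice category.
        VoiceFile file = files.get(fileId);
        if (file == null || !"voice".equals(file.category)) {
            throw new IllegalArgumentException("File " + fileId + " is missing or not a voice file");
        }
        // 2. Clone the voice through the provider abstraction.
        String voiceId = provider.cloneVoice(file.url);
        // 3. Store the returned voiceId.
        long configId = nextConfigId++;
        voiceIdsByConfigId.put(configId, voiceId);
        // 4. Return the voice configuration ID to the caller.
        return configId;
    }

    String voiceIdOf(long configId) {
        return voiceIdsByConfigId.get(configId);
    }
}
```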

#### Scenario: Create voice with transcription

- **GIVEN** a voice configuration is created without transcription
- **WHEN** the user triggers transcription
- **THEN** the system SHALL:
  1. Fetch the audio file URL
  2. Call the transcription service
  3. Store the transcription text
  4. Update the voice configuration

### Requirement: Voice Preview

The voice preview functionality SHALL work with both cloned voices (`voiceId`) and reference audio (file URL).

#### Scenario: Preview cloned voice

- **GIVEN** a voice configuration with a valid `voiceId`
- **WHEN** requesting a preview with custom text
- **THEN** the system SHALL call `provider.synthesize()` with the `voiceId`
- **AND** return audio data in Base64 format

#### Scenario: Preview with reference audio

- **GIVEN** a voice configuration without a `voiceId` but with an audio file
- **WHEN** requesting a preview
- **THEN** the system SHALL call `provider.synthesize()` with the file URL
- **AND** use the stored transcription as reference text
- **AND** return audio data in Base64 format
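
Both preview scenarios end in the same Base64 encoding step and differ only in what is passed to `provider.synthesize()`. The sketch below assumes a four-argument `synthesize` signature and invents `VoiceConfig` and `VoicePreviewService` for illustration; the Base64 step uses the standard `java.util.Base64` encoder.

```java
import java.util.Base64;

interface VoiceCloneProvider {
    /** Assumed signature: pass either a voiceId, or a fileUrl with its reference text. */
    byte[] synthesize(String text, String voiceId, String fileUrl, String referenceText);
}

/** Hypothetical view of a stored voice configuration. */
class VoiceConfig {
    final String voiceId;       // null if the voice has not been cloned
    final String fileUrl;       // uploaded audio file URL
    final String transcription; // stored transcription of the audio file

    VoiceConfig(String voiceId, String fileUrl, String transcription) {
        this.voiceId = voiceId;
        this.fileUrl = fileUrl;
        this.transcription = transcription;
    }
}

class VoicePreviewService {
    private final VoiceCloneProvider provider;

    VoicePreviewService(VoiceCloneProvider provider) {
        this.provider = provider;
    }

    /** Synthesizes the preview text and returns the audio as a Base64 string. */
    String preview(VoiceConfig config, String text) {
        byte[] audio;
        if (config.voiceId != null && !config.voiceId.isEmpty()) {
            // Cloned voice: synthesize with the stored voiceId.
            audio = provider.synthesize(text, config.voiceId, null, null);
        } else {
            // Reference audio: pass the file URL with the stored transcription as reference text.
            audio = provider.synthesize(text, null, config.fileUrl, config.transcription);
        }
        return Base64.getEncoder().encodeToString(audio);
    }
}
```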

## REMOVED Requirements

None. This change is limited to additions and refactoring.