sion/sionrui

sion123 24f66c8e81 feat: feature optimization (功能优化)

2026-01-27 01:39:08 +08:00

Voice Clone Capability Specification

ADDED Requirements

Requirement: Provider Abstraction Layer

The system SHALL provide a unified provider abstraction layer for voice cloning services, supporting multiple vendors through a common interface.

Scenario: Get provider by type

GIVEN the system is configured with multiple voice clone providers
WHEN requesting a provider by type
THEN the system SHALL return the corresponding provider instance
AND the provider SHALL implement the VoiceCloneProvider interface

Scenario: Provider not found

GIVEN the system is configured with a default provider
WHEN requesting a non-existent provider type
THEN the system SHALL fall back to the default provider
AND log a warning message
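
As a rough illustration of the common interface, here is a minimal Java sketch. Only the interface name `VoiceCloneProvider` and the method names `cloneVoice()` and `synthesize()` come from this specification; `getProviderType()`, the parameter lists, and `StubVoiceCloneProvider` are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

/**
 * Contract shared by all voice clone vendors. The interface name and the
 * cloneVoice()/synthesize() method names follow the specification; the
 * parameter lists and getProviderType() are assumptions.
 */
interface VoiceCloneProvider {

    /** Lookup key for this provider, for example "cosyvoice" (assumed). */
    String getProviderType();

    /** Registers the voice in the audio file and returns a vendor-issued voice ID. */
    String cloneVoice(String audioUrl);

    /** Synthesizes text with a cloned or system voice ID and returns audio bytes. */
    byte[] synthesize(String text, String voiceId);
}

/** Hypothetical in-memory provider standing in for a real vendor client. */
class StubVoiceCloneProvider implements VoiceCloneProvider {
    private final String type;

    StubVoiceCloneProvider(String type) {
        this.type = type;
    }

    @Override
    public String getProviderType() {
        return type;
    }

    @Override
    public String cloneVoice(String audioUrl) {
        // A real provider would submit audioUrl to the vendor API here.
        return type + "-voice-" + Integer.toHexString(audioUrl.hashCode());
    }

    @Override
    public byte[] synthesize(String text, String voiceId) {
        // A real provider would return encoded audio; the stub echoes its inputs.
        return (voiceId + ":" + text).getBytes(StandardCharsets.UTF_8);
    }
}
```

A CosyVoiceProvider would implement the same interface, so adding another vendor means adding another implementation rather than changing callers.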

Requirement: Voice Cloning

The system SHALL support voice cloning through the provider interface, accepting an audio file URL and returning a unique voice ID.

Scenario: Successful voice cloning with CosyVoice

GIVEN a valid CosyVoice provider is configured
WHEN submitting a voice clone request with audio URL
THEN the system SHALL return a voice ID
AND the voice ID SHALL be usable for subsequent TTS synthesis

Scenario: Voice cloning failure

GIVEN the provider API is unavailable or returns an error
WHEN submitting a voice clone request
THEN the system SHALL throw a VOICE_TTS_FAILED exception
AND log the error details for debugging
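
One way to meet the failure scenario is to wrap every provider call, log the full cause, and rethrow a single business exception carrying the `VOICE_TTS_FAILED` code. This is a minimal sketch: `ServiceException`, `CloneCall`, and `VoiceCloneService` are hypothetical names, and the project's actual exception helper may differ. It uses `java.util.logging` only to stay dependency-free.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical business exception that carries an error-code name. */
class ServiceException extends RuntimeException {
    private final String code;

    ServiceException(String code, String message, Throwable cause) {
        super(message, cause);
        this.code = code;
    }

    String getCode() {
        return code;
    }
}

/** Hypothetical view of a provider's clone call that may throw checked errors. */
@FunctionalInterface
interface CloneCall {
    String cloneVoice(String audioUrl) throws Exception;
}

class VoiceCloneService {
    private static final Logger LOG = Logger.getLogger(VoiceCloneService.class.getName());
    static final String VOICE_TTS_FAILED = "VOICE_TTS_FAILED";

    private final CloneCall provider;

    VoiceCloneService(CloneCall provider) {
        this.provider = provider;
    }

    String cloneVoice(String audioUrl) {
        try {
            String voiceId = provider.cloneVoice(audioUrl);
            if (voiceId == null || voiceId.isBlank()) {
                // An empty answer is treated as a failure too.
                throw new IllegalStateException("provider returned no voice ID");
            }
            return voiceId;
        } catch (Exception e) {
            // Full cause goes to the log for debugging; callers see one stable code.
            LOG.log(Level.SEVERE, "Voice clone failed for " + audioUrl, e);
            throw new ServiceException(VOICE_TTS_FAILED, "Voice clone failed", e);
        }
    }
}
```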

Requirement: Text-to-Speech Synthesis

The system SHALL support TTS synthesis through cloned voices, system voices, or reference audio, accepting text input and returning audio data.

Scenario: TTS with cloned voice

GIVEN a valid voice ID from a previous clone operation
WHEN submitting a TTS request with text and voice ID
THEN the system SHALL return audio data in the specified format
AND the audio SHALL match the cloned voice characteristics

Scenario: TTS with system voice

GIVEN a system voice ID is configured
WHEN submitting a TTS request with text and system voice ID
THEN the system SHALL return audio data using the system voice
AND the audio SHALL match the system voice characteristics

Scenario: TTS with reference audio (file URL)

GIVEN a reference audio URL and its transcription text
WHEN submitting a TTS request with the reference audio URL
THEN the system SHALL perform on-the-fly voice cloning
AND return audio data matching the reference voice
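
The three TTS scenarios differ only in what identifies the voice. A request object with one factory method per scenario can reject incomplete input early, for example reference audio without its transcription. In this sketch the class, field, and method names and the format strings are assumptions rather than a documented API.

```java
/**
 * Hypothetical TTS request covering the three synthesis scenarios. Class,
 * field, and factory-method names are assumptions, not a documented API.
 */
final class TtsRequest {

    enum Mode { CLONED_VOICE, SYSTEM_VOICE, REFERENCE_AUDIO }

    final Mode mode;
    final String text;
    final String voiceId;       // CLONED_VOICE and SYSTEM_VOICE only
    final String fileUrl;       // REFERENCE_AUDIO only
    final String referenceText; // transcription of fileUrl, REFERENCE_AUDIO only
    final String format;        // requested output format, e.g. "mp3" (assumed)

    private TtsRequest(Mode mode, String text, String voiceId,
                       String fileUrl, String referenceText, String format) {
        this.mode = mode;
        this.text = require(text, "text");
        this.voiceId = voiceId;
        this.fileUrl = fileUrl;
        this.referenceText = referenceText;
        this.format = require(format, "format");
    }

    /** Synthesize with a voice ID returned by a previous clone operation. */
    static TtsRequest clonedVoice(String text, String voiceId, String format) {
        return new TtsRequest(Mode.CLONED_VOICE, text, require(voiceId, "voiceId"), null, null, format);
    }

    /** Synthesize with a preconfigured system voice ID. */
    static TtsRequest systemVoice(String text, String systemVoiceId, String format) {
        return new TtsRequest(Mode.SYSTEM_VOICE, text, require(systemVoiceId, "systemVoiceId"), null, null, format);
    }

    /** On-the-fly cloning: both the reference audio and its transcription are required. */
    static TtsRequest referenceAudio(String text, String fileUrl, String referenceText, String format) {
        return new TtsRequest(Mode.REFERENCE_AUDIO, text, null,
                require(fileUrl, "fileUrl"), require(referenceText, "referenceText"), format);
    }

    private static String require(String value, String name) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
        return value;
    }
}
```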

Requirement: Configuration Management

The system SHALL support multi-provider configuration through a unified configuration structure.

Scenario: Configure multiple providers

GIVEN the application configuration file
WHEN configuring multiple voice providers
THEN each provider SHALL have an independent enabled flag
AND the system SHALL only use enabled providers

Scenario: Default provider selection

GIVEN the configuration specifies a default-provider value
WHEN no provider is explicitly specified
THEN the system SHALL use the default provider
AND fall back to cosyvoice if no default is configured

Scenario: Backward compatibility

GIVEN existing configuration using yudao.cosyvoice.*
WHEN the system starts
THEN the system SHALL automatically migrate it to the new configuration structure
AND existing functionality SHALL remain unchanged
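
The configuration rules can be sketched over flattened property keys. Only `yudao.cosyvoice.*`, `default-provider`, and the `cosyvoice` fallback come from this specification. The `yudao.voice.` prefix for the new structure, the `providers.<type>.enabled` key layout, and enabling CosyVoice automatically for legacy setups are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical multi-provider settings over flattened property keys. */
final class VoiceProviderConfig {
    static final String LEGACY_PREFIX = "yudao.cosyvoice.";
    static final String NEW_PREFIX = "yudao.voice.";   // assumed new prefix
    static final String FALLBACK_PROVIDER = "cosyvoice";

    private final Map<String, String> props;

    VoiceProviderConfig(Map<String, String> rawProps) {
        this.props = migrateLegacy(rawProps);
    }

    /** Copies yudao.cosyvoice.* keys into the assumed per-provider layout unless already set. */
    static Map<String, String> migrateLegacy(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>(raw);
        boolean hasLegacy = false;
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if (e.getKey().startsWith(LEGACY_PREFIX)) {
                hasLegacy = true;
                String suffix = e.getKey().substring(LEGACY_PREFIX.length());
                out.putIfAbsent(NEW_PREFIX + "providers.cosyvoice." + suffix, e.getValue());
            }
        }
        if (hasLegacy) {
            // Assumption: a legacy setup used CosyVoice alone, so keep it enabled.
            out.putIfAbsent(NEW_PREFIX + "providers.cosyvoice.enabled", "true");
        }
        return out;
    }

    boolean isEnabled(String provider) {
        return Boolean.parseBoolean(props.get(NEW_PREFIX + "providers." + provider + ".enabled"));
    }

    /** Provider types whose enabled flag is true, in configuration order. */
    List<String> enabledProviders() {
        String prefix = NEW_PREFIX + "providers.";
        String suffix = ".enabled";
        return props.keySet().stream()
                .filter(k -> k.startsWith(prefix) && k.endsWith(suffix))
                .map(k -> k.substring(prefix.length(), k.length() - suffix.length()))
                .filter(this::isEnabled)
                .distinct()
                .collect(Collectors.toList());
    }

    /** The configured default-provider, or cosyvoice when none is set. */
    String defaultProvider() {
        String d = props.get(NEW_PREFIX + "default-provider");
        return (d == null || d.isBlank()) ? FALLBACK_PROVIDER : d;
    }
}
```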

Requirement: Provider Factory

The system SHALL provide a factory component for managing provider instances and resolving providers by type.

Scenario: Factory resolves provider

GIVEN the factory is initialized with provider configurations
WHEN calling factory.getProvider("cosyvoice")
THEN the factory SHALL return the CosyVoiceProvider instance
AND cache the instance for subsequent requests

Scenario: Factory returns default

GIVEN the factory is configured with a default provider
WHEN calling factory.getProvider(null)
THEN the factory SHALL return the default provider instance
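
The factory scenarios, together with the earlier fallback rule, fit a small registry keyed by provider type. In this sketch, each provider is built at most once and then cached. A `null` type resolves to the default without logging, while an unknown type resolves to the default with a warning. The constructor shape and the use of `Supplier` are assumptions. In practice the registry would hold only enabled providers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Reduced provider contract; only the type key matters for resolution. */
interface VoiceCloneProvider {
    String getProviderType();
}

/** Hypothetical factory: lazy construction, per-type caching, default fallback. */
final class VoiceCloneProviderFactory {
    private static final Logger LOG = Logger.getLogger(VoiceCloneProviderFactory.class.getName());

    private final Map<String, Supplier<VoiceCloneProvider>> registry;
    private final Map<String, VoiceCloneProvider> cache = new ConcurrentHashMap<>();
    private final String defaultType;

    /** registry should contain only providers whose enabled flag is true. */
    VoiceCloneProviderFactory(Map<String, Supplier<VoiceCloneProvider>> registry, String defaultType) {
        if (!registry.containsKey(defaultType)) {
            throw new IllegalArgumentException("default provider is not registered: " + defaultType);
        }
        this.registry = Map.copyOf(registry);
        this.defaultType = defaultType;
    }

    VoiceCloneProvider getProvider(String type) {
        String resolved = type;
        if (resolved == null) {
            resolved = defaultType; // no explicit choice: use the default
        } else if (!registry.containsKey(resolved)) {
            LOG.warning("Unknown voice clone provider '" + type + "', falling back to '" + defaultType + "'");
            resolved = defaultType;
        }
        // computeIfAbsent runs the supplier once per type; later calls hit the cache.
        return cache.computeIfAbsent(resolved, key -> registry.get(key).get());
    }
}
```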

MODIFIED Requirements

Requirement: Voice Creation Flow

The voice creation process SHALL use the provider abstraction layer instead of calling the CosyVoice client directly.

Scenario: Create voice with CosyVoice

GIVEN a user uploads a voice audio file
WHEN creating a voice configuration through the API
THEN the system SHALL:
1. Validate that the file exists and belongs to the voice category
2. Call provider.cloneVoice() with the audio URL
3. Store the returned voiceId in the database
4. Return a success response with the voice configuration ID
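
The four creation steps can be sketched with in-memory maps standing in for the database. `FileRecord`, `VoiceConfig`, `VoiceCreationService`, and the literal category value `"voice"` are assumptions. The `UnaryOperator` stands in for a call to `provider.cloneVoice()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

/** Hypothetical uploaded-file row; the "voice" category value is an assumption. */
record FileRecord(long id, String category, String url) {}

/** Hypothetical voice configuration row holding the provider-issued voiceId. */
record VoiceConfig(long id, long fileId, String voiceId) {}

final class VoiceCreationService {
    private final Map<Long, FileRecord> files;                     // stands in for the file table
    private final Map<Long, VoiceConfig> voices = new HashMap<>(); // stands in for the voice table
    private final AtomicLong nextId = new AtomicLong();
    private final UnaryOperator<String> cloneVoice;                // delegates to provider.cloneVoice()

    VoiceCreationService(Map<Long, FileRecord> files, UnaryOperator<String> cloneVoice) {
        this.files = files;
        this.cloneVoice = cloneVoice;
    }

    /** Runs steps 1-4 of the creation flow and returns the new configuration ID. */
    long createVoice(long fileId) {
        // 1. The file must exist and belong to the voice category.
        FileRecord file = files.get(fileId);
        if (file == null || !"voice".equals(file.category())) {
            throw new IllegalArgumentException("not a voice file: " + fileId);
        }
        // 2. Clone through the provider abstraction, not a vendor client.
        String voiceId = cloneVoice.apply(file.url());
        // 3. Persist the returned voiceId.
        VoiceConfig config = new VoiceConfig(nextId.incrementAndGet(), fileId, voiceId);
        voices.put(config.id(), config);
        // 4. The caller wraps this ID in its success response.
        return config.id();
    }

    VoiceConfig find(long id) {
        return voices.get(id);
    }
}
```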

Scenario: Create voice with transcription

GIVEN a voice configuration was created without a transcription
WHEN the user triggers transcription
THEN the system SHALL:
1. Fetch the audio file URL
2. Call the transcription service
3. Store the transcription text
4. Update the voice configuration
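
The transcription steps can be sketched as a lookup, a service call, and a single write back. `TranscriptionFlow`, the `fileUrlById` lookup, and the `transcriber` function are hypothetical stand-ins for the file store and the transcription service. This sketch merges steps 3 and 4 into one update of the stored record.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongFunction;

/** Hypothetical voice configuration; transcription is null until it is generated. */
record VoiceConfig(long id, long fileId, String voiceId, String transcription) {
    VoiceConfig withTranscription(String text) {
        return new VoiceConfig(id, fileId, voiceId, text);
    }
}

final class TranscriptionFlow {
    private final Map<Long, VoiceConfig> voices;         // stands in for the voice table
    private final LongFunction<String> fileUrlById;      // assumed file-URL lookup
    private final Function<String, String> transcriber;  // assumed speech-to-text call

    TranscriptionFlow(Map<Long, VoiceConfig> voices,
                      LongFunction<String> fileUrlById,
                      Function<String, String> transcriber) {
        this.voices = voices;
        this.fileUrlById = fileUrlById;
        this.transcriber = transcriber;
    }

    VoiceConfig transcribe(long configId) {
        VoiceConfig config = voices.get(configId);
        if (config == null) {
            throw new IllegalArgumentException("unknown voice config: " + configId);
        }
        // 1. Fetch the audio file URL.
        String url = fileUrlById.apply(config.fileId());
        // 2. Call the transcription service.
        String text = transcriber.apply(url);
        // 3-4. Store the text by updating the configuration in one write.
        VoiceConfig updated = config.withTranscription(text);
        voices.put(configId, updated);
        return updated;
    }
}
```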

Requirement: Voice Preview

The voice preview functionality SHALL work with both cloned voices (voiceId) and reference audio (file URL).

Scenario: Preview cloned voice

GIVEN a voice configuration with a valid voiceId
WHEN requesting a preview with custom text
THEN the system SHALL call provider.synthesize() with the voiceId
AND return audio data in Base64 format

Scenario: Preview with reference audio

GIVEN a voice configuration without a voiceId but with an audio file
WHEN requesting a preview
THEN the system SHALL call provider.synthesize() with the file URL
AND use the stored transcription as reference text
AND return audio data in Base64 format
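
The preview rule is a two-way dispatch. If a voiceId exists, synthesis uses it. Otherwise, synthesis uses the file URL plus the stored transcription. In both cases, the audio bytes are Base64-encoded. This sketch splits `provider.synthesize()` into two hypothetical methods, `synthesizeWithVoiceId` and `synthesizeWithReference`, for clarity. Only `java.util.Base64` is a real library API here.

```java
import java.util.Base64;

/** Hypothetical synthesis entry points standing in for provider.synthesize(). */
interface Synthesizer {
    byte[] synthesizeWithVoiceId(String text, String voiceId);

    byte[] synthesizeWithReference(String text, String fileUrl, String referenceText);
}

/** Hypothetical configuration view: voiceId may be null when cloning was skipped. */
record VoiceConfig(String voiceId, String fileUrl, String transcription) {}

final class VoicePreviewService {
    private final Synthesizer provider;

    VoicePreviewService(Synthesizer provider) {
        this.provider = provider;
    }

    /** Returns Base64-encoded preview audio for the given configuration. */
    String preview(VoiceConfig config, String text) {
        byte[] audio;
        if (config.voiceId() != null && !config.voiceId().isBlank()) {
            // Cloned voice: synthesize directly with the stored voiceId.
            audio = provider.synthesizeWithVoiceId(text, config.voiceId());
        } else if (config.fileUrl() != null) {
            // No voiceId: use the audio file with its stored transcription as reference text.
            audio = provider.synthesizeWithReference(text, config.fileUrl(), config.transcription());
        } else {
            throw new IllegalStateException("voice config has neither voiceId nor audio file");
        }
        return Base64.getEncoder().encodeToString(audio);
    }
}
```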

REMOVED Requirements

None. This change only adds new requirements and refactors existing code; no existing requirement is removed.

5.6 KiB Raw Blame History