Add visual indicators for prompt categories and source types to the prompt selector component. Refactor benchmark task execution to use Dify streaming analysis, with error handling and extraction of text from Alibaba Cloud transcription results.