Model Type | multimodal, audio-language |
|
Use Cases |
Areas: | audio understanding, multi-turn dialogue |
|
Applications: | speech editing, music appreciation, sound understanding |
|
Primary Use Cases: | universal audio understanding, multi-turn dialogues with audio and text |
|
|
Additional Notes | The model supports diverse audio-oriented scenarios and tool usage. |
|
Supported Languages | zh (Mandarin), en (English) |
|
Training Details |
Methodology: | Instruction fine-tuning using a multi-task learning framework |
|
Model Architecture: | Large Audio Language Model |
|
|
Input Output |
Input Format: | Accepts diverse audio and text inputs |
|
Accepted Modalities: | audio, text |
Output Format: | |
|