Model Type | audio-language, multimodal |
|
Use Cases |
Areas: | research, multimodal applications |
|
Limitations: | The model may not accurately follow human instructions; it is prone to generating hallucinations; it lacks moderation mechanisms and may therefore produce harmful or inappropriate responses. |
|
Considerations: | Developers should assess the risks of the model in the context of their specific applications before deployment. |
|
|
Supported Languages | Thai, English (proficiency level: native) |
|
Training Details |
Methodology: | Incorporates the Whisper encoder and BEATs as audio encoders |
|
Model Architecture: | Based on the Typhoon-1.5-8b-instruct architecture |
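The card states only that the Whisper encoder and BEATs are combined with a Typhoon-1.5-8b-instruct backbone; it does not say how. The pure-Python sketch below illustrates one common way such a model can be wired, under loudly stated assumptions: per-frame features from the two encoders are concatenated (assuming both produce the same number of time-aligned frames), mapped into the language model's embedding space by a single hypothetical linear projection, and prepended to the text prompt embeddings. The function names (`concat_features`, `project`, `build_llm_inputs`) and the tiny dimensions are illustrative only, not the model's actual adapter or API.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames (or tokens), columns = feature dims


def concat_features(whisper_feats: Matrix, beats_feats: Matrix) -> Matrix:
    """Concatenate per-frame features from two audio encoders.

    Assumption (hypothetical): both encoders emit the same number of
    time-aligned frames, so frame t of one pairs with frame t of the other.
    """
    if len(whisper_feats) != len(beats_feats):
        raise ValueError("encoders must produce the same number of frames")
    return [w + b for w, b in zip(whisper_feats, beats_feats)]


def project(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Hypothetical linear adapter: y = x @ W + b, with W of shape (in_dim, out_dim)."""
    out_dim = len(bias)
    result = []
    for row in x:
        if len(row) != len(weight):
            raise ValueError("feature dim does not match projection input dim")
        result.append(
            [sum(row[i] * weight[i][j] for i in range(len(row))) + bias[j]
             for j in range(out_dim)]
        )
    return result


def build_llm_inputs(audio_embeds: Matrix, text_embeds: Matrix) -> Matrix:
    """Prepend projected audio embeddings to the text prompt embeddings (assumed order)."""
    return audio_embeds + text_embeds


if __name__ == "__main__":
    whisper = [[1.0, 0.0], [0.0, 1.0]]  # 2 frames x 2 dims (toy values)
    beats = [[2.0], [3.0]]              # 2 frames x 1 dim (toy values)
    fused = concat_features(whisper, beats)  # 2 frames x 3 dims
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 -> 2 (toy "LLM hidden size")
    audio = project(fused, W, [0.0, 0.0])
    prompt = [[0.5, 0.5]]  # one toy text-token embedding
    print(build_llm_inputs(audio, prompt))
```

In the toy run, the two encoders' frames fuse into 3-dimensional vectors, the projection maps them to the 2-dimensional "hidden size", and the resulting audio embeddings are placed ahead of the single prompt embedding. A production model would typically use learned weights and a more elaborate adapter; this sketch shows only the data flow.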
|
|
Input Output |
Accepted Modalities: | |
Output Format: | |
|