| Attribute | Value |
|---|---|
| Supported Languages | en (English), zh (Chinese), id (Indonesian), ms (Malay), tl (Filipino), my (Burmese), vi (Vietnamese), th (Thai), lo (Lao), km (Khmer), ta (Tamil) |

Training Details

| Attribute | Value |
|---|---|
| Data Sources | RefinedWeb (English); mC4 (Chinese, Indonesian, Malay, Filipino, Burmese, Vietnamese, Thai, Lao, Khmer, Tamil); WangChanBERTa (Thai); The Stack (Python, JavaScript, Shell, SQL, Markdown); RedPajama (StackExchange, arXiv) |
| Data Volume | |
| Methodology | Pretrained and instruction-tuned for the Southeast Asian (SEA) region |
| Context Length | |
| Training Time | |
| Hardware Used | AWS EC2 p4d.24xlarge (NVIDIA A100 40GB GPUs) |
| Model Architecture | |