| Field | Details |
|---|---|
| Supported Languages | ak (Akan), ar (Arabic), as (Assamese), bm (Bambara), bn (Bengali), ca (Catalan), code (Programming Languages), en (English), es (Spanish), eu (Basque), fon (Fon), fr (French), gu (Gujarati), hi (Hindi), id (Indonesian), ig (Igbo), ki (Kikuyu), kn (Kannada), lg (Luganda), ln (Lingala), ml (Malayalam), mr (Marathi), ne (Nepali), nso (Northern Sotho), ny (Chichewa), or (Odia), pa (Punjabi), pt (Portuguese), rn (Kirundi), rw (Kinyarwanda), sn (Shona), st (Sesotho), sw (Swahili), ta (Tamil), te (Telugu), tn (Tswana), ts (Tsonga), tum (Tumbuka), tw (Twi), ur (Urdu), vi (Vietnamese), wo (Wolof), xh (Xhosa), yo (Yoruba), zh (Chinese), zu (Zulu) |
| **Training Details** | |
| Data Sources | 46 natural languages, 13 programming languages |
| Data Volume | 1.6 TB of text, 350B tokens |
| Methodology | |
| Context Length | |
| Training Time | |
| Hardware Used | 384 NVIDIA A100 80GB GPUs on the Jean Zay public supercomputer |
| Model Architecture | Modified Megatron-LM GPT-2: 70 layers, 112 attention heads, 14,336-dimensional hidden layers |
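As a sanity check on the architecture figures above, the minimal sketch below derives the per-head dimension and an order-of-magnitude parameter count from the listed layer count, head count, and hidden size. The `GPTConfig` class and the standard 12·L·h² approximation are illustrative assumptions, not the model's actual training code.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Values taken from the table above; the class itself is hypothetical.
    num_layers: int = 70
    num_heads: int = 112
    hidden_size: int = 14336

    @property
    def head_dim(self) -> int:
        # Per-head dimension: 14336 / 112 = 128
        return self.hidden_size // self.num_heads

    def approx_params(self) -> int:
        # Rough transformer-block parameter count (attention + MLP) using the
        # common 12 * L * h^2 approximation; ignores embeddings, biases, and
        # layer norms, so this is only an order-of-magnitude check.
        return 12 * self.num_layers * self.hidden_size ** 2


cfg = GPTConfig()
print(cfg.head_dim)                           # 128
print(f"~{cfg.approx_params() / 1e9:.0f}B")   # ~173B block parameters
```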
|
|