| Use Cases | |
| --- | --- |
| Areas | Research, non-commercial applications |
| Applications | Text generation, exploration of language model characteristics |
| Primary Use Cases | Language generation, information extraction, question answering, summarization (see the usage sketch after this table) |
| Limitations | Not intended for high-stakes settings, critical decisions, or generating factual content |
| Considerations | Using the model in high-stakes settings is out of scope. Include appropriate disclaimers and age blocks if necessary. |

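As a hedged illustration of the text-generation use case, the sketch below prompts a causal language model through the Hugging Face `transformers` library, assuming the model is published on the Hugging Face Hub; the model ID `your-org/your-model` is a placeholder rather than the actual checkpoint name, and the sampling parameters are illustrative, not recommended settings. The other listed use cases (information extraction, question answering, summarization) can be prompted through the same interface, subject to the limitations above.

```python
# Minimal text-generation sketch using Hugging Face transformers.
# "your-org/your-model" is a placeholder ID, not the real checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-model"  # placeholder: substitute the actual Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "A language model is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; generation settings here are illustrative defaults.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```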
Supported Languages: ak (Akan), ar (Arabic), as (Assamese), bm (Bambara), bn (Bengali), ca (Catalan), code (Programming Languages), en (English), es (Spanish), eu (Basque), fon (Fon), fr (French), gu (Gujarati), hi (Hindi), id (Indonesian), ig (Igbo), ki (Kikuyu), kn (Kannada), lg (Luganda), ln (Lingala), ml (Malayalam), mr (Marathi), ne (Nepali), nso (Northern Sotho), ny (Chichewa), or (Odia), pa (Punjabi), pt (Portuguese), rn (Kirundi), rw (Kinyarwanda), sn (Shona), st (Southern Sotho), sw (Swahili), ta (Tamil), te (Telugu), tn (Tswana), ts (Tsonga), tum (Tumbuka), tw (Twi), ur (Urdu), vi (Vietnamese), wo (Wolof), xh (Xhosa), yo (Yoruba), zh (Chinese), zhs (Simplified Chinese), zht (Traditional Chinese), zu (Zulu)

| Training Details | |
| --- | --- |
| Data Sources | Various text sources spanning 45 natural languages and 12 programming languages |
| Data Volume | 1.5 TB of pre-processed text, 350B unique tokens |
| Objective Function | Cross-entropy loss with mean reduction (see the sketch after this table) |
| Context Length | |
| Training Dates | 11 March 2022 to 5 July 2022 |
| Hardware Used | 384 NVIDIA A100 80 GB GPUs, AMD CPUs |
| Model Architecture | Decoder-only transformer, modified from Megatron-LM GPT-2 (a generic block sketch follows the objective example below) |

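To make the objective concrete, here is a minimal sketch of next-token cross-entropy with mean reduction in PyTorch. The tensor shapes, dummy data, and label shift are assumptions chosen for illustration; this is not the model's actual training loop.

```python
# Minimal sketch of the training objective: next-token cross-entropy, mean reduction.
# Shapes and the label shift are standard for causal LM training, not taken from
# the model's actual training code.
import torch
import torch.nn as nn

batch, seq_len, vocab = 2, 8, 50_000
logits = torch.randn(batch, seq_len, vocab)             # dummy model outputs
input_ids = torch.randint(0, vocab, (batch, seq_len))   # dummy token ids

# Predict token t+1 from positions <= t: drop the last logit, shift labels right.
shift_logits = logits[:, :-1, :].reshape(-1, vocab)
shift_labels = input_ids[:, 1:].reshape(-1)

loss_fn = nn.CrossEntropyLoss(reduction="mean")  # mean over all predicted tokens
loss = loss_fn(shift_logits, shift_labels)
print(loss.item())
```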
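A decoder-only architecture stacks transformer blocks in which self-attention is causally masked, so each position attends only to itself and earlier positions. The block below is a generic PyTorch sketch of that idea; the hidden sizes, head count, and norm placement are placeholder values, and the Megatron-LM-specific modifications (parallelism, fused kernels, embedding changes) are not reproduced.

```python
# Generic sketch of a decoder-only transformer block with causal masking.
# Dimensions and norm placement are illustrative, not the model's actual configuration.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        seq_len = x.size(1)
        # Causal mask: True above the diagonal blocks attention to future positions.
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)      # residual + norm after attention
        x = self.norm2(x + self.ff(x))    # residual + norm after feed-forward
        return x

x = torch.randn(2, 16, 512)      # (batch, sequence, hidden)
print(DecoderBlock()(x).shape)   # torch.Size([2, 16, 512])
```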