Model Type:

Use Cases
Areas: Research, commercial applications
Applications: Natural language understanding and generation, mechanistic interpretability, sentiment analysis, summarization
Primary Use Cases: Research on Arabic NLP, commercial chat applications, sentiment analysis, academic research
Limitations: Prohibited from generating harmful content; handling of sensitive information; generalization to unsupported languages; high-stakes decision making
Considerations: Efforts were made to ensure cultural adaptation and a diverse range of topics in the fine-tuning datasets.

Additional Notes: The techniques used to augment the Arabic model are applicable to other low-resource languages.

Supported Languages: Arabic (MSA; strong capabilities), English (strong capabilities)

Training Details
Data Sources: Web pages, Wikipedia articles, news articles, social network content, code data, books, scientific papers, synthetic data (English-to-Arabic translations)
Data Volume: Up to 1.6 trillion tokens
Methodology: Documents were packed with EOS tokens for pre-training, and the backbone was frozen during adapted pre-training (see the sketches after this section); chat models received instruction fine-tuning.
Context Length:
Hardware Used: Condor Galaxy supercomputer with 64 Cerebras CS-2 Wafer-Scale Engines
Model Architecture: Auto-regressive, Transformer-based, decoder-only architecture with support for long context lengths (a usage sketch follows below).

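The document-packing step in the methodology can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the card's actual pipeline: the token ids, EOS id, and sequence length are assumptions. Tokenized documents are concatenated with EOS separators and then split into fixed-length training sequences.

```python
# Minimal sketch of packing documents with EOS separators for pre-training.
# The token ids, EOS id, and sequence length are illustrative assumptions.

def pack_documents(token_streams, eos_id, seq_len):
    """Concatenate tokenized documents, separated by EOS, into fixed-length sequences."""
    buffer = []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_id)          # EOS marks the document boundary
        while len(buffer) >= seq_len:  # emit full-length training sequences
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # Any trailing partial sequence is dropped here; padding is an alternative.

# Example: three tiny "documents" packed into sequences of length 8.
docs = [[101, 102, 103], [201, 202], [301, 302, 303, 304]]
for seq in pack_documents(docs, eos_id=0, seq_len=8):
    print(seq)  # [101, 102, 103, 0, 201, 202, 0, 301]
```
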
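The "frozen backbone" step of adapted pre-training can likewise be sketched. The card does not say which components remained trainable, so the parameter-name prefix below is hypothetical; the idea is simply that the pretrained weights stay fixed while only selected new parameters are updated.

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module, trainable_prefixes=("embed",)):
    """Freeze all parameters except those matching the given name prefixes.

    The "embed" prefix is an illustrative assumption; the card does not
    specify which components were trained during adapted pre-training.
    """
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
```
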
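As a usage illustration of the auto-regressive, decoder-only design, the sketch below generates text with the Hugging Face transformers API. The checkpoint id is a placeholder, not the model documented in this card.

```python
# Illustrative only: "org/arabic-llm-chat" is a placeholder checkpoint id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/arabic-llm-chat"  # placeholder, not the actual model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Auto-regressive generation: the decoder predicts one token at a time,
# each step attending only to earlier positions (causal masking).
inputs = tokenizer("What is the capital of the UAE?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
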
Responsible AI Considerations
Mitigation Strategies: Biases were minimized; for fine-tuned models, the AI assistant role is limited to Arabic and English (an illustrative guard is sketched below).

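The card does not describe how the Arabic-and-English restriction is enforced. The sketch below is one hypothetical application-layer approximation, a crude Unicode-range check, and is not the model's actual mechanism.

```python
# Hypothetical application-layer guard, not the card's actual mechanism:
# flag prompts whose letters are predominantly neither Arabic nor basic Latin.
def is_arabic_or_english(text: str, threshold: float = 0.8) -> bool:
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return True  # nothing to classify; allow
    supported = sum(
        1 for ch in letters
        if "\u0600" <= ch <= "\u06FF"  # Arabic block
        or "a" <= ch.lower() <= "z"    # basic Latin
    )
    return supported / len(letters) >= threshold

print(is_arabic_or_english("Hello, world"))   # True
print(is_arabic_or_english("مرحبا بالعالم"))   # True
print(is_arabic_or_english("Привет, мир"))     # False
```
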
Input/Output
Input Format:
Accepted Modalities:
Output Format: