Model Type | text generation, bilingual LLM, causal-lm |
|
Use Cases |
Areas: | Research, Commercial applications |
|
Applications: | Natural language understanding, Generation tasks, Chat assistants, Sentiment analysis, Summarization |
|
Primary Use Cases: | General AI assistant for Arabic and English, Culturally aligned language processing |
|
Limitations: | Limited capability in languages other than Arabic and English, Requires responsible use and avoidance of prohibited applications |
|
Considerations: | The model should be used with an understanding of its capabilities in Arabic and English. Not suitable for high-stakes decisions without human oversight. |
|
|
Additional Notes | The release focuses on Arabic NLP, model adaptation, and bilingual capabilities. |
|
Supported Languages | Arabic, English (high proficiency) |
|
Training Details |
Data Sources: | Web pages, Wikipedia articles, News articles, Social network content, Code data, Books, Scientific papers, Synthetic translation |
|
Data Volume: | |
Methodology: | Pre-training from scratch and adaptive pre-training from Llama-2 |
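Adaptive pre-training from an existing base model such as Llama-2 typically requires extending the tokenizer vocabulary for the new language and initializing embedding rows for the added tokens before continuing training. The sketch below is an illustrative assumption, not the documented recipe: it shows one common heuristic, initializing new rows to the mean of the existing embedding matrix.

```python
import numpy as np

def extend_embeddings(emb: np.ndarray, n_new: int) -> np.ndarray:
    """Append n_new rows for added vocabulary tokens, initialized to the
    mean of the existing rows (a common, but assumed, heuristic)."""
    mean_row = emb.mean(axis=0, keepdims=True)     # shape (1, d)
    new_rows = np.repeat(mean_row, n_new, axis=0)  # shape (n_new, d)
    return np.concatenate([emb, new_rows], axis=0)

# Toy example: a 5-token vocabulary with 4-dim embeddings,
# extended by 3 new (e.g. Arabic) tokens.
emb = np.arange(20, dtype=np.float64).reshape(5, 4)
extended = extend_embeddings(emb, 3)
print(extended.shape)  # (8, 4)
```

After resizing, continued pre-training on the target-language corpus updates both the new and the original parameters.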
|
Context Length: | |
Hardware Used: | Condor Galaxy Supercomputer, 64 Cerebras CS-2 Wafer-Scale Engines |
|
Model Architecture: | Auto-regressive, decoder-only transformer (GPT-3 style); SwiGLU activations with ALiBi positional biases in the from-scratch models, or RoPE with Grouped Query Attention in the Llama-2-adapted models |
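ALiBi replaces learned positional embeddings with a per-head linear penalty on attention scores that grows with query-key distance. A minimal sketch of that bias, following the standard geometric slope scheme for a power-of-two head count (the model's actual implementation may differ):

```python
import numpy as np

def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes: a geometric sequence starting at 2^(-8/n_heads).
    Assumes n_heads is a power of two."""
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Additive bias of shape (n_heads, seq_len, seq_len): each query
    position penalizes earlier keys in proportion to their distance."""
    pos = np.arange(seq_len)
    dist = pos[:, None] - pos[None, :]   # query index minus key index
    dist = np.where(dist < 0, 0, dist)   # only the causal region matters
    slopes = np.asarray(alibi_slopes(n_heads))
    return -slopes[:, None, None] * dist # closer keys get a smaller penalty

bias = alibi_bias(4, 5)
print(bias.shape)     # (4, 5, 5)
print(bias[0, 4, 0])  # head 0, most distant key: -0.25 * 4 = -1.0
```

Because the penalty depends only on relative distance, ALiBi lets a model extrapolate to sequence lengths longer than those seen in training.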
|
|
Safety Evaluation |
Risk Categories: | |
Ethical Considerations: | Efforts to reduce bias; model may still exhibit biases and generate incorrect or misleading content. |
|
|
Responsible AI Considerations |
Fairness: | Techniques employed to reduce bias |
|
Accountability: | User is responsible for model usage and outcomes |
|
Mitigation Strategies: | Training data curated by Inception, with techniques applied to minimize bias. |
|
|
Input/Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | |
|