Model Type: Transformer-style autoregressive language model

Use Cases
Areas: Research, commercial applications
Applications: Natural language processing tasks
Primary Use Cases: Language modeling, text generation
Limitations: May produce harmful or biased content
Considerations: Users should be aware of the model's potential risks and limitations.

Additional Notes: Model checkpoints and code are open-source.

Supported Languages:

Training Details
Data Sources:
Data Volume: 2.5 trillion training tokens
Methodology: Sequential-block transformer with SwiGLU activation and rotary positional embeddings (RoPE); see the sketch after this section.
Context Length:
Hardware Used: AMD MI250X GPUs on the LUMI supercomputer and NVIDIA A100-40GB GPUs provided by MosaicML
Model Architecture: Transformer with 32 layers, a hidden size of 4096, 32 attention heads, and a sequential block type

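To make the Methodology and Model Architecture entries concrete, here is a minimal PyTorch sketch of the two named components: a SwiGLU feed-forward layer and a rotary-positional-embedding (RoPE) helper. The hidden size of 4096 comes from the card; the feed-forward width `d_ff` is an assumption (roughly 8/3 × the hidden size, a common choice in SwiGLU models), and this is an illustration, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUFeedForward(nn.Module):
    """Illustrative SwiGLU feed-forward block: (SiLU(x W_g) * (x W_u)) W_d."""

    def __init__(self, d_model: int = 4096, d_ff: int = 11008):
        # d_model matches the card's stated hidden size; d_ff is an assumed width.
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated linear unit, then projection back to d_model.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary positional embeddings on a (..., seq_len, head_dim) tensor.

    Rotates consecutive channel pairs by a position-dependent angle.
    """
    seq_len, dim = x.shape[-2], x.shape[-1]
    pos = torch.arange(seq_len, dtype=x.dtype, device=x.device)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=x.dtype, device=x.device) / dim)
    angles = pos[:, None] * freqs[None, :]  # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

In practice, RoPE is applied to the per-head query and key projections inside each attention layer rather than to the residual stream.
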
Safety Evaluation
Risk Categories: Harmful content, sensitive content, bias
Ethical Considerations: The model can generate harmful and biased content.

Responsible AI Considerations
Fairness: Potential for bias in language outputs.
Transparency: Open-source model with accessible training-data details.
Accountability: Developed by the Allen Institute for AI.
Mitigation Strategies: Risk awareness and adherence to user guidelines.

Input/Output
Input Format:
Accepted Modalities:
Output Format:
Performance Tips: Tune generation settings (e.g., batch size and sampling parameters) to balance text-generation speed and accuracy; see the usage example after this section.

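Since the checkpoints are open-source (see Additional Notes above), a common way to exercise the model for text generation is through the Hugging Face `transformers` API. This is only a sketch: the model id below is a placeholder, not this model's published name, and the sampling settings are illustrative rather than recommendations from the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder: substitute the released checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language modeling is", return_tensors="pt")
# Illustrative sampling settings; tune for your speed/accuracy trade-off.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
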
Release Notes
Version:
Notes: Initial release for open language modeling.