| Field | Details |
| --- | --- |
| Model Type | |
| **Use Cases** | |
| Areas | |
| Primary Use Cases | Memory/compute-constrained environments; latency-bound scenarios; strong reasoning (code, math, logic) |
| Limitations | Performance degrades for languages other than English; outputs may reinforce stereotypes |
| Considerations | Common limitations of language models should be taken into account. |
| Additional Notes | This is a static model trained on an offline dataset with a cutoff date of October 2023. Future versions may be released. |
| Supported Languages | English |
| **Training Details** | |
| Data Sources | Synthetic data; filtered publicly available websites |
| Data Volume | |
| Methodology | Post-trained with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO); see the DPO sketch after this table |
| Context Length | |
| Training Time | |
| Hardware Used | |
| Model Architecture | Dense decoder-only Transformer with alternating dense and blocksparse attention; see the attention-mask sketch after this table |
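
The post-training methodology is only named above; as a point of reference, the heart of DPO is a single loss over preference pairs. The PyTorch sketch below is a generic illustration of that loss, not this model's training code; the `beta` value and the per-sequence log-probability convention are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each tensor holds per-sequence log-probabilities (summed over tokens)
    of the preferred ("chosen") and dispreferred ("rejected") responses,
    under the trained policy and a frozen reference model (typically the
    SFT checkpoint).
    """
    # How much more likely each response became relative to the reference.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Implicit reward margin between chosen and rejected, scaled by beta.
    margin = beta * (chosen_logratio - rejected_logratio)
    # Maximize the Bradley-Terry preference likelihood of the chosen response.
    return -F.logsigmoid(margin).mean()
```

Unlike RLHF with PPO, this objective needs no separate reward model or sampling loop, which is one reason DPO is a common final post-training stage after SFT.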
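
The card does not specify the sparsity pattern, so the sketch below shows one common blocksparse variant, block-local causal attention, alternating with dense causal attention by layer parity; the `block_size` value and the even/odd schedule are illustrative assumptions, not the model's actual configuration.

```python
import torch

def dense_causal_mask(seq_len: int) -> torch.Tensor:
    # Standard causal mask: token i attends to tokens 0..i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def blocksparse_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    # Block-local variant: token i attends only to earlier tokens
    # within its own contiguous block of `block_size` positions.
    idx = torch.arange(seq_len)
    same_block = (idx[:, None] // block_size) == (idx[None, :] // block_size)
    return same_block & dense_causal_mask(seq_len)

def mask_for_layer(layer_idx: int, seq_len: int, block_size: int = 64) -> torch.Tensor:
    # Alternate dense and blocksparse attention across layers
    # (even layers dense, odd layers sparse -- an assumed schedule).
    if layer_idx % 2 == 0:
        return dense_causal_mask(seq_len)
    return blocksparse_causal_mask(seq_len, block_size)
```

Sparse layers cut attention cost from O(n^2) toward O(n * block_size) per token, which lines up with the memory/compute-constrained use cases listed above.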

| Field | Details |
| --- | --- |
| **Responsible AI Considerations** | |
| Fairness | Potential bias arising from how different groups are represented in the training data. |
| Mitigation Strategies | Safety-focused post-training. |

| Field | Details |
| --- | --- |
| **Input / Output** | |
| Input Format | |
| Accepted Modalities | |
| Output Format | |

| Field | Details |
| --- | --- |
| **Release Notes** | |
| Version | |
| Notes | Model weights are released; see the usage sketch below. |
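
Only the release of the weights is noted above, so here is a minimal usage sketch assuming the checkpoint is published on the Hugging Face Hub; the model ID `org/model-name` is a placeholder, not the real identifier.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder; substitute the released checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# The card emphasizes reasoning (code, math, logic), so a small math prompt:
inputs = tokenizer("What is 12 * 7? Think step by step.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```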