Model Type | Large Language Model, Text Generation |

Use Cases
Areas | Research, Commercial applications |
Applications | General English-language tasks, Coding tasks |
Primary Use Cases | Few-turn question answering |
Limitations | Not tested for non-English proficiency; no multimodal capabilities |
Considerations | Evaluation of safety for specific applications is recommended |
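
Because the model is instruction finetuned and aimed at few-turn question answering, the sketch below shows one way to format a short multi-turn exchange with the Hugging Face `transformers` chat-template API. The model ID is a placeholder, and the presence of a chat template in the tokenizer is an assumption, not something stated in this card.

```python
# Hedged sketch: formatting a few-turn QA exchange with the `transformers` chat-template API.
from transformers import AutoTokenizer

MODEL_ID = "org/model-instruct"  # placeholder, not the actual repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A short, few-turn conversation: earlier turns give context, the last turn asks the question.
messages = [
    {"role": "user", "content": "What does a mixture-of-experts layer do?"},
    {"role": "assistant", "content": "It routes each token to a small subset of expert sub-networks."},
    {"role": "user", "content": "How many experts does this model use?"},
]

# Render the conversation into the model's prompt format and tokenize it.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant prefix so the model continues the answer
    return_tensors="pt",
)
print(input_ids.shape)
```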

Supported Languages | English (general text processing) |

Training Details
Data Sources | Text and code |
Data Volume | 12T tokens |
Methodology | Curriculum learning; the data mix was changed over the course of training |
Context Length | 32768 tokens |
Hardware Used | Databricks infrastructure |
Model Architecture | Fine-grained mixture-of-experts (MoE) with 16 experts |
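
As an illustration of what a fine-grained MoE block with 16 experts could look like, here is a minimal PyTorch sketch of token-level top-k routing. The expert width, the number of experts active per token, and the gating details are assumptions for the example and do not describe the model's actual implementation.

```python
# Illustrative sketch only: a generic token-level MoE layer with 16 experts and top-k routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 16, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small feed-forward network ("fine-grained" = many narrow experts).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to individual tokens
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # route each token to top_k experts
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(x.shape)


# Example: 2 sequences of 8 tokens with hidden size 64.
layer = MoELayer(d_model=64, d_ff=128)
print(layer(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```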

Input Output
Input Format | Text, up to 32768 tokens |
Accepted Modalities | Text |
Output Format | Text |
Performance Tips | Using FlashAttention2 is recommended for faster inference. |
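
A minimal sketch of the recommended setup, assuming the model is loaded through Hugging Face `transformers`: the FlashAttention2 backend is selected at load time and additionally requires the `flash-attn` package and a supported GPU. The model ID below is a placeholder.

```python
# Hedged sketch: loading with the FlashAttention2 kernel and generating within the 32768-token context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "org/model-instruct"  # placeholder, not the actual repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,               # half precision is typical for large MoE checkpoints
    attn_implementation="flash_attention_2",  # the performance tip from this card
    device_map="auto",
)

prompt = "Explain the difference between supervised and unsupervised learning."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=32768).to(model.device)

# Keep prompt length plus new tokens within the 32768-token context window.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```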

Release Notes
Version | |
Notes | Instruction finetuned; mixture-of-experts model |