Model Type | Dense-MoE hybrid transformer |

Use Cases |
Areas: | Research, prototypes, products |
Applications: | |

Additional Notes | The models leverage features from DeepSpeed for performance enhancement. |

Supported Languages | English (high proficiency) |
|
Training Details |
Model Architecture: | Arctic combines a 10B dense transformer with a residual 128x3.66B MoE MLP, yielding 480B total and 17B active parameters, with the active experts chosen by top-2 gating. |
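The headline parameter counts follow directly from the figures above (10B dense, 128 experts of 3.66B each, top-2 gating); a minimal sketch of the arithmetic:

```python
# Parameter arithmetic for a residual dense-MoE hybrid.
# All constants come from the architecture description above.
DENSE = 10e9        # dense transformer backbone
EXPERT = 3.66e9     # parameters per MoE MLP expert
NUM_EXPERTS = 128   # total experts in the MoE layer
TOP_K = 2           # top-2 gating: 2 experts active per token

# Total parameters: dense backbone plus every expert.
total_params = DENSE + NUM_EXPERTS * EXPERT    # ~478.5B, quoted as 480B

# Active parameters per token: dense backbone plus only the gated experts.
active_params = DENSE + TOP_K * EXPERT         # ~17.3B, quoted as 17B

print(f"total:  {total_params / 1e9:.1f}B")
print(f"active: {active_params / 1e9:.1f}B")
```

The gap between the two numbers is the point of the design: compute per token scales with the active set, not the total.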
|
|
Input Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | |
Performance Tips: | Use FP8 quantization for inference. |
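FP8 inference typically scales each tensor so its largest value maps onto the E4M3 dynamic range before rounding to the reduced mantissa. A minimal, framework-free sketch of that idea (the helper names are illustrative; real deployments use optimized kernels such as those in DeepSpeed):

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format

def fp8_e4m3_round(x: float) -> float:
    """Round a scaled value to the nearest E4M3-representable number
    (simplified: 3 explicit mantissa bits, subnormals ignored)."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    m, e = math.frexp(abs(x))     # abs(x) = m * 2**e, with m in [0.5, 1)
    mant = round(m * 16) / 16     # keep 4 significant bits of mantissa
    return sign * math.ldexp(mant, e)

def quantize_per_tensor(weights):
    """Per-tensor absmax scaling: map the largest weight onto E4M3_MAX,
    round in FP8, then dequantize for higher-precision accumulation."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX
    quantized = [fp8_e4m3_round(w / scale) for w in weights]
    return [v * scale for v in quantized], scale
```

Values already on the E4M3 grid round-trip exactly; everything else lands on the nearest representable point, which is where FP8's speed/accuracy trade-off comes from.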
|
|
Release Notes |
Date: | |
Notes: | Initial release of Arctic model. |
|
|
|