Model Type | |
Use Cases |
Areas: | Research, Commercial applications |
|
Applications: | Language understanding, Coding, Math reasoning |
|
Primary Use Cases: | Mathematical computation, Code generation |
|
Limitations: | Easily misled by input instructions |
|
Considerations: | Comply with the license agreement. |
|
|
Additional Notes | Operates with 3.7B active parameters out of 40B total, using 7.4 GFLOPs per token. |
|
Supported Languages | English (Proficient), Chinese (Proficient) |
|
Training Details |
Data Volume: | |
Methodology: | Attention Router for expert selection |
|
Context Length: | |
Model Architecture: | Mixture-of-Experts with Attention Router |
|
|
Input Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | |
|
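
The parameter and compute figures under Additional Notes can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, which the card itself does not state, and the function names are hypothetical.

```python
# Sanity check of the figures quoted under "Additional Notes".
# Assumption (not stated in the card): a forward pass costs roughly
# 2 FLOPs per active parameter per token (one multiply plus one add).

ACTIVE_PARAMS = 3.7e9          # active parameters per token, from the card
TOTAL_PARAMS = 40e9            # total parameters, from the card
FLOPS_PER_ACTIVE_PARAM = 2     # rule-of-thumb assumption


def gflops_per_token(active_params, flops_per_param=FLOPS_PER_ACTIVE_PARAM):
    """Estimate forward-pass compute per token, in GFLOPs."""
    return active_params * flops_per_param / 1e9


def active_fraction(active_params, total_params):
    """Share of the model's parameters used for any single token."""
    return active_params / total_params


if __name__ == "__main__":
    print(f"{gflops_per_token(ACTIVE_PARAMS):.1f} GFLOPs per token")
    print(f"{active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.2%} of parameters active")
```

Under that assumption the estimate matches the quoted 7.4 GFLOPs per token, and about 9% of the parameters are active for any given token, which is the sparsity expected of a Mixture-of-Experts model.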