| Field | Details |
|---|---|
| Model Type | |
| Use Cases: Areas | Research, commercial applications |
| Use Cases: Applications | Coding, mathematics assistance, text generation |
| Use Cases: Primary Use Cases | Instruction following, generating long texts, understanding structured data, generating structured outputs, multilingual text generation |
| Additional Notes | The model is instruction-tuned and GPTQ-quantized to 8-bit. |
| Supported Languages | English (primary); also Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more |
| Training Details: Data Sources | Qwen specialized expert models |
| Training Details: Methodology | Pretraining and post-training with specialized expert models in domains such as coding and mathematics |
| Training Details: Context Length | |
| Training Details: Model Architecture | Transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings |
| Input/Output: Accepted Modalities | |
| Input/Output: Performance Tips | Use the latest version of `transformers` to avoid a `KeyError` when loading the model |
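For an instruction-tuned chat model like this, inputs are typically built as chat-format messages and rendered through the tokenizer's chat template before generation. The sketch below shows that flow with Hugging Face `transformers`; the `MODEL_ID` is a placeholder assumption (the card does not name a repository), and generation parameters are illustrative defaults, not values from this card.

```python
# Usage sketch for an instruction-tuned, GPTQ 8-bit quantized model via
# Hugging Face transformers. MODEL_ID is a hypothetical placeholder.
MODEL_ID = "org/instruct-model-gptq-int8"  # assumption: replace with the real repo ID


def build_messages(user_prompt, system_prompt="You are a helpful assistant."):
    """Chat-format input expected by instruction-tuned models."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate(prompt, max_new_tokens=256):
    # Imported lazily so the sketch can be read without the library installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    # Render the messages with the model's chat template, then tokenize.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Per the Performance Tips row, keep `transformers` current (e.g. `pip install -U transformers`), since older releases can raise a `KeyError` on newer model configs.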