Model Type | |
Use Cases |
Areas: | Research, Commercial applications |
Applications: | Natural language processing, Coding, Mathematics, Chatbots |
Primary Use Cases: | Generating long texts, Understanding structured data, Multilingual text processing |

Supported Languages | 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more (multilingual proficiency) |

Training Details |
Data Sources: | Various sources mentioned in the technical report |
Methodology: | Pretraining & Post-training |
Context Length: | |
Model Architecture: | Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias (see the sketch below) |
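
The components listed under Model Architecture can be pictured with the minimal PyTorch sketch below. Class names, dimensions, and the point where RoPE is applied are illustrative placeholders, not the model's actual implementation.

```python
# Illustrative sketch of RMSNorm, SwiGLU, and attention with QKV bias.
# Shapes and hyperparameters are placeholders chosen for the example only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization: scale by 1/RMS(x), no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class AttentionWithQKVBias(nn.Module):
    """Causal self-attention where only the Q/K/V projections carry bias terms."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=True)   # QKV bias
        self.k_proj = nn.Linear(dim, dim, bias=True)
        self.v_proj = nn.Linear(dim, dim, bias=True)
        self.o_proj = nn.Linear(dim, dim, bias=False)  # output projection has no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = (
            proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            for proj in (self.q_proj, self.k_proj, self.v_proj)
        )
        # Rotary position embeddings (RoPE) would be applied to q and k here.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, d))


x = torch.randn(1, 16, 512)                        # (batch, sequence, hidden)
y = AttentionWithQKVBias(512, 8)(RMSNorm(512)(x))  # pre-norm, then attention
```
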
Input Output |
Input Format: | Chat-based structured prompts |
Accepted Modalities: | Text |
Output Format: | Generated text following the prompt schema, up to 8,192 output tokens |
Performance Tips: | Use vLLM for processing long texts; ensure RoPE scaling is configured correctly for long contexts (see the sketch below) |
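
The Input Format and Performance Tips rows combine as in the sketch below, assuming an offline vLLM setup. The model identifier, context length, and RoPE-scaling values are placeholders; the exact rope_scaling keys and values depend on the checkpoint and library versions, so check the model card before use.

```python
# Minimal offline-inference sketch with vLLM. Model name, max_model_len, and
# the rope_scaling dict are assumptions for illustration, not verified values.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-name",          # placeholder model identifier
    max_model_len=32768,             # assumed long-context window
    rope_scaling={                   # assumed RoPE-scaling settings; key names
        "rope_type": "yarn",         # and values vary by checkpoint and
        "factor": 4.0,               # transformers/vLLM version
        "original_max_position_embeddings": 32768,
    },
)

# Chat-based structured prompt, as described under "Input Format".
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this long report in three bullet points: ..."},
]

# Cap generation at the 8,192-token output limit noted above.
params = SamplingParams(temperature=0.7, max_tokens=8192)
result = llm.chat(messages, sampling_params=params)
print(result[0].outputs[0].text)
```
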