Model Type | multilingual, text generation |
|
Use Cases |
Areas: | research, commercial applications |
|
Considerations: | Ensure compliance with applicable safety measures and legal requirements. |
|
|
Additional Notes | The model supports multi-turn dialogue; training is optimized with a masked loss. |
|
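The card does not say how the masked loss is applied. A common approach in multi-turn chat fine-tuning is to compute the loss only on assistant tokens and mask out user tokens. The sketch below illustrates that idea under stated assumptions: the role names, the helper names `build_loss_mask` and `masked_mean_loss`, and the omission of the usual one-token label shift are all illustrative choices, not details taken from this model.

```python
from typing import List, Tuple


def build_loss_mask(turns: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate dialogue turns and mark which tokens are trained on.

    turns: (role, token_ids) pairs; the role names are hypothetical.
    Returns (input_ids, loss_mask), where loss_mask[i] == 1 only for
    tokens that belong to an assistant turn.
    """
    input_ids: List[int] = []
    loss_mask: List[int] = []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        flag = 1 if role == "assistant" else 0
        loss_mask.extend([flag] * len(token_ids))
    return input_ids, loss_mask


def masked_mean_loss(token_losses: List[float], loss_mask: List[int]) -> float:
    """Average per-token losses over the unmasked positions only."""
    kept = sum(loss_mask)
    if kept == 0:
        return 0.0  # nothing to train on in this sample
    return sum(l * m for l, m in zip(token_losses, loss_mask)) / kept


# Two-round dialogue: only the assistant replies contribute to the loss.
dialogue = [("user", [1, 2]), ("assistant", [3, 4, 5]),
            ("user", [6]), ("assistant", [7])]
ids, mask = build_loss_mask(dialogue)
```

Averaging over unmasked positions only keeps long user prompts from diluting the gradient signal of short assistant replies.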
Supported Languages | Chinese (high), English (high) |
|
Training Details |
Data Sources: | High-quality Chinese and English data |
|
Data Volume: | |
Methodology: | |
Hardware Used: | 4x A100 GPUs (40 GB each), with DeepSpeed |
|
Model Architecture: | Decoder-only Transformer with rotary position embeddings (RoPE) and SwiGLU activation |
|
|
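For readers unfamiliar with the two architecture components named above, here is a minimal pure-Python sketch of each. It is not this model's implementation: the adjacent-pair rotation layout, the frequency base of 10000, and the function names are assumptions chosen for illustration, and real implementations operate on batched tensors.

```python
import math
from typing import List


def rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotary embedding: rotate each adjacent pair of x by a
    position-dependent angle (pair i uses frequency base^(-2i/d))."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    out = [0.0] * d
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def silu(v: float) -> float:
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate: List[float], up: List[float]) -> List[float]:
    """SwiGLU gating: SiLU of the gate projection times the up projection,
    elementwise. The two inputs stand for x @ W_gate and x @ W_up."""
    return [silu(g) * u for g, u in zip(gate, up)]


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


# Attention scores under RoPE depend only on the relative offset (here 2).
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 5), rope(k, 3))
```

Because each pair is rotated rather than shifted, vector norms are preserved and the query-key dot product encodes relative position only.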
Input Output | |
Release Notes |
Version: | |
Date: | |
Notes: | Release of the 52B chat model. |
|
|
|