Model Type | Mixture-of-Experts (MoE) code language model |
|
Use Cases |
Areas: | Code-specific tasks, math and reasoning, extended programming-language coverage |
|
Applications: | AI code assistance, software development, research in code intelligence |
|
Primary Use Cases: | Code completion, code insertion, chatbot assistance for coding queries (see the completion sketch after this block) |
|
Limitations: | Optimal performance requires the recommended hardware; inference depends on compatible frameworks (e.g., HF Transformers or vLLM) |
|
|
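A minimal code-completion sketch with Hugging Face Transformers, illustrating the primary use cases above. The repository id below is an assumption (substitute the actual published checkpoint), and sufficient GPU memory is assumed (e.g., the 8×80 GB noted under Training Details).

```python
# Sketch: code completion with Hugging Face Transformers.
# The repo id is an assumption; replace it with the actual published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 inference, per the hardware note below
    device_map="auto",           # shard the MoE weights across available GPUs
    trust_remote_code=True,
)

# Code completion: the model continues the given prefix.
prompt = "# Write a function that checks whether a number is prime\ndef is_prime(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```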
Additional Notes | Supported programming languages expanded from 86 to 338. Commercial use is permitted. |
|
Supported Languages | 338 programming languages (extended from 86); high proficiency in code-specific tasks |
|
Training Details |
Data Sources: | Further pre-training from an intermediate checkpoint of DeepSeek-V2 |
|
Data Volume: | Additional 6 trillion tokens |
Methodology: | Mixture-of-Experts mechanism (DeepSeekMoE framework) for enhanced coding and reasoning; see the routing sketch after this block |
|
Context Length: | |
Hardware Used: | BF16-format inference requires 8×80 GB GPUs |
|
Model Architecture: | Mixture-of-Experts; only a subset of expert parameters is active per token |
|
|
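The Methodology and Model Architecture rows refer to MoE routing, in which each token is processed by only a few experts, so the active parameter count is much smaller than the total. The toy sketch below illustrates generic top-k routing; it is not the DeepSeekMoE implementation, and all layer sizes are invented for illustration.

```python
# Toy sketch of top-k expert routing: why only a fraction of parameters
# is "active" per token in a Mixture-of-Experts layer.
# Generic illustration only, not the DeepSeekMoE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)        # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out                                      # only top-k experts ran per token

tokens = torch.randn(16, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([16, 64])
```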
Input Output |
Input Format: | Text (natural-language and code prompts) |
Accepted Modalities: | Text |
Output Format: | Model-generated text responses |
|
Performance Tips: | Use the Hugging Face (HF) Transformers or vLLM frameworks for optimal inference; see the vLLM sketch below. |
|
|
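A minimal vLLM serving sketch for the performance tip above, assuming an 8-GPU node (matching the 8×80 GB BF16 requirement) and the same assumed repository id as earlier.

```python
# Sketch: batched inference with vLLM on an 8-GPU node.
# The repo id and sampling settings are assumptions; adjust to the actual checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Instruct",  # assumed repo id
    tensor_parallel_size=8,      # shard across 8 GPUs for BF16 inference
    dtype="bfloat16",
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=256)
outputs = llm.generate(["def binary_search(arr, target):"], params)
print(outputs[0].outputs[0].text)
```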