Model Type | Dense decoder-only Transformer |
|
Use Cases |
Areas: | |
Applications: | Memory- or compute-constrained environments, latency-bound scenarios, tasks requiring strong reasoning (code, math, and logic) |
|
Primary Use Cases: | Commercial applications and research use, in English |
|
Limitations: | Not evaluated for all downstream purposes; performs best in English |
|
Considerations: | Developers should evaluate the model for accuracy, safety, and fairness before deployment, especially in high-risk scenarios. |
|
|
Additional Notes | Integrated into the development version (4.40.0) of transformers; uses flash attention by default. Optimized for a range of GPU and CPU hardware. |
|
Supported Languages | |
Training Details |
Data Sources: | Publicly available documents, synthetic data, high-quality educational data, code |
|
Data Volume: | |
Methodology: | Supervised fine-tuning and Direct Preference Optimization |
|
Context Length: | |
Training Time: | |
Hardware Used: | |
Model Architecture: | Dense decoder-only Transformer |
|
|
Safety Evaluation |
Risk Categories: | Misinformation, bias, offensive content |
|
Ethical Considerations: | Developers should ensure that their use of the model complies with relevant laws and regulations. |
|
|
Responsible AI Considerations |
Fairness: | Evaluated for instruction following and safety measures. |
|
Transparency: | Developers should inform users they are interacting with an AI system. |
|
Accountability: | Developers are responsible for ensuring that their specific use cases comply with applicable laws. |
|
Mitigation Strategies: | Use available safety classifiers or custom solutions. |
|
|
Input and Output |
Input Format: | Chat format with <|user|> and <|assistant|> tags |
|
Accepted Modalities: | |
Output Format: | Generated text in response to input prompts |
|
Performance Tips: | On NVIDIA V100 or earlier-generation GPUs, load the model with attn_implementation="eager" |
|
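To illustrate the chat input format listed above, here is a minimal sketch of a prompt builder. Only the `<|user|>` and `<|assistant|>` tags come from this card. The newline placement, the optional `end_of_turn` marker, and the function name `build_prompt` are assumptions made for illustration. Where the model's tokenizer ships a chat template, that template is the safer source of the exact layout.

```python
from typing import Dict, List


def build_prompt(messages: List[Dict[str, str]], end_of_turn: str = "") -> str:
    """Format a conversation with <|user|> / <|assistant|> tags.

    Assumption: each turn is "<|role|>\n<content><end_of_turn>\n".
    `end_of_turn` is a hypothetical hook for models that close each turn
    with an extra marker; it is empty by default.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}{end_of_turn}\n")
    # Finish with an open assistant tag so the model generates the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "What is 2 + 2?"}])
print(prompt)
```

The resulting string is what gets tokenized and passed to generation; the model's output is then the text that follows the final `<|assistant|>` tag.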
|
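The performance tip above can also be applied programmatically. The following is a sketch under stated assumptions, not the model authors' code. It assumes that flash attention needs an NVIDIA GPU with compute capability 8.0 or higher (the V100 reports 7.0) and that the non-eager backend is selected with the value `flash_attention_2`, as in recent transformers releases. The names `pick_attn_implementation` and `load_model` are illustrative, and `model_id` is a placeholder for whatever checkpoint is being loaded.

```python
def pick_attn_implementation(cc_major: int) -> str:
    """Map a GPU compute-capability major version to an attn_implementation value."""
    # Assumption: 7.x (V100) and earlier cannot run flash attention; 8.x and later can.
    return "flash_attention_2" if cc_major >= 8 else "eager"


def load_model(model_id: str):
    """Load a causal LM with an attention backend suited to the local hardware."""
    # Imported lazily so the helper above can be used without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM

    if torch.cuda.is_available():
        major, _minor = torch.cuda.get_device_capability()
        attn = pick_attn_implementation(major)
    else:
        attn = "eager"  # CPU-only machine: use the plain implementation
    return AutoModelForCausalLM.from_pretrained(model_id, attn_implementation=attn)


print(pick_attn_implementation(7))  # V100-class GPU -> eager
print(pick_attn_implementation(8))  # A100-class GPU -> flash_attention_2
```

Flash attention usually also requires the separate flash-attn package and half-precision weights, so falling back to `"eager"` is a reasonable default whenever either is unavailable.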