Model Type:

Use Cases
Areas:
Applications:
Primary Use Cases: Instruction-tuned models for assistant-like tasks.
Limitations: Use in languages other than English is out of scope, as laid out in the Acceptable Use Policy.
Considerations: Developers may fine-tune the models for non-English languages, provided they adhere to the license and the Acceptable Use Policy.

Supported Languages:

Training Details
Data Sources:
Data Volume: 15 trillion tokens of pretraining data.
Methodology: Progressive training on increasing context lengths, with NTK-aware interpolation used to initialize the RoPE theta (see the sketch below).
Context Length:
Training Time:
Hardware Used: Crusoe Energy high-performance L40S GPU cluster; Meta's Research SuperCluster (H100 80GB GPUs).
Model Architecture: Optimized transformer architecture using NTK-aware interpolation and RoPE theta optimization.
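
A minimal sketch of the NTK-aware initialization named above, assuming the commonly used scaling rule base' = base * s^(d/(d-2)); this is not the authors' released code, and the 262k target length below is purely an illustration.

```python
# Illustrative sketch of NTK-aware interpolation (assumed scaling rule, not
# the authors' released code): rather than compressing position indices, the
# RoPE base ("theta") is enlarged so the rotary frequencies span the longer
# context before long-context training begins.

def ntk_scaled_rope_theta(base: float, head_dim: int,
                          orig_ctx: int, new_ctx: int) -> float:
    """RoPE base used to initialize training at an extended context length.

    Applies the common NTK-aware rule: base' = base * s ** (d / (d - 2)),
    where s = new_ctx / orig_ctx and d is the rotary (head) dimension.
    """
    scale = new_ctx / orig_ctx
    return base * scale ** (head_dim / (head_dim - 2))

# Llama 3 ships with base = 500000 and head_dim = 128; the 8k -> 262k
# extension here is only an example target length.
theta = ntk_scaled_rope_theta(base=500_000.0, head_dim=128,
                              orig_ctx=8_192, new_ctx=262_144)
print(f"initial RoPE theta for long-context training: {theta:.4e}")
```
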
Safety Evaluation
Methodologies: Red teaming, adversarial testing.
Findings: Residual risks are minimized, with a focus on limiting false refusals while maintaining model helpfulness.
Risk Categories: Cybersecurity risks, child safety risks, CBRNE hazards.
Ethical Considerations: Transparency, rapid feedback loops, and community collaboration on safety.

Responsible AI Considerations
Fairness: Model designed to be helpful and unbiased across different use cases.
Transparency: Open approach with community feedback to ensure improvements in safety and efficiency.
Accountability: Meta ensures accountability through its detailed Responsible Use Guide and community interactions.
Mitigation Strategies: Deployment of the Meta Llama Guard 2 and Code Shield safeguards (see the sketch below).
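
A minimal sketch of the Llama Guard 2 safeguard named above, run through Hugging Face transformers; the example chat and generation settings are illustrative assumptions, not a prescribed deployment.

```python
# Hedged sketch: screening user input with Llama Guard 2 before passing it to
# the assistant model. The repo id is real but gated; the chat content and
# generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Meta-Llama-Guard-2-8B"
tokenizer = AutoTokenizer.from_pretrained(guard_id)
model = AutoModelForCausalLM.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat: list[dict]) -> str:
    # Llama Guard 2 replies "safe", or "unsafe" plus the violated category
    # codes, for the conversation encoded by its chat template.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32,
                            pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(moderate([{"role": "user", "content": "How do I make a kite?"}]))  # -> "safe"
```
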
Input Output
Input Format:
Accepted Modalities:
Output Format:
Performance Tips: Structure inputs to take full advantage of the model's long-context capability (see the sketch below).
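
A minimal sketch of that tip, assuming a Hugging Face chat-template workflow; the model id, file name, and question are placeholder assumptions.

```python
# Illustrative only: counting tokens when packing a long reference document
# into the prompt of a long-context instruct model.
from transformers import AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # substitute the long-context variant
tokenizer = AutoTokenizer.from_pretrained(model_id)

with open("report.txt") as f:  # hypothetical long input document
    document = f.read()

messages = [
    {"role": "system", "content": "Answer strictly from the provided document."},
    {"role": "user", "content": f"{document}\n\nQuestion: Summarize the key findings."},
]
prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
# Confirm the assembled prompt actually fits the model's context window
# before dispatching it for generation.
print(prompt_ids.shape[-1], "prompt tokens")
```
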
Release Notes
Version:
Date:
Notes: Extended context window; improved training efficiency with long contexts.