Model Type | |
Additional Notes | The model is susceptible to jailbreak attacks and may generate inaccurate or biased content. Strong output validation controls are recommended. |
|
Training Details |
Data Sources: | open source instruction datasets, internally collected synthetic datasets |
|
Methodology: | supervised fine-tuning and direct preference optimization |
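The direct preference optimization stage mentioned above optimizes a log-sigmoid margin between policy and reference log-probabilities on chosen vs. rejected responses. The sketch below is illustrative only, with made-up log-probability values; it is not taken from this model's training code:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy (pi_*) and a
    frozen reference model (ref_*); beta scales the implicit KL penalty.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# When the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss is small.
low = dpo_loss(-10.0, -14.0, -12.0, -12.0)   # margin = +0.4
high = dpo_loss(-14.0, -10.0, -12.0, -12.0)  # margin = -0.4
print(low < high)  # True
```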
|
Training Time: | September 4, 2024 to November 10, 2024 |
|
Model Architecture: | Hybrid-head Architecture with standard attention heads and Mamba heads, Grouped-Query Attention (GQA), Rotary Position Embeddings (RoPE) |
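In grouped-query attention, several query heads share each key/value head, shrinking the KV cache. A minimal NumPy sketch with illustrative shapes follows; it omits RoPE, the Mamba heads, and other details of the actual hybrid-head implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """Toy GQA: q has more heads than k/v; each KV head serves a group.

    q: (n_q_heads, seq, d)   k, v: (n_kv_heads, seq, d)
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]       # query heads per KV head
    k = np.repeat(k, group, axis=0)       # expand KV heads to match queries
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))       # 8 query heads
k = rng.standard_normal((2, 4, 16))       # 2 shared KV heads
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 4, 16)
```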
|
|
Responsible AI Considerations |
Mitigation Strategies: | Developers should work with their internal model team to ensure this model meets the requirements of the relevant industry and use case, and should address any unforeseen product misuse. |
|
|
Input / Output |
Accepted Modalities: | |
Performance Tips: | During generation, the batch size must be 1, because the current implementation does not fully support padding of Meta tokens combined with sliding-window attention (SWA). |
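One way to enforce this constraint is a small guard in front of the generation call. The wrapper below is purely illustrative: `generate_one`, `model_generate`, and the echo stub are hypothetical names, not part of the released model's API:

```python
def generate_one(model_generate, input_ids, **kwargs):
    """Reject batched inputs before they reach generation.

    `model_generate` stands in for any generate() callable; since padding
    of Meta tokens + SWA is unsupported, only batch size 1 is allowed.
    """
    if len(input_ids) != 1:
        raise ValueError(
            f"batch size {len(input_ids)} unsupported: generate one "
            "sequence at a time"
        )
    return model_generate(input_ids, **kwargs)

# Stub generate() for illustration: appends a single dummy token id.
echo = lambda ids, **kw: [ids[0] + [0]]
out = generate_one(echo, [[1, 2, 3]])
print(out)  # [[1, 2, 3, 0]]
```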
|
|