Model Type | |
Use Cases |
Areas: | chatbot, instruction-following |
|
Applications: | chat, instruction-based interactions |
|
Primary Use Cases: | ready-to-use chat/instruct model |
|
Limitations: | The model was trained mostly on English data and may not generalize well to other languages
|
Considerations: | Develop guardrails and take precautions for production use. |
|
|
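The guardrails consideration above can start as simple pre- and post-generation filters around the model call. The sketch below is entirely hypothetical and not from this card; the denylist contents, `guarded_generate` wrapper, and refusal message are illustrative assumptions.

```python
# Toy guardrail sketch: denylist filtering around a generate() call.
# Everything here (denylist contents, refusal message) is hypothetical.
DENYLIST = {"credit card number", "social security number"}

def violates(text):
    """Return True if the text matches any denylisted phrase."""
    lowered = text.lower()
    return any(term in lowered for term in DENYLIST)

def guarded_generate(generate, prompt, refusal="I can't help with that."):
    """Wrap a text-generation callable with input and output checks."""
    if violates(prompt):
        return refusal
    output = generate(prompt)
    return refusal if violates(output) else output

# Stand-in for a real model call.
echo = lambda p: f"You asked: {p}"
print(guarded_generate(echo, "What's the weather?"))   # passes through
print(guarded_generate(echo, "Find a social security number"))  # refused
```

Production guardrails would go well beyond keyword matching (classifiers, rate limits, human review), but the wrapper pattern stays the same.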
Additional Notes | This is an instruct model and is not well suited to further finetuning. Its architecture is optimized for inference, featuring FlashAttention and multiquery attention.
|
Supported Languages | English (primary), French (secondary) |
|
Training Details |
Data Sources: | Baize instruction dataset, RefinedWeb |
|
Data Volume: | 150M tokens from Baize mixed with 5% RefinedWeb |
|
Methodology: | Finetuned on a mixture of chat data with 5% RefinedWeb data
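The 5% RefinedWeb mixture described above can be sketched as a weighted sampler over two example streams. Only the 5% figure comes from this card; the `mix_datasets` helper and the toy corpora below are hypothetical stand-ins for the Baize and RefinedWeb data.

```python
import random

def mix_datasets(chat_examples, web_examples, web_fraction=0.05, seed=0):
    """Yield training examples, drawing roughly web_fraction of them from
    the web corpus (e.g. RefinedWeb) and the rest from chat data. Stops
    when either stream is exhausted."""
    rng = random.Random(seed)
    chat_it, web_it = iter(chat_examples), iter(web_examples)
    while True:
        source = web_it if rng.random() < web_fraction else chat_it
        try:
            yield next(source)
        except StopIteration:
            return

# Toy corpora standing in for Baize chat data and RefinedWeb text.
chat = [f"chat-{i}" for i in range(1000)]
web = [f"web-{i}" for i in range(1000)]
mixed = list(mix_datasets(chat, web))
web_share = sum(x.startswith("web") for x in mixed) / len(mixed)
print(f"web share: {web_share:.3f}")  # close to 0.05
```

In practice such mixing is usually done at the token level with weighted dataset samplers, but the proportioning idea is the same.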
|
Context Length: | |
Hardware Used: | 64 A100 40GB GPUs on AWS SageMaker |
|
Model Architecture: | Causal decoder-only with adaptations from GPT-3, including rotary embeddings, multiquery attention, FlashAttention, and a single layer norm with parallel attention/MLP |
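Multiquery attention, listed in the architecture row above, shares a single key/value projection across all query heads, which shrinks the key/value cache during inference. This NumPy sketch illustrates the idea only; it is not the model's actual implementation, and all shapes and weights are illustrative.

```python
import numpy as np

def multiquery_attention(x, Wq, Wk, Wv, n_heads):
    """Toy multiquery attention: n_heads query heads share ONE key/value
    head (contrast with multi-head attention, which has per-head K/V)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ Wq).reshape(seq, n_heads, d_head)  # per-head queries
    kv_k = x @ Wk                               # single shared key head
    kv_v = x @ Wv                               # single shared value head
    out = np.empty((seq, n_heads, d_head))
    # causal mask: position i may only attend to positions <= i
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    for h in range(n_heads):
        scores = q[:, h, :] @ kv_k.T / np.sqrt(d_head)
        scores = np.where(mask, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h, :] = weights @ kv_v
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
d_model, n_heads, seq = 16, 4, 6
x = rng.normal(size=(seq, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model // n_heads))  # one KV head
Wv = rng.normal(size=(d_model, d_model // n_heads))
y = multiquery_attention(x, Wq, Wk, Wv, n_heads)
print(y.shape)  # (6, 16)
```

The inference saving comes from caching only one K/V head per layer instead of `n_heads` of them.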
|
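Rotary embeddings, also named in the architecture row, encode position by rotating pairs of query/key dimensions through position-dependent angles, so attention scores depend on relative position. This is a minimal sketch of the idea, not the model's implementation; pairing the first half of dimensions with the second half is one common convention and an assumption here.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings (RoPE) to a (seq, d) array:
    dimension i is paired with dimension i + d//2 and the pair is
    rotated by an angle that grows with sequence position."""
    seq, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)  # per-pair frequency
    angles = np.outer(np.arange(seq), freqs)   # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

x = np.random.default_rng(1).normal(size=(5, 8))
y = rotary_embed(x)
print(np.allclose(y[0], x[0]))  # position 0 is unrotated -> True
```

Because each step is a pure rotation, vector norms are preserved and no learned position table is needed.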
|
Input Output | |