| Attribute | Details |
|---|---|
| Model Type | Chatbot, dialogue generation |
| Use Cases | Areas: <br>Limitations: can produce factually incorrect output and may generate lewd, biased, or offensive outputs |
| Additional Notes | The custom MPT model architecture requires that `trust_remote_code=True` be passed to the `from_pretrained` method (see the loading example after this table). |
| Supported Languages | |
| Training Details | Data Sources: jeffwan/sharegpt_vicuna, Hello-SimpleAI/HC3, tatsu-lab/alpaca, Anthropic/hh-rlhf, victor123/evol_instruct_70k<br>Context Length: <br>Hardware Used: 8× A100-80GB, 32× A100-40GB<br>Model Architecture: modified decoder-only transformer |
| Input/Output | Accepted Modalities: text<br>Output Format: text<br>Performance Tips: to use the optimized Triton implementation of FlashAttention, load the model on GPU with `attn_impl='triton'` and bfloat16 precision (see the second example after this table). |
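
The `trust_remote_code=True` requirement from the Additional Notes row translates into a one-line change when loading the model. Below is a minimal loading sketch with the Hugging Face `transformers` API; the checkpoint id `mosaicml/mpt-7b-chat` is an assumption, since the card does not name the exact repository.

```python
# Minimal loading sketch for a custom-architecture MPT checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b-chat"  # hypothetical checkpoint id; substitute the actual repository

tokenizer = AutoTokenizer.from_pretrained(name)

# trust_remote_code=True lets transformers execute the modelling code shipped
# inside the repository, which defines the modified decoder-only architecture.
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)
```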
|
|
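For the performance tip, one way to select the Triton FlashAttention implementation is through the model config before loading. This is a sketch, not a confirmed recipe: the `attn_config["attn_impl"]` key and the `init_device` field are assumptions about the custom MPT configuration class; only `attn_impl='triton'` and bfloat16 precision come from the card.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

name = "mosaicml/mpt-7b-chat"  # hypothetical checkpoint id

# Assumption: the custom MPT config exposes the attention implementation
# under attn_config; 'triton' selects the optimized FlashAttention kernels.
config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"
config.init_device = "cuda:0"  # assumption: initialize weights directly on the GPU

model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # bfloat16 precision, as the tip recommends
    trust_remote_code=True,
)
```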