Model Type | decoder-style transformer, LLM |

Use Cases |
Areas: | research, commercial applications |
Applications: | text generation, long-form instruction following, dialogue generation |
Primary Use Cases: | finetuning for specific applications (see the loading sketch below) |
Limitations: | not intended for deployment without finetuning, can produce factually incorrect output |
Considerations: | Efforts were made to clean the pretraining data; however, outputs may still be offensive or biased. |
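Since the primary use case above is finetuning, here is a minimal loading sketch under stated assumptions: the Hugging Face repo id mosaicml/mpt-7b-8k, the transformers and torch packages, and bfloat16 weights; the finetuning loop itself is left to the user.

```python
# Hedged sketch: load MPT-7B-8k for finetuning (repo id is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-8k"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16, per the Performance Tips below
    trust_remote_code=True,      # MPT models ship custom modeling code on the Hub
)

# From here the model can be dropped into any standard causal-LM finetuning
# loop (for example transformers.Trainer) on task-specific data.
```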
Additional Notes | This model builds on MPT-7B with longer sequence handling and significant efficiency improvements. |

Supported Languages | |

Training Details |
Data Sources: | mc4, c4, togethercomputer/RedPajama-Data-1T, bigcode/the-stack, allenai/s2orc |
Data Volume: | |
Methodology: | MPT-7B-8k uses a modified transformer architecture optimized for efficient training and inference, with ALiBi used to handle long inputs. |
Context Length: | 8k tokens |
Training Time: | |
Hardware Used: | |
Model Architecture: | Decoder-only transformer with modifications such as FlashAttention and ALiBi in place of positional embeddings (see the ALiBi sketch below). |
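As a reference for the ALiBi mechanism named in Methodology and Model Architecture, the sketch below follows the published ALiBi recipe rather than MosaicML's implementation; the function name and the power-of-two head count are illustrative assumptions. Because the bias depends only on relative distance rather than absolute position, attention generalizes to inputs longer than those seen during training.

```python
# Illustrative ALiBi bias, not MosaicML's code: each head adds a linear
# penalty proportional to query-key distance instead of using
# positional embeddings.
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head slopes form a geometric sequence 2^(-8/n), 2^(-16/n), ...
    # (assumes n_heads is a power of two, as in the ALiBi paper).
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]         # (seq, seq): key_pos - query_pos
    bias = slopes[:, None, None] * distance[None]  # (heads, seq, seq), <= 0 for past keys
    return bias  # added to attention logits before the causal mask and softmax

print(alibi_bias(n_heads=8, seq_len=16).shape)  # torch.Size([8, 16, 16])
```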
Safety Evaluation |
Ethical Considerations: | MPT-7B-8k can produce factually incorrect, lewd, biased, or offensive outputs. It should not be used for human-facing interactions without further guardrails and user consent. |

Responsible AI Considerations |
Fairness: | The model may have biases inherited from its training data. |
Transparency: | Pretraining data was openly available and was preprocessed to remove unsuitable content. |
Accountability: | Responsibility of MosaicML. |
Mitigation Strategies: | Guardrails are recommended before deployment. |

Input Output |
Input Format: | Text sequences, up to 8k tokens |
Accepted Modalities: | Text |
Output Format: | Text |
Performance Tips: | Use optimized attention implementations such as FlashAttention and run the model in bfloat16 precision on GPU (see the inference sketch below). |
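To make the Input/Output settings concrete, a rough bfloat16 inference sketch follows; the repo id, prompt, and generation parameters are placeholders, and the commented attention-config hint should be verified against MosaicML's Hub documentation rather than taken as given.

```python
# Rough bf16 inference sketch for MPT-7B-8k (repo id and settings assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-8k"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 precision on GPU, per Performance Tips
    trust_remote_code=True,
).to("cuda")

# FlashAttention-style kernels are reportedly enabled through the model config;
# the exact key below is taken from MosaicML's Hub card and should be verified:
# config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
# config.attn_config["attn_impl"] = "triton"

prompt = "Summarize the following document:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```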
Release Notes |
Version: | |
Date: | |
Notes: | Initial release of MPT-7B-8k. |