Model Type | Decoder-only transformer language model
Use Cases |
Areas: | Research, Commercial Applications |
|
Applications: | Text generation, Prompting for downstream tasks |
|
Primary Use Cases: | Text generation, Prompting for evaluation of downstream tasks (see the prompting sketch below)
|
Limitations: | Bias in training data; quality issues such as limited generation diversity and hallucination.
|
Considerations: | Bias in training data can affect fine-tuned versions. |
|
|
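As a concrete illustration of the prompting use case above, here is a minimal sketch that runs a zero-shot, cloze-style prompt through an OPT checkpoint with the Hugging Face `transformers` pipeline. The `facebook/opt-350m` checkpoint and the prompt wording are illustrative choices, not something prescribed by the model card.

```python
from transformers import pipeline

# Illustrative checkpoint; any OPT size can be prompted the same way.
generator = pipeline("text-generation", model="facebook/opt-350m")

# Hypothetical cloze-style prompt for a downstream task (sentiment).
prompt = 'Review: "The movie was fantastic." Sentiment:'

# Greedy decoding (do_sample=False) keeps the completion deterministic,
# which is usually what you want when scoring a downstream evaluation.
out = generator(prompt, do_sample=False, max_new_tokens=3)
print(out[0]["generated_text"])
```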
Additional Notes | The model card discusses ethical considerations related to model biases due to the nature of the training data. |
|
Training Details |
Data Sources: | BookCorpus, CC-Stories, The Pile, Pushshift.io Reddit, CCNewsV2 |
|
Data Volume: | |
Methodology: | Causal language modeling (CLM) objective; texts tokenized with the GPT-2 byte-level BPE tokenizer (a minimal sketch follows this section).
|
Context Length: | |
Training Time: | 33 days of continuous training |
|
Hardware Used: | |
Model Architecture: | Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers. |
|
|
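A minimal sketch of the training setup described above, using the small publicly released 125M checkpoint as a stand-in (the larger OPT models share the same GPT-2 byte-level BPE vocabulary and CLM objective):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# OPT reuses the GPT-2 byte-level BPE vocabulary, so any string can be
# encoded without out-of-vocabulary tokens.
tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

batch = tok("OPT models are decoder-only transformers.", return_tensors="pt")

# Causal language modeling: predict each token from the tokens before it.
# Passing the input ids as labels makes the library compute the shifted
# cross-entropy loss used during pre-training.
with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])

print(tok.convert_ids_to_tokens(batch["input_ids"][0].tolist()))  # BPE pieces
print(float(out.loss))  # per-token CLM loss on this sentence
```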
Safety Evaluation |
Methodologies: | Evaluated with the same prompts and settings used for GPT-3
|
Findings: | The model is strongly biased and can exhibit quality issues such as hallucination.
|
Risk Categories: | |
Ethical Considerations: | The training data contains unfiltered content from the internet, which introduces biases.
|
|
Responsible AI Considerations |
Fairness: | Acknowledges bias in training data. |
|
Transparency: | Bias and safety issues are acknowledged in the official model card.
|
Accountability: | Encouraging responsible AI research by making models available for study. |
|
Mitigation Strategies: | Sharing models to allow broader study and understanding of biases. |
|
|
Input Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | |
Performance Tips: | Use top-k sampling by setting `do_sample` to `True` for non-deterministic generation (see the sketch below).
|
|
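A minimal sketch of the tip above, assuming the `facebook/opt-350m` checkpoint and the Hugging Face `transformers` pipeline (larger OPT models are used the same way):

```python
from transformers import pipeline, set_seed

set_seed(32)  # sampling is non-deterministic; fix a seed to reproduce outputs

# Illustrative checkpoint; swap in a larger OPT model for higher quality.
generator = pipeline("text-generation", model="facebook/opt-350m")

# do_sample=True switches from greedy decoding to sampling, and top_k restricts
# each step to the 50 most likely tokens, trading determinism for diversity.
print(generator("Hello, I am", do_sample=True, top_k=50, max_new_tokens=30))
```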