Model Type | Transformer-based Language Model |
|
Use Cases |
Areas: | Research |
Primary Use Cases: | Research on the behavior and functionality of large language models
|
Limitations: | Not suitable for human-facing deployment, translation, or generating text in languages other than English
|
Considerations: | Conduct risk and bias assessments when using the model in downstream applications.
|
|
Additional Notes | Pythia-410M has not been fine-tuned for downstream applications such as commercial chatbots.
|
Supported Languages | en (Primary language - English) |
|
Training Details |
Data Sources: | The Pile |
Data Volume: | Approximately 300B tokens (143,000 steps × ~2M tokens per step) |
Methodology: | Trained with a uniform batch size of 2M tokens using Flash Attention. The learning rate schedule decayed to a minimum of 0.1× the maximum LR (see the schedule sketch below).
|
Training Time: | 143,000 steps at a batch size of 2M tokens
|
Model Architecture: | Decoder-only transformer (GPT-NeoX architecture) |
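The schedule described under Methodology can be illustrated with a short sketch. This is a minimal illustration, not the training code: the cosine decay shape, the warmup fraction, and the maximum LR of 3e-4 are assumptions; only the 143,000-step count and the 0.1× decay floor come from this card.

```python
import math

def lr_at_step(step, max_steps, max_lr, warmup_steps, min_ratio=0.1):
    """LR that warms up linearly, then follows a cosine decay to min_ratio * max_lr."""
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Decay floor of 0.1x the maximum LR, as stated in the card.
    return max_lr * (min_ratio + (1.0 - min_ratio) * cosine)

# The card reports 143,000 training steps; max_lr and warmup_steps are assumed values.
max_steps = 143_000
for step in (0, 1_430, 71_500, 143_000):
    print(step, lr_at_step(step, max_steps, max_lr=3e-4, warmup_steps=1_430))
```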
|
Responsible AI Considerations
Fairness: | Biases regarding gender, religion, and race are documented in Section 6 of the Pile paper.
|
Transparency: | Model outputs should not be relied upon for factual accuracy. |
|
Accountability: | Users are responsible for evaluating generated outputs and for informing their audiences that the text was machine-generated.
|
Mitigation Strategies: | Implement risk and bias assessments when using the model in downstream applications.
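As one concrete starting point for the risk and bias assessment recommended above, the sketch below samples completions for prompts that differ only in a demographic term and tallies the first generated word. The prompt templates, sample count, sampling settings, and the Hub identifier EleutherAI/pythia-410m are illustrative assumptions, not an endorsed evaluation protocol.

```python
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-410m"  # assumed Hub identifier for this model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Minimal contrast set: prompts identical except for the demographic term.
prompts = ["The man worked as a", "The woman worked as a"]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    first_words = Counter()
    for _ in range(20):  # sample several continuations per prompt
        out = model.generate(**inputs, max_new_tokens=8, do_sample=True, temperature=0.8)
        continuation = tokenizer.decode(
            out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        if continuation.split():
            first_words[continuation.split()[0]] += 1
    # A skewed tally across the two prompts is a crude signal of occupational bias.
    print(prompt, "->", first_words.most_common(5))
```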
|
|
Input Output |
Input Format: | Plain English text prompts (see the usage sketch below) |
Accepted Modalities: | Text |
Output Format: | Generated text |
Performance Tips: | Always evaluate the outputs for factual accuracy and potential biases. |
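Because the input/output fields above are brief, a minimal usage sketch may help. It assumes the Hugging Face transformers library and the Hub identifier EleutherAI/pythia-410m; the prompt and sampling settings are arbitrary. The model is not instruction-tuned, so prompts work best as text to be continued rather than as commands.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub identifier for this model; adjust revision/checkpoint as needed.
model_name = "EleutherAI/pythia-410m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Plain English text in, generated text out.
prompt = "The Pythia suite was designed to study"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```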
|
|
Release Notes |
Date: | |
Notes: | Pythia models were renamed and parameter counts adjusted for clarity. |
|
Version: | |
Notes: | Early version with hyperparameter discrepancies. |
|
|
|