Model Type | |
Use Cases |
Areas: | |
Limitations: | No safety guarantees for outputs |
|
Considerations: | Users should conduct thorough safety testing and implement appropriate filtering. |
|
|
Additional Notes | Supports pre-trained and instruction-tuned models ranging from 270M to 3B parameters. The package includes data preparation, training, fine-tuning, and evaluation code, along with checkpoints and logs. |
|
Training Details |
Data Sources: | RefinedWeb, deduplicated PILE, subset of RedPajama, subset of Dolma v1.6 |
|
Data Volume: | |
Methodology: | Layer-wise scaling strategy that allocates parameters non-uniformly across transformer layers |
|
|
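To make the Methodology entry more concrete, here is a minimal sketch of one way a layer-wise scaling rule could be written. The linear interpolation rule, the function name `layerwise_widths`, and the `alpha`/`beta` ranges are assumptions chosen for illustration; the table above does not give the exact formula or values.

```python
from typing import List, Tuple


def layerwise_widths(
    num_layers: int,
    d_model: int,
    head_dim: int,
    alpha: Tuple[float, float] = (0.5, 1.0),  # assumed attention-width multiplier range
    beta: Tuple[float, float] = (0.5, 4.0),   # assumed FFN-width multiplier range
) -> List[Tuple[int, int]]:
    """Return (num_heads, ffn_dim) for each layer.

    Multipliers are interpolated linearly from the first layer to the last,
    so later layers get more heads and a wider feed-forward block.
    """
    configs = []
    for i in range(num_layers):
        # Position of this layer in [0, 1]; a single-layer model uses the minimum.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, int(a * d_model / head_dim))
        ffn_dim = int(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


if __name__ == "__main__":
    for layer, (heads, ffn) in enumerate(layerwise_widths(4, 1024, 64)):
        print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```

Compared with a uniform stack, a rule like this keeps the total parameter budget adjustable while shifting capacity toward later layers.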
Input Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | |
Performance Tips: | Tune the batch size to the available memory, and use token speculation (speculative decoding) to speed up generation. |
|
|
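The token speculation mentioned under Performance Tips can be sketched as follows. This is a toy illustration of one variant (drafting tokens by looking up a repeated n-gram in the existing context, then verifying them against the model), not the package's own implementation. The names `lookup_draft`, `speculative_generate`, and the `next_token` callback are hypothetical.

```python
from typing import Callable, List


def lookup_draft(tokens: List[int], ngram: int = 2, max_draft: int = 4) -> List[int]:
    """Propose draft tokens by copying what followed the most recent
    earlier occurrence of the last `ngram` tokens."""
    if len(tokens) < ngram:
        return []
    tail = tokens[-ngram:]
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[start:start + ngram] == tail:
            return tokens[start + ngram:start + ngram + max_draft]
    return []


def speculative_generate(
    next_token: Callable[[List[int]], int],  # hypothetical greedy model step
    prompt: List[int],
    max_new_tokens: int,
    ngram: int = 2,
    max_draft: int = 4,
) -> List[int]:
    """Greedy decoding with draft-and-verify. The output matches plain greedy decoding."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # Accept draft tokens only while they agree with the model's own choice.
        # A real implementation would score all draft positions in one forward pass;
        # this toy version calls the model once per token, so it shows the logic only.
        for tok in lookup_draft(tokens, ngram, max_draft):
            if len(tokens) >= target_len or next_token(tokens) != tok:
                break
            tokens.append(tok)
        # Always emit one model-chosen token so progress is guaranteed.
        if len(tokens) < target_len:
            tokens.append(next_token(tokens))
    return tokens


if __name__ == "__main__":
    toy_model = lambda ctx: (ctx[-1] + 1) % 5  # stand-in for a real language model
    print(speculative_generate(toy_model, [0, 1, 2, 3, 4, 0, 1], 6))
```

Because every accepted token is checked against the model, speculation changes speed rather than output; the gain depends on how often the drafts are accepted.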