| Field | Details |
| --- | --- |
| Model Type | |
| Use Cases | |
| Considerations | Fine-tuning is recommended for specific tasks. |
| Additional Notes | This checkpoint is the 'raw' pre-trained model and has not been tuned for a more specific task; in most cases it should be fine-tuned before use (see the loading sketch below). |
| Supported Languages | |
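
Because the checkpoint is released as a raw pre-trained base model, the usual workflow is to load it and fine-tune it on the target task rather than use it zero-shot. The sketch below assumes the checkpoint is hosted on the Hugging Face Hub; the repo id is an illustrative guess based on the model name, not a value stated in this card.

```python
# Minimal sketch: load the raw pre-trained checkpoint as a causal LM so it
# can be fine-tuned on a downstream task (e.g. with transformers.Trainer).
# NOTE: the repo id below is an assumption inferred from the model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "BEE-spoke-data/smol_llama-101M-GQA"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# The base model only predicts the next token; continue training it on task
# data before relying on it for a specific application.
print(f"Loaded ~{model.num_parameters() / 1e6:.0f}M parameters")
```
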
**Training Details**

| Field | Details |
| --- | --- |
| Data Sources | JeanKaddour/minipile, pszemraj/simple_wikipedia_LM, BEE-spoke-data/wikipedia-20230901.en-deduped, mattymchen/refinedweb-3m (see the dataset sketch below) |
| Methodology | |
| Context Length | |
| Training Time | |
| Hardware Used | |
| Model Architecture | 768 hidden size, 6 layers, GQA (24 attention heads, 8 key-value heads); see the configuration sketch below |
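
The data sources above are Hugging Face dataset ids. As an illustration only, the sketch below peeks at each corpus in streaming mode with the `datasets` library; the `train` split name is an assumption, and the tokenization and mixing ratios actually used for pre-training are not stated in this card.

```python
# Illustrative sketch: inspect the listed pre-training corpora via the
# Hugging Face `datasets` library without downloading them in full.
# NOTE: the "train" split is an assumption, not a detail from this card.
from datasets import load_dataset

sources = [
    "JeanKaddour/minipile",
    "pszemraj/simple_wikipedia_LM",
    "BEE-spoke-data/wikipedia-20230901.en-deduped",
    "mattymchen/refinedweb-3m",
]

for name in sources:
    ds = load_dataset(name, split="train", streaming=True)
    example = next(iter(ds))  # first record only
    print(name, "->", sorted(example.keys()))
```
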
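The architecture row maps directly onto a Llama-style configuration: 24 query heads sharing 8 key-value heads is what makes the attention grouped-query (GQA). The sketch below fills in the stated values; the vocabulary size, MLP intermediate size, context length, and untied embeddings are assumptions chosen only to show that such a configuration lands near the 101M parameters noted in the release notes.

```python
# Sketch of a Llama-style config matching the stated architecture.
# hidden_size, num_hidden_layers, num_attention_heads and num_key_value_heads
# come from this card; the remaining values are ASSUMPTIONS for illustration.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=768,               # from the card
    num_hidden_layers=6,           # from the card
    num_attention_heads=24,        # 24 query heads (from the card)
    num_key_value_heads=8,         # 8 shared KV heads -> grouped-query attention
    intermediate_size=3072,        # assumed (4x hidden size)
    vocab_size=32000,              # assumed
    max_position_embeddings=1024,  # assumed; the card leaves context length blank
    tie_word_embeddings=False,     # assumed
)

model = LlamaForCausalLM(config)  # randomly initialised, used only to count parameters
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")
```

With these assumed values the total comes out to roughly 101M parameters, consistent with the release notes below; the actual released configuration may differ in the assumed fields.
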
**Release Notes**

| Field | Details |
| --- | --- |
| Version | |
| Date | |
| Notes | smol_llama-101M-GQA (first version), a small 101M-parameter decoder model. |