Model Type:

Use Cases
Areas: On-device computing, Instruction following
Applications: Summarization, Text rewriting, Function calling
Primary Use Cases: English content generation, Instruction following
Limitations: Models primarily understand and generate English; content may not be factually accurate or logically consistent; biases inherent to the training data may be present.
Considerations: Models are assistive tools and should not be used as definitive information sources.

Additional Notes: The memory footprint of the 135M model is 723.56 MB when loaded.
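
The quoted footprint can be checked at load time with the Transformers `get_memory_footprint()` helper. A minimal sketch; the model ID is a placeholder, not the actual checkpoint name, and the reported number depends on the dtype used at load time:

```python
# Minimal sketch: measure a loaded model's memory footprint.
# NOTE: the model ID below is a placeholder (assumption), not the real checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-org/your-135m-model")

# get_memory_footprint() returns the bytes consumed by parameters and buffers;
# the result varies with the dtype the weights were loaded in.
print(f"Memory footprint: {model.get_memory_footprint() / 1024**2:.2f} MB")
```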
|
Supported Languages: English

Training Details
Data Sources: FineWeb-Edu, DCLM, The Stack, UltraFeedback
Data Volume:
Methodology: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO); a sketch of the DPO stage follows this section
Context Length:
Hardware Used:
Model Architecture:
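
The preference-tuning stage can be sketched with the TRL library. This is a minimal illustration, not the actual training setup: the SFT checkpoint path and hyperparameters are placeholders, the dataset shown is a public binarized UltraFeedback variant, and exact trainer arguments vary across TRL versions.

```python
# Minimal DPO sketch with TRL. The checkpoint path and hyperparameters are
# placeholders (assumptions), not the values used to train this model.
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# DPO consumes preference pairs: each row holds a prompt plus a preferred
# ("chosen") and a dispreferred ("rejected") response.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model="path/to/sft-checkpoint",  # placeholder: DPO starts from the SFT model
    args=DPOConfig(output_dir="dpo-output", beta=0.1),  # beta weights the KL penalty
    train_dataset=train_dataset,
)
trainer.train()
```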
|
Input/Output
Input Format: Token sequences encoded with a tokenizer
Accepted Modalities: Text
Output Format: Generated token sequences
Performance Tips: Use multiple GPUs and reduced-precision weights (e.g., bfloat16) for best performance; see the sketches below
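
The input/output contract above amounts to a text -> tokens -> text round trip. A minimal sketch, again with a placeholder model ID:

```python
# Minimal sketch of the tokenize -> generate -> decode round trip.
# NOTE: the model ID is a placeholder (assumption), not the real checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-135m-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Input: the tokenizer encodes text into a sequence of token IDs.
inputs = tokenizer("Rewrite this sentence more formally: gotta go now.", return_tensors="pt")

# The model generates a continuation as token IDs...
output_ids = model.generate(**inputs, max_new_tokens=50)

# ...and the tokenizer decodes them back into text (the output format above).
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```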
|
|
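The performance tips map onto two loading options in Transformers. A minimal sketch under the same placeholder-model assumption; `device_map="auto"` additionally requires the accelerate package:

```python
# Minimal sketch of the performance tips: bfloat16 weights plus automatic
# multi-GPU placement. NOTE: the model ID is a placeholder (assumption).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-135m-model",
    torch_dtype=torch.bfloat16,  # half the memory of float32, with similar dynamic range
    device_map="auto",           # places layers across available GPUs (needs accelerate)
)
```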