Model Type | |
Use Cases |
Limitations: | The model is not aligned to generate safe completions and can produce problematic outputs when prompted to do so. |
|
|
Additional Notes | Training hyperparameters: learning_rate: 5e-07; total_train_batch_size: 32; optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08; lr_scheduler_type: linear; lr_scheduler_warmup_ratio: 0.1; num_epochs: 3.0 |
|
Supported Languages | |
Training Details |
Data Sources: | allenai/tulu-2.5-preference-data, allenai/tulu-v2-sft-mixture |
|
Data Volume: | |
Methodology: | DPO and PPO starting from the Tulu 2 suite |
|
|
Input Output |
Input Format: | <|user|> Your message here! <|assistant|> (followed by a newline) |
|
Performance Tips: | Format every input with the pattern given under Input Format, including the newline after <|assistant|>; inputs that deviate from it may reduce generation quality. |
|
|
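The input format above can be applied with a small helper. This is a minimal sketch, not an official utility: the function name `format_prompt` is hypothetical, and it assumes that the whitespace between the tags and the message in the pattern is a newline, as in the Tulu 2 chat format. If the model's own tokenizer ships a chat template, prefer that instead.

```python
def format_prompt(user_message: str) -> str:
    """Wrap one user message in the <|user|> / <|assistant|> pattern.

    Assumption: the tags and the message are separated by newlines,
    and the prompt ends with a newline after <|assistant|>.
    """
    return f"<|user|>\n{user_message.strip()}\n<|assistant|>\n"


if __name__ == "__main__":
    # repr() makes the newlines visible so the layout can be checked by eye.
    print(repr(format_prompt("Your message here!")))
```

The helper strips leading and trailing whitespace from the message so that the only newline before `<|assistant|>` is the one the pattern requires.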