Model Type | |
Use Cases |
Areas: | research, commercial applications |
Applications: | general-domain instruction following, text-based query responses |
Primary Use Cases: | improving helpfulness in language model responses |
Limitations: | not tuned for specialized domains such as math |
Additional Notes | Evaluation results may differ from those obtained with other frameworks, such as NeMo-Aligner. |
Supported Languages | |
Training Details |
Data Sources: | |
Data Volume: | |
Methodology: | Reinforcement Learning from Human Feedback (RLHF) using REINFORCE |
Context Length: | |
Hardware Used: | H100, A100 80GB, A100 40GB |
Model Architecture: | |
Input Output |
Input Format: | |
Accepted Modalities: | |
Output Format: | |
Performance Tips: | Use at least two NVIDIA Ampere GPUs with 80GB of memory each for optimal performance. |
Release Notes |
Version: | |
Date: | |
Notes: | Achieved top performance on Arena Hard, AlpacaEval 2 LC (length-controlled), and MT-Bench. |