| Training Details | |
| --- | --- |
| Data Sources | Wikipedia (English, German, Spanish, French), Project Gutenberg, 45 subreddits, OpenWebText, news data, Amazon Reviews, Europarl and UN data from WMT, ELI5, and the MRQA shared tasks |
| Data Volume | 140 GB of text |
| Methodology | Pre-trained with a causal language-modeling objective, with a control code prepended as the first token of every sequence (see the sketch below this table) |
| Training Time | Approximately two weeks (800k iterations) |
| Hardware Used | Cloud TPU v3 Pod (256 TPU v3 cores) |
| Model Architecture | CTRL is a Transformer with model dimension d = 1280, inner (feed-forward) dimension f = 8192, 48 layers, and 16 heads per layer; dropout with probability 0.1 follows the residual connections in each layer (see the layer sketch below) |
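
Concretely, the control-code conditioning in the Methodology row amounts to prepending one extra token before computing an ordinary next-token loss, so every later prediction is conditioned on the code. The sketch below illustrates this under stated assumptions; it is not the released Salesforce training code, and the toy vocabulary, control-code IDs, and stand-in model are hypothetical.

```python
# Minimal sketch of CTRL-style pre-training with a control code as the
# first token. NOT the released Salesforce code: the toy vocabulary,
# control-code IDs, and stand-in model below are hypothetical.
import torch
import torch.nn.functional as F

VOCAB_SIZE = 1000                                # toy size for the sketch
CONTROL_CODES = {"Wikipedia": 0, "Reviews": 1}   # hypothetical token IDs

def build_sequence(control_code: str, token_ids: list[int]) -> torch.Tensor:
    """Prepend the control code so every later token is conditioned on it."""
    return torch.tensor([CONTROL_CODES[control_code]] + token_ids)

def lm_loss(model: torch.nn.Module, seq: torch.Tensor) -> torch.Tensor:
    """Standard causal LM loss: predict seq[t+1] from the tokens up to t."""
    logits = model(seq[:-1].unsqueeze(0))        # (1, T-1, vocab)
    return F.cross_entropy(logits.reshape(-1, VOCAB_SIZE), seq[1:])

# Tiny stand-in for the 48-layer Transformer, just to make the sketch run:
toy_model = torch.nn.Sequential(
    torch.nn.Embedding(VOCAB_SIZE, 32),
    torch.nn.Linear(32, VOCAB_SIZE),
)
seq = build_sequence("Wikipedia", [17, 42, 99, 7])
print(lm_loss(toy_model, seq))                   # scalar training loss
```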
|
|
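For the Model Architecture row, a single layer with those dimensions might look like the following sketch. Only the dimensions and the dropout-after-residual detail come from the table above; the layer-norm placement is a simplifying assumption, and the causal attention mask is omitted for brevity, so this is an illustration rather than a faithful reimplementation of the released model.

```python
# Sketch of one CTRL-sized Transformer layer using the table's dimensions
# (d = 1280, f = 8192, 16 heads, 48 layers, dropout 0.1). Layer-norm
# placement is assumed; the causal attention mask is omitted for brevity.
import torch
from torch import nn

D_MODEL, D_INNER, N_HEADS, N_LAYERS, P_DROP = 1280, 8192, 16, 48, 0.1

class CtrlSizedBlock(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.norm1 = nn.LayerNorm(D_MODEL)
        self.norm2 = nn.LayerNorm(D_MODEL)
        self.attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(D_MODEL, D_INNER),
            nn.ReLU(),
            nn.Linear(D_INNER, D_MODEL),
        )
        self.drop = nn.Dropout(P_DROP)  # follows each residual connection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = self.drop(x + attn_out)                   # residual, then dropout
        x = self.drop(x + self.ff(self.norm2(x)))     # residual, then dropout
        return x

block = CtrlSizedBlock()            # the full model stacks N_LAYERS of these
x = torch.randn(2, 8, D_MODEL)      # (batch, sequence, d_model)
print(block(x).shape)               # torch.Size([2, 8, 1280])
```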