Model Type | |
Use Cases |
Areas: | |
Applications: | summarization, text classification, extraction, question-answering |
Primary Use Cases: | baseline for creating specialized models (see the fine-tuning sketch below) |
Limitations: | Has not undergone safety alignment, so it may produce problematic outputs; may also show increased susceptibility to hallucination due to model size. |
Considerations: | The community is urged to use the model ethically. |
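
Because the card positions the model as a baseline for creating specialized models, the sketch below shows one common way to do that: attaching LoRA adapters with the Hugging Face `peft` library. This is a minimal, illustrative example rather than the publisher's recommended recipe; `your-org/base-model` is a hypothetical placeholder model ID, and the `target_modules` names are assumptions that should be checked against the actual checkpoint.

```python
# Minimal LoRA fine-tuning sketch (illustrative only).
# Assumptions: Hugging Face `transformers` and `peft` are installed;
# "your-org/base-model" is a hypothetical placeholder for the real checkpoint ID;
# the `target_modules` names must be verified against the actual model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "your-org/base-model"  # placeholder, not a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach low-rank adapters to the attention projections; base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed names for the query/value projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients
```

Only the small adapter matrices are trained while the base weights stay frozen, which keeps specializing the baseline cheap relative to full fine-tuning.
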
Supported Languages | English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese |

Training Details |
Data Sources: | web, code, academic sources, books, math data, multilingual data, instruction data |
Data Volume: | 12 trillion tokens in Stage 1 and 2 trillion tokens in Stage 2 |
Methodology: | Two-stage training strategy |
Context Length: | |
Hardware Used: | IBM's Blue Vela supercomputing cluster with NVIDIA H100 GPUs |
Model Architecture: | Decoder-only dense transformer with grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU, RMSNorm, and shared input/output embeddings (see the sketch below) |
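
To make the architecture row concrete, here is a minimal PyTorch sketch of some of the named components: RMSNorm, the SwiGLU MLP, and the key/value head sharing that defines grouped-query attention (RoPE is omitted for brevity). It is an illustration, not the publisher's implementation; all dimensions and head counts are placeholders.

```python
# Illustrative PyTorch sketch of components named in the architecture row.
# Not the publisher's implementation; all sizes and head counts are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by 1/RMS(x), then apply a learned gain."""

    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class SwiGLUMLP(nn.Module):
    """MLP with a SwiGLU gate: silu(gate(x)) * up(x), projected back to d_model."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


def expand_kv(kv: torch.Tensor, n_query_heads: int, n_kv_heads: int) -> torch.Tensor:
    """Grouped-query attention in one line: each K/V head is shared by several
    query heads, so cached K/V tensors are repeated to match the query heads.
    kv: (batch, n_kv_heads, seq, head_dim) -> (batch, n_query_heads, seq, head_dim)."""
    return kv.repeat_interleave(n_query_heads // n_kv_heads, dim=1)


# Quick shape check with placeholder sizes:
x = torch.randn(2, 16, 512)                # (batch, seq_len, d_model)
y = SwiGLUMLP(512, 2048)(RMSNorm(512)(x))  # same shape as x

# "Shared input/output embeddings" means the LM head reuses the token-embedding
# matrix (weight tying), e.g. lm_head.weight = embedding.weight in PyTorch.
```

Sharing K/V heads across query heads shrinks the KV cache, which is the main design motivation for GQA in decoder-only models.
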
Responsible AI Considerations |
Fairness: | Users should be aware of potential bias and fairness issues. |
Mitigation Strategies: | Ongoing research to address and mitigate these issues. |

Input / Output |
Input Format: | Text tokenized with AutoTokenizer (see the example below) |
Accepted Modalities: | |
Output Format: | Output tokens decoded back into text with AutoTokenizer |
Performance Tips: | Use the appropriate libraries (e.g., Hugging Face Transformers) and follow the provided examples. |
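
The input/output flow above maps directly onto the Hugging Face Transformers API. The sketch below assumes that library; `your-org/base-model` is a hypothetical placeholder for the actual checkpoint ID, and the prompt is only an example.

```python
# Minimal sketch of the tokenize -> generate -> decode flow described above.
# Assumes Hugging Face Transformers; "your-org/base-model" is a placeholder ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/base-model"  # placeholder
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.eval()

prompt = "Summarize in one sentence: Grouped-query attention shares key/value heads across query heads to reduce memory use."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens back into text.
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

The same pattern covers the listed applications (summarization, classification, extraction, question-answering); only the prompt changes.
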