| Field | Details |
| --- | --- |
| **Model Type** | |
| **Use Cases** | |
| Areas | |
| Applications | summarization, text generation, chatbot |
| Limitations | Not suitable for production use without adequate assessment of risks |
| **Additional Notes** | The model requires further fine-tuning for specific use cases and includes biases representative of web data. |
| **Supported Languages** | en (English), de (German), es (Spanish), fr (French), it (Italian), nl (Dutch), pl (Polish), pt (Portuguese), ro (Romanian), cs (Czech), sv (Swedish) |
| **Training Details** | |
| Data Sources | RefinedWeb, RefinedWeb-English, RefinedWeb-Europe (cs, de, es, fr, it, nl, pl, pt, ro, sv), high-quality technical data, code data, conversational data extracted from public sources |
| Data Volume | |
| Methodology | Four-stage training strategy |
| Context Length | |
| Training Time | |
| Hardware Used | |
| Model Architecture | Adapted from GPT-3 with rotary positional embeddings, multi-query attention, and FlashAttention-2 |
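The rotary positional embeddings listed under Model Architecture encode token position by rotating consecutive pairs of query/key dimensions through position-dependent angles, rather than adding a position vector. A minimal NumPy sketch of the idea, where the shapes and base frequency are illustrative and not this model's actual configuration:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq, d), d even.

    Each consecutive pair of dimensions is rotated by an angle that
    depends on the token position and the pair's frequency.
    """
    seq, d = x.shape
    pos = np.arange(seq)[:, None]              # (seq, 1) token positions
    freqs = base ** (-np.arange(0, d, 2) / d)  # (d/2,) per-pair frequencies
    angles = pos * freqs                       # (seq, d/2) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]            # split dims into pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin         # standard 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.ones((3, 4))
y = rope(x)
# Rotations are orthogonal, so each token's vector norm is unchanged
print(np.allclose(np.linalg.norm(y, axis=1), np.linalg.norm(x, axis=1)))  # True
```

Because the rotation angle grows with position, the dot product between a rotated query and key depends only on their relative offset, which is what makes this scheme attractive for long-context attention.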
| **Input Output** | |
| Input Format | Token-based input with a context length of up to 8192 tokens |
| Accepted Modalities | |
| Output Format | |
| Performance Tips | Fine-tuning recommended for specific tasks |
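The multi-query attention named under Model Architecture keeps separate query heads but shares a single key/value head across all of them, which shrinks the key/value cache during generation. A minimal NumPy sketch under those assumptions; the function name, dimensions, and head count are illustrative, not taken from this model:

```python
import numpy as np

def multi_query_attention(x, Wq, Wk, Wv, n_heads):
    """Multi-query attention: n_heads query heads share one K/V head.

    x:  (seq, d_model) input
    Wq: (d_model, n_heads * d_head) per-head query projection
    Wk: (d_model, d_head) shared key projection (single head)
    Wv: (d_model, d_head) shared value projection (single head)
    """
    seq, d_model = x.shape
    d_head = Wk.shape[1]
    q = (x @ Wq).reshape(seq, n_heads, d_head)  # per-head queries
    k = x @ Wk                                  # one shared key head
    v = x @ Wv                                  # one shared value head
    # Scaled dot-product attention, broadcasting shared K across heads
    scores = np.einsum('qhd,kd->hqk', q, k) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    out = np.einsum('hqk,kd->qhd', weights, v)      # (seq, n_heads, d_head)
    return out.reshape(seq, n_heads * d_head)

# Illustrative shapes only: 2 query heads of size 4 on a 4-token sequence
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = multi_query_attention(
    x,
    rng.normal(size=(8, 2 * 4)),  # Wq: 2 heads x d_head 4
    rng.normal(size=(8, 4)),      # Wk: single shared head
    rng.normal(size=(8, 4)),      # Wv: single shared head
    n_heads=2,
)
print(out.shape)  # (4, 8)
```

The practical payoff is at inference time: the per-token cache holds one `d_head`-sized key and value instead of `n_heads` of each, so long-context generation needs far less memory bandwidth than standard multi-head attention.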