| Supported Languages | bg, ca, code, cs, cy, da, de, el, en, es, et, eu, fi, fr, ga, gl, hr, hu, it, lt, lv, mt, nl, nn, no, oc, pl, pt, ro, ru, sh, sk, sl, sr, sv, uk |
|
Training Details

| Data Sources | Common Crawl, GitHub, Wikimedia, EurLex, Spanish Crawling |
| Data Volume | 33 TB of pre-processed text (7.8 trillion tokens) |
| Methodology | Transformer-based decoder-only model |
| Context Length | |
| Training Time | Not specified |
| Hardware Used | |
| Model Architecture | 24 layers, hidden size of 2048, 16 attention heads, SwiGLU activation, RMSNorm layer normalization (see the configuration sketch below) |
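
For concreteness, the snippet below is a minimal, hypothetical sketch that expresses the architecture row above as a Hugging Face `transformers` `LlamaConfig`, whose decoder blocks likewise use SwiGLU MLPs and RMSNorm. Values not listed in this card (vocabulary size, MLP intermediate size, maximum context length, normalization epsilon) are placeholders, not official figures.

```python
# Hypothetical configuration sketch of the architecture described above.
# Only num_hidden_layers, hidden_size, num_attention_heads, SwiGLU, and
# RMSNorm come from the card; the remaining values are assumptions.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    num_hidden_layers=24,          # 24 transformer decoder layers (from the card)
    hidden_size=2048,              # hidden size of 2048 (from the card)
    num_attention_heads=16,        # 16 attention heads (from the card)
    hidden_act="silu",             # SiLU gating -> SwiGLU MLP (from the card)
    rms_norm_eps=1e-5,             # RMSNorm layer normalization (epsilon assumed)
    intermediate_size=5440,        # MLP width: placeholder, not stated in the card
    vocab_size=32000,              # placeholder, not stated in the card
    max_position_embeddings=4096,  # context length: placeholder, not stated in the card
)

# Randomly initialized model, useful only for inspecting shapes and parameter count.
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```

Instantiating the config this way is just a convenient stand-in for checking how the listed hyperparameters combine; it does not load any released weights.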
|
|