| Field | Value |
|---|---|
| Model Type | decoder-only transformer, text generation, code generation |
| Supported Languages | fi (fluent), en (fluent), da (fluent), sv (fluent), no (fluent), nn (fluent), is (fluent) |

Training Details

| Field | Value |
|---|---|
| Data Sources | cerebras/SlimPajama-627B, bigcode/starcoderdata, mc4 (see the loading sketch below the table) |
| Data Volume | |
| Methodology | Generative pretrained transformer using a LLaMA-like GPT architecture with rotary positional embeddings and flash attention |
| Context Length | 4096 tokens |
| Hardware Used | |
| Model Architecture | Decoder-only transformer with 32 layers, 32 attention heads, d_model = 4096, vocab_size = 131072, sequence_length = 4096 (see the configuration sketch below) |
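
The listed corpora are all available on the Hugging Face Hub, so a minimal way to inspect them is to stream them with the `datasets` library. The sketch below is illustrative only: the split names, the starcoderdata directory layout, and the mc4 language configuration are assumptions, and the card does not describe the actual sampling mixture or preprocessing.

```python
# Illustrative only: stream a few examples from the listed pretraining corpora.
# Split names, data_dir values, and the mc4 language config are assumptions.
from datasets import load_dataset

slimpajama = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)
starcoder = load_dataset("bigcode/starcoderdata", data_dir="python",  # assumed subdirectory
                         split="train", streaming=True)
mc4_fi = load_dataset("mc4", "fi", split="train", streaming=True)  # assumed language config

for example in slimpajama.take(2):
    print(example["text"][:200])  # SlimPajama examples expose a "text" field
```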
|
|
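Because the card describes the architecture as LLaMA-like, the hyperparameters above map naturally onto a Hugging Face `LlamaConfig`. The sketch below is an approximation for illustration, not the released training configuration: values the card does not state (for example the feed-forward width) are left at library defaults.

```python
# Approximate configuration built from the hyperparameters stated in the card.
# This is a LLaMA-like stand-in, not the exact released configuration.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=131072,             # vocab_size from the card
    hidden_size=4096,              # d_model
    num_hidden_layers=32,          # layers
    num_attention_heads=32,        # heads
    max_position_embeddings=4096,  # sequence / context length
)
# Rotary positional embeddings are the default position encoding in LlamaConfig,
# matching the methodology description. Flash attention is chosen at load time,
# e.g. AutoModelForCausalLM.from_pretrained(..., attn_implementation="flash_attention_2").
print(config)
```
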
Responsible AI Considerations

| Field | Value |
|---|---|
| Fairness | The model has no meaningful proficiency in languages other than those listed under Supported Languages. |
|
|