GPT Sw3 126M by AI-Sweden-Models


Tags: autotrain compatible, endpoints compatible, gpt2, pytorch, safetensors, region: us
Languages: da, en, is, no, sv

GPT Sw3 126M Benchmarks

GPT Sw3 126M (AI-Sweden-Models/gpt-sw3-126m)

GPT Sw3 126M Parameters and Internals

Model Type 
large decoder-only transformer language model
Use Cases 
Areas:
research, evaluation of Nordic language capabilities
Primary Use Cases:
Research on LLMs in Nordic languages, Validation of model capabilities
Limitations:
Bias, Safety issues, Generation diversity, Hallucination, Possibility of harmful or inappropriate content
Supported Languages 
da (proficient), sv (proficient), no (proficient), en (proficient), is (proficient)
Training Details 
Data Sources:
Litteraturbanken, The Pile, Diva, PubMed, ArXiv, CodeParrot, Familjeliv, Flashback, Parlai, Pushshift.io Reddit dataset, English Math dataset from DeepMind, Swedish Math dataset, OPUS, Movie scripts, Natural Instructions, P3, Norwegian Colossal Corpus, Danish Gigaword, Icelandic Gigaword, Common Crawl, LES, Multilingual C4, OSCAR, Open Web Text, Various public Swedish website scrapes, JobTech/Arbetsförmedlingen, Wikipedia
Data Volume:
1.1TB of UTF-8 encoded text containing 660M documents with a total of 320B tokens
Methodology:
Pretrained using a causal language modeling (CLM) objective utilizing the NeMo Megatron GPT implementation.
Model Architecture:
Decoder-only transformer
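The causal language modeling (CLM) objective mentioned above trains each position to predict the next token, so the labels are simply the input ids shifted left by one. A minimal illustrative sketch (the actual pretraining used the NeMo Megatron GPT implementation, not this toy code):

```python
# Toy illustration of the causal LM training objective:
# the model sees tokens 0..n-2 and must predict tokens 1..n-1.

def clm_inputs_and_labels(token_ids):
    """Return (inputs, labels) for next-token prediction."""
    inputs = token_ids[:-1]  # context fed to the model
    labels = token_ids[1:]   # targets, shifted left by one position
    return inputs, labels

ids = [5, 17, 42, 99]
inputs, labels = clm_inputs_and_labels(ids)
print(inputs)  # [5, 17, 42]
print(labels)  # [17, 42, 99]
```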
LLM Name: GPT Sw3 126M
Repository: https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m
Model Size: 126M
Required VRAM: 0.6 GB
Updated: 2025-02-05
Maintainer: AI-Sweden-Models
Model Type: gpt2
Model Files: 0.6 GB
Supported Languages: da, sv, no, en, is
Model Architecture: GPT2LMHeadModel
License: other
Transformers Version: 4.25.0.dev0
Tokenizer Class: GPTSw3Tokenizer
Vocabulary Size: 64000
Torch Data Type: float32
Activation Function: gelu
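The 0.6 GB VRAM figure listed above is consistent with simple back-of-the-envelope arithmetic: a float32 parameter takes 4 bytes, so 126M parameters yield roughly half a gigabyte of raw weights, with activations and framework buffers accounting for the rest. A quick sketch of that estimate:

```python
# Rough fp32 inference footprint: parameters × 4 bytes per float32.
# Weights alone come to ~0.47 GiB for a 126M-parameter model;
# activations and buffers push actual usage toward the listed 0.6 GB.

def fp32_weight_gb(n_params: int) -> float:
    """Raw weight memory in GiB for a float32 model."""
    return n_params * 4 / 1024**3

print(round(fp32_weight_gb(126_000_000), 2))  # 0.47
```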

Best Alternatives to GPT Sw3 126M

Best Alternatives          Context / RAM    Downloads    Likes
GPT Sw3 126M Instruct      0K / 0.6 GB      223          13

Note: a green score (e.g. "73.2") means the model outperforms AI-Sweden-Models/gpt-sw3-126m.


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227