GPT Sw3 126M by AI-Sweden-Models


Tags: autotrain compatible, endpoints compatible, gpt2, pytorch, safetensors, region: us
Languages: da, en, is, no, sv

GPT Sw3 126M Benchmarks

GPT Sw3 126M (AI-Sweden-Models/gpt-sw3-126m)

GPT Sw3 126M Parameters and Internals

Model Type 
large decoder-only transformer language model
Use Cases 
Areas:
research, evaluation of Nordic language capabilities
Primary Use Cases:
Research on LLMs in Nordic languages, Validation of model capabilities
Limitations:
Bias, Safety issues, Generation diversity, Hallucination, Possibility of harmful or inappropriate content
Supported Languages 
da (proficient), sv (proficient), no (proficient), en (proficient), is (proficient)
Training Details 
Data Sources:
Litteraturbanken, The Pile, Diva, PubMed, ArXiv, CodeParrot, Familjeliv, Flashback, Parlai, Pushshift.io Reddit dataset, English Math dataset from DeepMind, Swedish Math dataset, OPUS, Movie scripts, Natural Instructions, P3, Norwegian Colossal Corpus, Danish Gigaword, Icelandic Gigaword, Common Crawl, LES, Multilingual C4, OSCAR, Open Web Text, Various public Swedish website scrapes, JobTech/Arbetsförmedlingen, Wikipedia
Data Volume:
1.1TB of UTF-8 encoded text containing 660M documents with a total of 320B tokens
Methodology:
Pretrained using a causal language modeling (CLM) objective utilizing the NeMo Megatron GPT implementation.
Model Architecture:
Decoder-only transformer
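The causal language modeling (CLM) objective mentioned above trains each position to predict the next token, so the labels are simply the input ids shifted left by one. A minimal illustrative sketch (the actual pretraining used the NeMo Megatron GPT implementation, not this toy code):

```python
# Toy illustration of the causal LM training objective:
# the model sees tokens 0..n-2 and must predict tokens 1..n-1.

def clm_inputs_and_labels(token_ids):
    """Return (inputs, labels) for next-token prediction."""
    inputs = token_ids[:-1]  # context fed to the model
    labels = token_ids[1:]   # targets, shifted left by one position
    return inputs, labels

ids = [5, 17, 42, 99]
inputs, labels = clm_inputs_and_labels(ids)
print(inputs)  # [5, 17, 42]
print(labels)  # [17, 42, 99]
```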
LLM Name: GPT Sw3 126M
Repository: https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m
Model Size: 126M
Required VRAM: 0.6 GB
Updated: 2025-02-05
Maintainer: AI-Sweden-Models
Model Type: gpt2
Model Files: 0.6 GB
Supported Languages: da, sv, no, en, is
Model Architecture: GPT2LMHeadModel
License: other
Transformers Version: 4.25.0.dev0
Tokenizer Class: GPTSw3Tokenizer
Vocabulary Size: 64000
Torch Data Type: float32
Activation Function: gelu
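The 0.6 GB VRAM figure listed above is consistent with simple back-of-the-envelope arithmetic: a float32 parameter takes 4 bytes, so 126M parameters yield roughly half a gigabyte of raw weights, with activations and framework buffers accounting for the rest. A quick sketch of that estimate:

```python
# Rough fp32 inference footprint: parameters × 4 bytes per float32.
# Weights alone come to ~0.47 GiB for a 126M-parameter model;
# activations and buffers push actual usage toward the listed 0.6 GB.

def fp32_weight_gb(n_params: int) -> float:
    """Raw weight memory in GiB for a float32 model."""
    return n_params * 4 / 1024**3

print(round(fp32_weight_gb(126_000_000), 2))  # 0.47
```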

Best Alternatives to GPT Sw3 126M

Best Alternatives          Context / RAM    Downloads    Likes
GPT Sw3 126M Instruct      0K / 0.6 GB      223          13

Note: a green score (e.g. "73.2") means the model outperforms AI-Sweden-Models/gpt-sw3-126m.


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227