Smol Llama 4x220M MoE by Isotonic: Benchmarks, Features and Detailed Analysis

Tags: Autotrain compatible; Bee-spoke-data/beecoder-220m-p...; Bee-spoke-data/smol llama-220m...; Bee-spoke-data/zephyr-220m-dpo...; Bee-spoke-data/zephyr-220m-sft...; Dataset:bigcode/the-stack-smol...; Dataset:eleutherai/proof-pile-...; Dataset:huggingfaceh4/ultracha...; Dataset:huggingfaceh4/ultrafee...; Dataset:jeankaddour/minipile; Dataset:mattymchen/refinedweb-...; Dataset:pszemraj/simple wikipe...; Dataset:teknium/openhermes; Endpoints compatible; Lazymergekit; Merge; Mergekit; Mixtral; Moe; Region:us; Safetensors; Sharded; Tensorflow

Model Card on HF 🤗: https://huggingface.co/Isotonic/smol_llama-4x220M-MoE

Smol Llama 4x220M MoE Benchmarks

Benchmark	Score	Reference score (model)	Relative difference
ARC	25.09	96.7 (so35)	-74.1%
HellaSwag	29.24	95.3 (gpt4)	-69.3%
MMLU	25.88	88.3 (so35)	-70.7%
TruthfulQA	43.92	59 (gpt4)	-25.6%
WinoGrande	51.22	87.5 (gpt4)	-41.5%
GSM8K	0.15	96.4 (so35)	-99.8%

LLME Score: 0.21068

The relative difference shows how far the model's score falls below the reference model's score, as a percentage of the reference score. Reference models: Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o") or GPT-4 ("gpt4").
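
The percentages listed with the benchmark scores are consistent with a plain relative difference against the reference score. The snippet below is a minimal sketch under that assumption; the function name `relative_gap` is illustrative and does not come from LLM Explorer.

```python
def relative_gap(model_score: float, reference_score: float) -> float:
    """Percentage difference of model_score relative to reference_score."""
    return (model_score - reference_score) / reference_score * 100.0


# (model score, reference score) pairs copied from the benchmark listing.
benchmarks = {
    "ARC": (25.09, 96.7),
    "HellaSwag": (29.24, 95.3),
    "MMLU": (25.88, 88.3),
    "TruthfulQA": (43.92, 59.0),
    "WinoGrande": (51.22, 87.5),
    "GSM8K": (0.15, 96.4),
}

for name, (score, reference) in benchmarks.items():
    print(f"{name}: {relative_gap(score, reference):.1f}%")
```

Run as written, this reproduces the listed figures to one decimal place, for example -74.1% for ARC and -99.8% for GSM8K.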

Smol Llama 4x220M MoE Parameters and Internals

Model Type

Mixture of Experts (MoE), text-generation

Training Details

Data Sources:

JeanKaddour/minipile, pszemraj/simple_wikipedia_LM, mattymchen/refinedweb-3m, HuggingFaceH4/ultrachat_200k, teknium/openhermes, HuggingFaceH4/ultrafeedback_binarized, EleutherAI/proof-pile-2, bigcode/the-stack-smol-xl

Methodology:

LazyMergekit was used to create the MoE model with specific source models and their positive prompts.
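
To illustrate the shape of such a merge, the sketch below assembles a mergekit-moe style configuration as a Python dict. The keys `base_model`, `gate_mode`, `dtype`, `experts`, `source_model` and `positive_prompts` follow mergekit's MoE config layout. Everything else is an assumption: the model names in angle brackets are placeholders, the prompts are invented for illustration, and the `gate_mode` value is a guess. The actual source models and prompts are listed on the model card, not here.

```python
import json


def build_moe_config(base_model, experts, gate_mode="hidden", dtype="bfloat16"):
    """Assemble a mergekit-moe style config as a plain dict.

    `experts` is a list of (source_model, positive_prompts) pairs.
    """
    if not experts:
        raise ValueError("at least one expert is required")
    return {
        "base_model": base_model,
        "gate_mode": gate_mode,  # assumed value, not confirmed by the source
        "dtype": dtype,
        "experts": [
            {"source_model": model, "positive_prompts": list(prompts)}
            for model, prompts in experts
        ],
    }


# Hypothetical placeholders: four experts, one per 220M source model.
experts = [
    ("<general-220m-model>", ["explain this topic"]),
    ("<code-220m-model>", ["write a function that"]),
    ("<chat-sft-220m-model>", ["answer the user's question"]),
    ("<chat-dpo-220m-model>", ["respond helpfully to"]),
]

config = build_moe_config("<base-220m-model>", experts)
print(json.dumps(config, indent=2))
```

mergekit reads its configuration from YAML; the dict is printed as JSON here only to keep the sketch free of third-party dependencies. The positive prompts are what the router uses to decide which expert should handle which kind of input.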

LLM Name	Smol Llama 4x220M MoE
Repository 🤗	https://huggingface.co/Isotonic/smol_llama-4x220M-MoE
Model Size	595.4M parameters
Required VRAM	1.2 GB
Updated	2025-03-14
Maintainer	Isotonic
Model Type	mixtral
Model Files	1.2 GB: 1-of-1
Model Architecture	MixtralForCausalLM
License	apache-2.0
Context Length	2048
Model Max Length	2048
Transformers Version	4.37.2
Tokenizer Class	LlamaTokenizer
Padding Token	<s>
Vocabulary Size	32128
Torch Data Type	bfloat16
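
The listed VRAM requirement is consistent with storing the weights alone in bfloat16 at two bytes per parameter. The sketch below shows that arithmetic; it assumes decimal gigabytes and ignores activations, the KV cache and framework overhead, so real usage will be somewhat higher. The helper name `weight_memory_gb` is illustrative.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for the model weights, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# 595.4M parameters, as listed under Model Size.
print(f"{weight_memory_gb(595.4e6):.1f} GB")  # prints "1.2 GB"
```

The same helper gives roughly twice that figure for float32, which is why loading the checkpoint in its native bfloat16 is the cheaper option.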

Best Alternatives to Smol Llama 4x220M MoE

Best Alternatives	Context / RAM	Downloads	Likes
...inyMixtral 4x220M UniversalNER	2K / 2.4 GB	18	0

Original data from HuggingFace, OpenCompass and various public git repos.

Release v20241227
