Salamandra 2B by BSC-LT


Tags: Arxiv:1803.09010, Arxiv:1810.06694, Arxiv:1906.03741, Arxiv:1911.05507, Arxiv:2101.00027, Arxiv:2207.00220, Arxiv:2402.06619, Arxiv:2403.14009, Arxiv:2403.20266, Arxiv:2406.17557, Autotrain compatible, Bg, Ca, Code, Cs, Cy, Da, De, El, En, Endpoints compatible, Es, Et, Eu, Fi, Fr, Ga, Gl, Hr, Hu, It, Llama, Lt, Lv, Mt, Nl, Nn, Oc, Pl, Pt, Region:us, Ro, Ru, Safetensors, Sh, Sk, Sl, Sr, Sv, Uk
Model Card on HF 🤗: https://huggingface.co/BSC-LT/salamandra-2b


Salamandra 2B Parameters and Internals

Model Type 
text generation
Use Cases 
Areas:
Research and commercial applications
Applications:
Language generation, Instruction-tuned for general-purpose assistant tasks
Primary Use Cases:
Text generation in various domains, fine-tuning for specific use-cases
Limitations:
Not intended for malicious activities; use must comply with applicable laws and regulations.
Considerations:
Developers must perform safety testing and bias reduction tailored to applications.
Additional Notes 
The training data over-represents certain languages; scaling challenges are also noted.
Supported Languages 
bg, ca, code, cs, cy, da, de, el, en, es, et, eu, fi, fr, ga, gl, hr, hu, it, lt, lv, mt, nl, nn, no, oc, pl, pt, ro, ru, sh, sk, sl, sr, sv, uk
Training Details 
Data Sources:
Common Crawl, GitHub, Wikimedia, EurLex, Spanish Crawling
Data Volume:
33TB of pre-processed text (7.8 trillion tokens)
Methodology:
Transformer-based decoder-only model
Context Length:
8192
Training Time:
Training timeline not specified
Hardware Used:
MareNostrum 5
Model Architecture:
24 layers, hidden size 2048, 16 attention heads, SwiGLU activation, RMSNorm normalization
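
To relate the architecture figures above to the listed model size, the following is a rough sketch of a parameter count for a Llama-style decoder. The layer count, hidden size, head count, and vocabulary size (256000) come from this page. The feed-forward width (`intermediate_size = 5440`), untied input/output embeddings, bias-free projections, and full multi-head attention are assumptions that are not stated in this listing. The names `DecoderConfig`, `count_parameters`, and `weight_bytes` are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    num_layers: int = 24           # from this page
    hidden_size: int = 2048        # from this page
    num_heads: int = 16            # from this page; does not affect the count under full multi-head attention
    vocab_size: int = 256_000      # from this page
    intermediate_size: int = 5440  # ASSUMPTION: not stated in this listing
    tied_embeddings: bool = False  # ASSUMPTION: separate input and output embeddings


def count_parameters(cfg: DecoderConfig) -> int:
    """Approximate parameter count for a Llama-style decoder-only model."""
    h = cfg.hidden_size
    attention = 4 * h * h                # Q, K, V and output projections, no biases (assumed)
    mlp = 3 * h * cfg.intermediate_size  # SwiGLU uses gate, up and down projections
    norms = 2 * h                        # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = cfg.vocab_size * h
    lm_head = 0 if cfg.tied_embeddings else cfg.vocab_size * h
    final_norm = h
    return cfg.num_layers * per_layer + embeddings + lm_head + final_norm


def weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Memory for the weights alone; 2 bytes per parameter corresponds to bfloat16."""
    return num_params * bytes_per_param


params = count_parameters(DecoderConfig())
print(f"{params:,} parameters")
print(f"{weight_bytes(params) / 1e9:.2f} GB of weights in bfloat16")
```

Under these assumptions the count comes out near 2.25 billion parameters, which at two bytes each is consistent with the roughly 4.5 GB figure listed on this page. The estimate covers weights only; activations and the key/value cache need additional memory at inference time.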
Responsible AI Considerations 
Fairness:
Bias testing with the BBQ dataset (English) and the Regard dataset
Mitigation Strategies:
Post-training adjustments are recommended to address biases
LLM Name: Salamandra 2B
Repository 🤗: https://huggingface.co/BSC-LT/salamandra-2b
Model Size: 2b
Required VRAM: 4.5 GB
Updated: 2024-12-21
Maintainer: BSC-LT
Model Type: llama
Model Files: 4.5 GB
Supported Languages: bg ca code cs cy da de el en es et eu fi fr ga gl hr hu it lt lv mt nl nn oc pl pt ro ru sh sk sl sr sv uk
Model Architecture: LlamaForCausalLM
License: apache-2.0
Context Length: 8192
Model Max Length: 8192
Transformers Version: 4.41.1
Tokenizer Class: LlamaTokenizer
Vocabulary Size: 256000
Torch Data Type: bfloat16
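
The metadata above (repository ID, `LlamaForCausalLM`, bfloat16, context length 8192) suggests the model can be loaded through the Hugging Face `transformers` Auto classes. The following is a minimal sketch under that assumption, not the maintainers' documented usage. It assumes `torch` and a recent `transformers` release are installed; `clamp_new_tokens` and `generate` are hypothetical helper names introduced here for illustration.

```python
MODEL_ID = "BSC-LT/salamandra-2b"  # repository listed on this page
MAX_CONTEXT = 8192                 # context length listed on this page


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     max_context: int = MAX_CONTEXT) -> int:
    """Cap the generation budget so prompt plus output fits the context window."""
    return max(0, min(requested, max_context - prompt_tokens))


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model in bfloat16 and continue the prompt (downloads the weights)."""
    # Imported lazily so the helper above can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    budget = clamp_new_tokens(inputs["input_ids"].shape[1], max_new_tokens)
    if budget == 0:
        raise ValueError("Prompt already fills the context window")

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=budget)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of Catalonia is")` would fetch roughly 4.5 GB of model files on first use. Because the page describes the model's primary use as text generation, the sketch passes a plain continuation prompt and does not assume a chat template.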

Best Alternatives to Salamandra 2B

Best Alternatives | Context / RAM | Downloads/Likes
Llama 2B Hf 32768 Fpf | 32K / 3.8 GB | 5271
...icpm 2B Sft Bf16 Llamafied 16K | 16K / 6 GB | 5601
SmolLM2 MedIT Upscale 2B | 8K / 4.2 GB | 574
Sarvam 2B V0.5 | 8K / 5.1 GB | 121582
Llama3 2B Base | 8K / 4.7 GB | 4451
Test Quantized | 8K / 5.8 GB | 110
EPFL TA Meister Quantized V1 | 8K / 5.8 GB | 130
Llama3 Rommie | 8K / 5.8 GB | 160
...ta Llama 3 2B Mlp Layer Pruned | 8K / 5.1 GB | 410
Vinallama 2B Custom Ver2 | 4K / 5.6 GB | 830

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241217