SambaLingo Arabic Base by sambanovasystems


Tags: arXiv:2311.05741, arXiv:2404.05829, ar, AutoTrain compatible, dataset:uonlp/culturax, en, Endpoints compatible, Llama, PyTorch, region:us, Safetensors, Sharded, TensorFlow


SambaLingo Arabic Base Parameters and Internals

Model Type 
Language Model
Use Cases 
Areas:
Research, Commercial applications
Limitations:
Hallucination, Code Switching, Repetition, Coding and Math limitations, Toxicity
Additional Notes 
SambaLingo extends the Llama model's vocabulary to adapt it to Arabic and English. Known limitations include hallucination, code switching, repetition, and potential toxicity. The model should not be used in mission-critical applications, applications involving safety, or important decision-making. A chat-oriented variant is available for direct question-and-answer interaction.
Supported Languages 
Arabic (Fluent), English (Fluent)
Training Details 
Data Sources:
Cultura-X dataset
Data Volume:
63 billion tokens
Methodology:
Adapts a language model to a new language using a training data mix of 75% target language (Arabic) and 25% English.
Context Length:
4096
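The 75% / 25% mix listed under Methodology can be illustrated with a small sampling sketch. The listing does not say how the mix was implemented, so everything below is an assumption: `mixed_stream`, its parameters, and the toy corpora are hypothetical, and the ratio is applied per document rather than per token for simplicity.

```python
import random
from itertools import islice


def mixed_stream(target_docs, english_docs, target_weight=0.75, seed=0):
    """Yield documents, picking the target-language source with probability target_weight."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    target_iter, english_iter = iter(target_docs), iter(english_docs)
    while True:
        source = target_iter if rng.random() < target_weight else english_iter
        try:
            yield next(source)
        except StopIteration:
            return  # stop as soon as either source runs out


# Toy corpora standing in for real Arabic and English documents (hypothetical data).
arabic = ["ar"] * 10_000
english = ["en"] * 10_000

sample = list(islice(mixed_stream(arabic, english), 4_000))
arabic_share = sample.count("ar") / len(sample)
print(f"Arabic share of sampled documents: {arabic_share:.2f}")
```

With enough samples the Arabic share settles close to the configured weight. A real pipeline would more likely balance by token count than by document count.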
Input Output 
Input Format:
This is a pretrained (base) checkpoint; prompt it with few-shot exemplars.
Accepted Modalities:
text
Output Format:
text
Performance Tips:
Use the chat version for direct interactions aligned with human preferences.
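Because the Input Format calls for few-shot prompting with exemplars, a prompt for the base checkpoint might be assembled as in the sketch below. The helper names, the "Question"/"Answer" labels, and the exemplars are illustrative assumptions, not a format prescribed by the listing. The `complete` function uses the Hugging Face `transformers` API and assumes that `transformers`, `torch`, and `accelerate` (for `device_map="auto"`) are installed.

```python
def build_few_shot_prompt(exemplars, query, question_label="Question", answer_label="Answer"):
    """Join (question, answer) exemplars, then leave the final answer open for the model."""
    blocks = [f"{question_label}: {q}\n{answer_label}: {a}" for q, a in exemplars]
    blocks.append(f"{question_label}: {query}\n{answer_label}:")
    return "\n\n".join(blocks)


def complete(prompt, repo_id="sambanovasystems/SambaLingo-Arabic-Base", max_new_tokens=32):
    """Generate a continuation for the prompt; downloads the full checkpoint on first use."""
    # Imported lazily so the prompt helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto", torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Hypothetical exemplars: capital-city questions in Arabic.
exemplars = [
    ("ما هي عاصمة مصر؟", "القاهرة"),
    ("ما هي عاصمة المغرب؟", "الرباط"),
]
prompt = build_few_shot_prompt(exemplars, "ما هي عاصمة الأردن؟")
print(prompt)
```

Calling `complete(prompt)` would then ask the model to fill in the final answer. A base model tends to keep generating further question/answer pairs, so a small `max_new_tokens` or truncating at the first blank line is usually sensible.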
LLM Name: SambaLingo Arabic Base
Repository: https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base
Model Size: 6.9B
Required VRAM: 27.8 GB
Updated: 2025-02-22
Maintainer: sambanovasystems
Model Type: llama
Model Files: 10.0 GB (1-of-3), 9.8 GB (2-of-3), 8.0 GB (3-of-3)
Supported Languages: ar, en
Model Architecture: LlamaForCausalLM
License: llama2
Context Length: 4096
Model Max Length: 4096
Transformers Version: 4.29.0
Tokenizer Class: LlamaTokenizer
Vocabulary Size: 57344
Torch Data Type: float32
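The figures above are consistent with each other: the three shard sizes add up to the listed VRAM requirement, which fits a checkpoint stored in float32. The sketch below reproduces that arithmetic and gives a rough estimate for 16-bit loading. The function names are hypothetical, and the estimate covers weights only; activations and the KV cache need additional memory.

```python
# Shard sizes in GB, as listed under Model Files.
SHARD_SIZES_GB = [10.0, 9.8, 8.0]

# Bytes per parameter for common torch dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_size_gb(shards):
    """Total checkpoint size, rounded to one decimal like the listing."""
    return round(sum(shards), 1)


def estimated_weights_gb(stored_gb, stored_dtype, target_dtype):
    """Scale the stored size by the ratio of bytes per parameter (weights only)."""
    scale = BYTES_PER_PARAM[target_dtype] / BYTES_PER_PARAM[stored_dtype]
    return round(stored_gb * scale, 1)


total_gb = checkpoint_size_gb(SHARD_SIZES_GB)
half_precision_gb = estimated_weights_gb(total_gb, "float32", "bfloat16")
print(f"float32 checkpoint: {total_gb} GB")
print(f"bfloat16 estimate:  {half_precision_gb} GB")
```

The float32 total comes to 27.8 GB, matching Required VRAM, and loading in a 16-bit dtype would roughly halve the weight footprint to about 13.9 GB.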

Best Alternatives to SambaLingo Arabic Base

Best Alternatives | Context / RAM | Downloads Likes
Bolna Lead Qualification | 16K / 13.8 GB | 63
...j Pair8 Ds Coder Rmsprop Iter4 | 4K / 13.9 GB | 1450
... Ds Coder Reflct Rmsprop Iter4 | 4K / 13.9 GB | 1050
...8 Ds Chat Reflct Rmsprop Iter2 | 4K / 13.9 GB | 870
...s Coder Pos Reflct Adamw Iter3 | 4K / 13.9 GB | 1890
Ds Chat Pos Reflct Adamw Iter3 | 4K / 13.9 GB | 1880
...4 Ds Chat Reflct Rmsprop Iter3 | 4K / 13.9 GB | 1100
...Coder Pos Reflct Rmsprop Iter3 | 4K / 13.9 GB | 940
... Ds Coder Reflct Rmsprop Iter2 | 4K / 13.9 GB | 930
... Ds Coder Reflct Rmsprop Iter3 | 4K / 13.9 GB | 700


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227