Mamba 7B Rw by TRI-ML

  Arxiv:2312.00752   Arxiv:2405.06640 Dataset:tiiuae/falcon-refinedw...   En   Linear   Mamba   Model-index   Openlm   Pytorch   Region:us   Safetensors
Model Card on HF 🤗: https://huggingface.co/TRI-ML/mamba-7b-rw

Mamba 7B Rw Parameters and Internals

Model Type 
auto-regressive language model
Additional Notes 
This model uses the Mamba architecture, a state-space model that, unlike traditional transformer architectures, does not rely on self-attention. It is reported to perform strongly across several benchmarks.
Supported Languages 
en (English)
Training Details 
Data Sources:
RefinedWeb dataset
Data Volume:
1.2T tokens
Methodology:
Trained using AWS SageMaker on 128 H100 80GB GPUs; Precision: bfloat16; Optimizer: AdamW; Learning rate: 3e-4
Context Length:
2048
Training Time:
three weeks
Hardware Used:
128 H100 80GB GPUs
Model Architecture:
Mamba, a state-space model that does not use self-attention
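The training figures above imply a few derived quantities that can be useful for sanity checks. The following is a rough sketch only: it assumes that "three weeks" means exactly 21 days of uninterrupted training on all 128 GPUs, which the listing does not state, and the function name `training_estimates` is hypothetical.

```python
# Back-of-the-envelope figures derived from the training details listed above.
TOKENS = 1.2e12          # "1.2T tokens"
GPUS = 128               # "128 H100 80GB GPUs"
CONTEXT_LENGTH = 2048    # tokens per training sequence
TRAINING_DAYS = 21       # ASSUMPTION: "three weeks" taken as exactly 21 days


def training_estimates(tokens=TOKENS, gpus=GPUS,
                       context=CONTEXT_LENGTH, days=TRAINING_DAYS):
    """Return rough derived quantities; all values are estimates."""
    seconds = days * 24 * 60 * 60
    return {
        # Number of full-length sequences needed to cover the token budget.
        "sequences": tokens / context,
        # Total GPU-hours if every GPU ran for the whole period.
        "gpu_hours": gpus * days * 24,
        # Average throughput per GPU under the same assumption.
        "tokens_per_gpu_per_second": tokens / (gpus * seconds),
    }


if __name__ == "__main__":
    for name, value in training_estimates().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the run works out to roughly 586 million sequences, about 64,500 GPU-hours, and on the order of 5,000 tokens per GPU per second. Any downtime or restarts during the three weeks would change the throughput figure.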
Input Output 
Input Format:
Text tokenized with the EleutherAI/gpt-neox-20b tokenizer (loaded through AutoTokenizer)
Accepted Modalities:
text
Output Format:
text
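A minimal text-in, text-out sketch based on the fields above. It assumes that the checkpoint loads through the generic `AutoModelForCausalLM` class in a recent `transformers` release and that `torch` is installed; neither is confirmed by this listing, so the model card on Hugging Face remains the authority on loading instructions. The function name `generate` and the default of 50 new tokens are illustrative choices.

```python
MODEL_ID = "TRI-ML/mamba-7b-rw"            # repository named in this listing
TOKENIZER_ID = "EleutherAI/gpt-neox-20b"   # tokenizer named under Input Format


def generate(prompt, max_new_tokens=50):
    """Generate a continuation for `prompt` (sketch, not an official recipe)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    # ASSUMPTION: the checkpoint is loadable via the Auto class; bfloat16
    # halves weight memory relative to the stored float32 weights.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("State-space models are")` would download the weights on first use. Since the model was trained with a context length of 2048 tokens, it is probably safest to keep the prompt plus generated tokens within that budget.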
LLM Name: Mamba 7B Rw
Repository 🤗: https://huggingface.co/TRI-ML/mamba-7b-rw
Model Size: 7b
Required VRAM: 28.6 GB
Updated: 2025-02-22
Maintainer: TRI-ML
Model Type: mamba
Model Files: 28.6 GB, 28.6 GB
Supported Languages: en
Model Architecture: MambaForCausalLM
License: apache-2.0
Transformers Version: 4.39.0.dev0
Tokenizer Class: GPTNeoXTokenizer
Padding Token: <|endoftext|>
Vocabulary Size: 50432
Torch Data Type: float32
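The Required VRAM figure is consistent with storing the weights in float32, and the relationship can be sketched with simple arithmetic. This is an estimate under stated assumptions: "GB" is taken as 10^9 bytes, only the weights are counted (no activations or recurrent state), and the helper names are hypothetical.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

CHECKPOINT_GB = 28.6   # "Required VRAM" in this listing
GB = 1e9               # ASSUMPTION: decimal gigabytes


def implied_parameter_count(size_gb, dtype="float32"):
    """Parameter count implied by a checkpoint size stored in `dtype`."""
    return size_gb * GB / BYTES_PER_PARAM[dtype]


def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone when held in `dtype`."""
    return n_params * BYTES_PER_PARAM[dtype] / GB


if __name__ == "__main__":
    n_params = implied_parameter_count(CHECKPOINT_GB, "float32")
    print(f"implied parameters: {n_params / 1e9:.2f}B")
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(n_params, dtype):.1f} GB")
```

Under these assumptions, 28.6 GB of float32 weights corresponds to roughly 7.15 billion parameters, and loading the same weights in bfloat16 would need about 14.3 GB for the weights alone.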

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227