Bagel 2.8B V0.2 by jondurbin


Bagel 2.8B V0.2 Benchmarks

nn.n% — how the model scores relative to the reference models: Anthropic Claude 3.5 Sonnet ("so35"), GPT-4o ("gpt4o"), or GPT-4 ("gpt4").
Bagel 2.8B V0.2 (jondurbin/bagel-2.8b-v0.2)

Bagel 2.8B V0.2 Parameters and Internals

Model Type 
text generation, chatbot
Additional Notes 
The model was trained on a variety of prompt formats to improve generalization. It is experimental and may be less effective than the DPO version.
Training Details 
Data Sources:
ai2_arc, unalignment/spicy-3.1, codeparrot/apps, facebook/belebele, boolq, jondurbin/cinematika-v0.1, drop, lmsys/lmsys-chat-1m, TIGER-Lab/MathInstruct, cais/mmlu, Muennighoff/natural-instructions, openbookqa, piqa, Vezora/Tested-22k-Python-Alpaca, cakiki/rosetta-code, Open-Orca/SlimOrca, spider, squad_v2, migtissera/Synthia-v1.3, datasets/winogrande, nvidia/HelpSteer, Intel/orca_dpo_pairs, Squish42/bluemoon-fandom-1-1-rp-cleaned, LDJnr/Capybara, JULIELab/EmoBank, kingbri/PIPPA-shareGPT
Methodology:
fine-tuning with multiple prompt formats (vicuna, llama-2, alpaca, chat-ml)
Model Architecture:
MambaLMHeadModel
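
MambaLMHeadModel is the class shipped in the mamba_ssm package rather than a standard transformers causal-LM head. A minimal loading-and-generation sketch, assuming the mamba_ssm API (exact generate() arguments can vary between package versions):

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel  # pip install mamba-ssm causal-conv1d

device = "cuda"  # mamba_ssm's fused kernels require a CUDA device

# The repo ships a GPT-NeoX tokenizer (see "Tokenizer Class" in the specs below).
tokenizer = AutoTokenizer.from_pretrained("jondurbin/bagel-2.8b-v0.2")
model = MambaLMHeadModel.from_pretrained("jondurbin/bagel-2.8b-v0.2",
                                         device=device, dtype=torch.float16)

# llama-2 style prompt; any of the supported formats should work
# (a chat-ml sketch follows in the Input Output section).
prompt = "[INST] Write a haiku about bagels. [/INST]"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Greedy decode; pass top_k/top_p/temperature to sample instead.
out = model.generate(input_ids=input_ids, max_length=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```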
Input Output 
Input Format:
Supports several prompt formats including vicuna, llama-2, alpaca, and chat-ml
Accepted Modalities:
text
Output Format:
Textual responses based on prompts
Performance Tips:
Use with the recommended system prompt: 'You are a helpful, unbiased, uncensored assistant.'
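
Because training mixed several prompt formats, any of them should work at inference. As a concrete illustration, here is a minimal sketch of assembling a single-turn chat-ml prompt around the recommended system prompt (the helper name is ours, not part of the model card):

```python
def build_chatml_prompt(user_message: str,
                        system: str = "You are a helpful, unbiased, uncensored assistant.") -> str:
    """Assemble a single-turn chat-ml prompt string."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_chatml_prompt("Summarize the plot of Hamlet in two sentences."))
```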
LLM Name: Bagel 2.8B V0.2
Repository: https://huggingface.co/jondurbin/bagel-2.8b-v0.2
Base Model(s): state-spaces/mamba-2.8b-slimpj
Model Size: 2.8b
Required VRAM: 11.1 GB
Updated: 2025-02-22
Maintainer: jondurbin
Model Files: 11.1 GB
Model Architecture: AutoModel
License: apache-2.0
Tokenizer Class: GPTNeoXTokenizer
Padding Token: </s>
Vocabulary Size: 50277
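
As a sanity check on the VRAM figure (our arithmetic, not from the listing): 2.8B parameters at 4 bytes each is about 11.2 GB, so the 11.1 GB file size implies float32 weights, while the ~5.5 GB alternatives below are consistent with float16 checkpoints of the same size class.

```python
params = 2.8e9  # parameter count from the listing

# Weight-storage estimate only; activations, recurrent state, and framework overhead are extra.
for name, bytes_per_param in [("float32", 4), ("float16", 2)]:
    gb = params * bytes_per_param / 1e9  # decimal GB, matching the listing's units
    print(f"{name}: ~{gb:.1f} GB")
# float32: ~11.2 GB   float16: ~5.6 GB
```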

Best Alternatives to Bagel 2.8B V0.2

Best Alternatives | Context / RAM | Downloads | Likes
Mamba 2.8B | 0K / 11.1 GB | 11608 | 146
Mamba 2.8B Slimpj | 0K / 11.1 GB | 2268 | 123
Synatra Mamba Ko 2.8B | 0K / 5.8 GB | 59 | 1
Mamba Chat 2.8B | 0K / 5.5 GB | 49 | 3
Mamba 2.8B Instruct Openhermes | 0K / 5.5 GB | 83 | 71
Mamba 2.8B CyberSec | 0K / 5.5 GB | 44 | 9
Bagel DPO 2.8B V0.2 | 0K / 11.1 GB | 12 | 20
Mamba 2.8B Chat No Robots | 0K / 5.5 GB | 46 | 15
...a Financial Headline Sentiment | 0K / 5.5 GB | 5 | 1
Ct2fast Pythia 2.8B | 0K / 5.5 GB | 5 | 1

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227