Snorkel Mistral PairRM DPO by snorkelai


  Arxiv:2305.18290 · Arxiv:2306.02561 · Arxiv:2312.11456 · Arxiv:2401.10020 · Autotrain compatible · Conversational · Dataset: snorkelai/snorkel-mist... · Endpoints compatible · Mistral · Pytorch · Region: us · Sharded

Snorkel Mistral PairRM DPO Benchmarks

Snorkel Mistral PairRM DPO (snorkelai/Snorkel-Mistral-PairRM-DPO)

Snorkel Mistral PairRM DPO Parameters and Internals

Model Type 
text-generation
Use Cases 
Limitations:
The model is a quick demonstration and has no moderation mechanisms.
Additional Notes 
For enterprise use cases, additional fine-tuning and alignment are necessary. Interested parties can contact Snorkel AI for specialized reward models.
Training Details 
Data Sources:
snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset, UltraFeedback
Methodology:
1. Generate five response variations for each prompt in a subset of 20,000 prompts using the LLM; to start, we used [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
2. Apply [PairRM](https://huggingface.co/llm-blender/PairRM) to rerank the responses.
3. Update the LLM with Direct Preference Optimization (DPO) on the top (chosen) and bottom (rejected) responses.
4. Use the resulting LLM as the base model for the next iteration, repeating three times in total.
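The iterative generate → rerank → DPO loop described above can be sketched as follows. This is a minimal illustration, not Snorkel's training code: `generate_responses`, `pairrm_rank`, and `dpo_update` are hypothetical stand-in stubs for sampling from the LLM, PairRM reranking, and a DPO fine-tuning step.

```python
def generate_responses(model: str, prompt: str, n: int = 5) -> list[str]:
    # Stub: a real run would sample n completions from the current LLM.
    return [f"{model}|{prompt}|draft{i}" for i in range(n)]

def pairrm_rank(prompt: str, responses: list[str]) -> list[str]:
    # Stub: PairRM would score and rerank; here we just sort for determinism.
    return sorted(responses)  # best first (placeholder ordering)

def dpo_update(model: str, pairs: list[tuple[str, str]]) -> str:
    # Stub: a real run would fine-tune the model on (chosen, rejected) pairs.
    return f"{model}+dpo"

def iterative_dpo(base_model: str, prompts: list[str], rounds: int = 3) -> str:
    """Three rounds of: sample 5 responses, rerank, DPO on best vs. worst."""
    model = base_model
    for _ in range(rounds):
        pairs = []
        for p in prompts:
            ranked = pairrm_rank(p, generate_responses(model, p, n=5))
            pairs.append((ranked[0], ranked[-1]))  # (chosen, rejected)
        model = dpo_update(model, pairs)  # next round starts from updated model
    return model

print(iterative_dpo("Mistral-7B-Instruct-v0.2", ["example prompt"]))
```

Each round's output model becomes the generator for the next round, which is what makes the recipe iterative rather than a single offline DPO pass.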
Input Output 
Input Format:
[INST] {prompt} [/INST]
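A minimal helper for the input format above; the function name is illustrative, and in practice the tokenizer's chat template can produce the same wrapping.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Mistral instruct template from the card."""
    return f"[INST] {user_message} [/INST]"

print(build_prompt("What does PairRM do?"))
# -> [INST] What does PairRM do? [/INST]
```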
Accepted Modalities:
text
Performance Tips:
The model is designed for initial trials and may take some time to start up on a Hugging Face endpoint (cold start).
Release Notes 
Version:
GGUF
Notes:
A GGUF version is available from community members.
LLM Name: Snorkel Mistral PairRM DPO
Repository 🤗: https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO
Required VRAM: 14.4 GB
Updated: 2025-02-22
Maintainer: snorkelai
Model Type: mistral
Model Files: 9.9 GB (1-of-2), 4.5 GB (2-of-2), 0.0 GB
Model Architecture: MistralForCausalLM
License: apache-2.0
Context Length: 32768
Model Max Length: 32768
Transformers Version: 4.34.0
Tokenizer Class: LlamaTokenizer
Padding Token: </s>
Vocabulary Size: 32000
Torch Data Type: bfloat16

Quantized Models of the Snorkel Mistral PairRM DPO

| Model | Likes | Downloads | VRAM |
|---|---|---|---|
| ...dle Snorkel Mistral PairRM DPO | 0 | 10 | 14 GB |

Best Alternatives to Snorkel Mistral PairRM DPO

| Best Alternatives | Context / RAM | Downloads | Likes |
|---|---|---|---|
| Krutrim 2 Instruct | 1000K / 49.3 GB | 980 | 25 |
| Ft V1 Violet | 1000K / 24.5 GB | 458 | 0 |
| Ft V1 Nemo Base | 1000K / 24.5 GB | 212 | 0 |
| Tiny Random MistralForCausalLM | 128K / 0 GB | 3699 | 1 |
| Winterreise M7 | 32K / 14.4 GB | 0 | 0 |
| Frostwind V2.1 M7 | 32K / 14.4 GB | 0 | 0 |
| ...ydaz Web AI Reasoner BaseModel | 32K / 14.4 GB | 0 | 1 |
| MistralLite | 32K / 14.4 GB | 40784 | 28 |
| Tess XS V1.3 Yarn 128K | 32K / 14.5 GB | 5834 | 13 |
| Mixtral AI Cyber Child | 32K / 14.5 GB | 14 | 1 |
Note: a green score (e.g. "73.2") means the model outperforms snorkelai/Snorkel-Mistral-PairRM-DPO.

Rank the Snorkel Mistral PairRM DPO Capabilities

🆘 Have you tried this model? Rate its performance. This feedback would greatly assist the ML community in identifying the most suitable model for their needs. Your contribution really does make a difference! 🌟

Instruction Following and Task Automation  
Factuality and Completeness of Knowledge  
Censorship and Alignment  
Data Analysis and Insight Generation  
Text Generation  
Text Summarization and Feature Extraction  
Code Generation  
Multi-Language Support and Translation  


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227