Batch1 Epochs4 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32 by caisarl76

Tags: Merged Model, 32bit, Autotrain compatible, Base model (finetune): MNC-LLM/Mistral-7B-LaAdMoAl-merge-Marcoroni, Conversational, Endpoints compatible, Generated from trainer, Instruct, Llama, Quantized, Region: us, Safetensors, Sharded, Tensorflow
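The repository name encodes the fine-tuning recipe: per-device batch size 1, 4 epochs, learning rate 1e-05, the paged 32-bit AdamW optimizer, a cosine learning-rate schedule, sequence length 2048, warmup ratio 0.05, max gradient norm 1.0, and 32 gradient-accumulation steps. Below is a minimal sketch of the equivalent Hugging Face TrainingArguments, reconstructed from the name alone; the output directory and the bf16 flag are assumptions, not documented training settings.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the trainer config implied by the model name.
training_args = TrainingArguments(
    output_dir="batch1_epochs4_lr1e-05_out",  # illustrative path, not from the original run
    per_device_train_batch_size=1,            # batch1
    num_train_epochs=4,                       # epochs4
    learning_rate=1e-5,                       # lr1e-05
    optim="paged_adamw_32bit",                # paged_adamw_32bit
    lr_scheduler_type="cosine",               # cosine
    warmup_ratio=0.05,                        # warmup 0.05
    max_grad_norm=1.0,                        # max_grad1.0
    gradient_accumulation_steps=32,           # grad_accu32
    bf16=True,                                # assumption, matching the bfloat16 torch dtype below
)
# "length2048" refers to the training sequence length, typically enforced
# at tokenization time (e.g. truncating inputs to max_length=2048).
```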

Batch1 Epochs4 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32 Benchmarks

nn.n% — how the model compares to the reference models: Anthropic Claude 3.5 Sonnet ("so35"), GPT-4o ("gpt4o"), or GPT-4 ("gpt4").

Batch1 Epochs4 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32 Parameters and Internals

LLM Name: Batch1 Epochs4 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32
Repository: 🤗 https://huggingface.co/caisarl76/batch1_epochs4_lr1e-05_paged_adamw_32bit_cosine_length2048_warmup_0.05_max_grad1.0_grad_accu32
Base Model(s): MNC-LLM/Mistral-7B-LaAdMoAl-merge-Marcoroni
Merged Model: Yes
Model Size: 7B
Required VRAM: 14.4 GB
Updated: 2024-09-07
Maintainer: caisarl76
Model Type: llama
Instruction-Based: Yes
Model Files: 4.9 GB (shard 1 of 3), 5.0 GB (shard 2 of 3), 4.5 GB (shard 3 of 3)
Quantization Type: 32bit
Model Architecture: LlamaForCausalLM
Context Length: 32768
Model Max Length: 32768
Transformers Version: 4.36.2
Tokenizer Class: LlamaTokenizer
Padding Token: [PAD]
Vocabulary Size: 32000
Torch Data Type: bfloat16
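At roughly 7B parameters in bfloat16 (2 bytes per parameter), the weights alone come to about 14 GB, consistent with the 14.4 GB required VRAM and the three safetensors shards (4.9 + 5.0 + 4.5 GB) listed above. Below is a minimal sketch of loading the model with the transformers classes named in the card; the prompt format and generation settings are illustrative assumptions, since no chat template is documented here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "caisarl76/batch1_epochs4_lr1e-05_paged_adamw_32bit_cosine_length2048_warmup_0.05_max_grad1.0_grad_accu32"

# Load the tokenizer (LlamaTokenizer) and the sharded safetensors weights
# in bfloat16, matching the torch dtype listed above.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; expect ~14.4 GB of VRAM
)

# Illustrative prompt; the actual instruction format is undocumented.
prompt = "### Instruction:\nExplain what a cosine learning-rate schedule does.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```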

Best Alternatives to Batch1 Epochs4 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32

Best Alternatives                      Context / RAM    Downloads  Likes
...p 0.05 Max Grad1.0 Grad Accu32      32K / 14.4 GB    6          0
...ruct Solidity Bnb 4bit Smashed      16K / 4.2 GB     5          0
...B Instruct Hf Bnb 4bit Smashed      16K / 4.2 GB     16         0
CodeLlama 7B Inst DPO 7K Mlx           16K / 4.2 GB     8          2
...eLlama 7B Instruct Hf 4bit MLX      16K / 4.2 GB     3          1
...6.7B Instruct 3.0bpw H6 EXL2 2      16K / 2.8 GB     14         1
...6.7B Instruct 8.0bpw H8 EXL2 2      16K / 6.8 GB     11         2
...coder S CL 7B 3.0bpw H6 EXL2 2      16K / 2.8 GB     14         1
...coder S CL 7B 4.0bpw H6 EXL2 2      16K / 3.6 GB     14         1
...coder S CL 7B 5.0bpw H6 EXL2 2      16K / 4.4 GB     14         1
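Several of the alternatives above are bitsandbytes 4-bit ("Bnb 4bit Smashed") quantizations, which is what brings a 7B model down to roughly 4.2 GB. Below is a minimal sketch of loading such a checkpoint, assuming a CUDA environment with bitsandbytes installed; the repository id is a placeholder, since the table truncates the actual names.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder id: substitute one of the (truncated) repositories from the table.
repo_id = "some-org/some-7b-instruct-bnb-4bit"

# Store weights in 4-bit NF4/FP4 form while computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```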

Original data from HuggingFace, OpenCompass and various public git repos.
Release v2024072803