Batch1 Epochs1 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32 by caisarl76

32bit, Autotrain compatible, Base model:finetune:mistralai/..., Base model:mistralai/mistral-7..., Conversational, Endpoints compatible, Generated from trainer, Instruct, Llama, Quantized, Region:us, Safetensors, Sharded, Tensorflow

Batch1 Epochs1 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32 Benchmarks

nn.n%: how the model compares with the reference models Anthropic Claude 3.5 Sonnet ("so35"), GPT-4o ("gpt4o"), or GPT-4 ("gpt4").
Batch1 Epochs1 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32 (caisarl76/batch1_epochs1_lr1e-05_paged_adamw_32bit_cosine_length2048_warmup_0.05_max_grad1.0_grad_accu32)

Batch1 Epochs1 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32 Parameters and Internals

Training Details
Context Length: 2048
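The repository name appears to encode the fine-tuning configuration: per-device batch size 1, one epoch, learning rate 1e-05, the `paged_adamw_32bit` optimizer, a cosine schedule, sequence length 2048, warmup 0.05, gradient clipping at 1.0, and 32 gradient-accumulation steps. The sketch below decodes the name under stated assumptions: each segment is assumed to map onto the Hugging Face `TrainingArguments` field shown, and `warmup_0.05` is assumed to be a warmup ratio (a fraction of total steps) rather than a step count. The maintainer does not document this, and `decode_run_name` and `cosine_lr` are hypothetical helpers.

```python
import math
import re

# Repository name as listed on this page.
RUN_NAME = (
    "batch1_epochs1_lr1e-05_paged_adamw_32bit_cosine_length2048"
    "_warmup_0.05_max_grad1.0_grad_accu32"
)

# Assumed naming scheme: each hyperparameter is a keyword followed by its value.
PATTERN = re.compile(
    r"batch(?P<batch>\d+)"
    r"_epochs(?P<epochs>\d+)"
    r"_lr(?P<lr>[\d.e-]+)"
    r"_(?P<optim>[a-z_]+_\d+bit)"
    r"_(?P<scheduler>[a-z]+)"
    r"_length(?P<length>\d+)"
    r"_warmup_(?P<warmup>[\d.]+)"
    r"_max_grad(?P<max_grad>[\d.]+)"
    r"_grad_accu(?P<accum>\d+)$"
)


def decode_run_name(name: str) -> tuple[dict, int]:
    """Split a run name into TrainingArguments-style kwargs and a sequence length."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognised run name: {name!r}")
    g = m.groupdict()
    kwargs = {
        "per_device_train_batch_size": int(g["batch"]),
        "num_train_epochs": int(g["epochs"]),
        "learning_rate": float(g["lr"]),
        "optim": g["optim"],
        "lr_scheduler_type": g["scheduler"],
        "warmup_ratio": float(g["warmup"]),  # assumption: fraction of total steps
        "max_grad_norm": float(g["max_grad"]),
        "gradient_accumulation_steps": int(g["accum"]),
    }
    return kwargs, int(g["length"])


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-period cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


kwargs, seq_len = decode_run_name(RUN_NAME)
# Sequences per optimizer update on a single device: batch size x accumulation steps.
effective_batch = kwargs["per_device_train_batch_size"] * kwargs["gradient_accumulation_steps"]
print(kwargs)
print("effective batch:", effective_batch, "| max tokens/step:", effective_batch * seq_len)
```

If the mapping holds, the dictionary could be passed as `TrainingArguments(output_dir="out", **kwargs)`; the 2048-token length is not a `TrainingArguments` field and would instead bound tokenization or packing. On a single device, one sequence per step times 32 accumulation steps gives 32 sequences (at most 65,536 tokens) per optimizer update. The `cosine_lr` helper follows the usual warmup-then-cosine shape and is illustrative only.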
LLM Name: Batch1 Epochs1 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32
Repository: https://huggingface.co/caisarl76/batch1_epochs1_lr1e-05_paged_adamw_32bit_cosine_length2048_warmup_0.05_max_grad1.0_grad_accu32
Base Model(s): mistralai/Mistral-7B-Instruct-v0.1
Model Size: 7B
Required VRAM: 14.4 GB
Updated: 2025-03-18
Maintainer: caisarl76
Model Type: llama
Instruction-Based: Yes
Model Files: 4.9 GB (1 of 3), 5.0 GB (2 of 3), 4.5 GB (3 of 3), 0.0 GB
Quantization Type: 32bit
Model Architecture: LlamaForCausalLM
License: apache-2.0
Context Length: 32768
Model Max Length: 32768
Transformers Version: 4.35.2
Tokenizer Class: LlamaTokenizer
Padding Token: [PAD]
Vocabulary Size: 32000
Torch Data Type: bfloat16
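The Required VRAM figure can be sanity-checked against the weight shards. A back-of-the-envelope sketch, assuming roughly 7.24 billion parameters (the size of the Mistral-7B base; the exact count is not listed on this page) and decimal gigabytes. It covers the weights only, not activations or the KV cache.

```python
# Shard sizes from "Model Files", in GB; each is rounded to one decimal on the page.
SHARD_SIZES_GB = [4.9, 5.0, 4.5, 0.0]

# Assumption: parameter count of the Mistral-7B base model.
NUM_PARAMS = 7_241_732_096

# Bytes per parameter for the two unquantized dtypes relevant here.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
print(f"shards on disk: {sum(SHARD_SIZES_GB):.1f} GB")
```

The bfloat16 estimate (about 14.5 GB) lines up with the 14.4 GB shard total once per-shard rounding is taken into account, which is consistent with the listed bfloat16 data type; float32 weights would need roughly twice as much.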

Best Alternatives to Batch1 Epochs1 Lr1e 05 Paged Adamw 32bit Cosine Length2048 Warmup 0.05 Max Grad1.0 Grad Accu32

Best Alternatives | Context / RAM | Downloads / Likes
...p 0.05 Max Grad1.0 Grad Accu32 | 32K / 14.4 GB | 80
...ruct Solidity Bnb 4bit Smashed | 16K / 4.2 GB | 80
...B Instruct Hf Bnb 4bit Smashed | 16K / 4.2 GB | 100
CodelLama7B Inst DPO 7K Mlx | 16K / 4.2 GB | 82
...eLlama 7B Instruct Hf 4bit MLX | 16K / 4.2 GB | 232
...6.7B Instruct 8.0bpw H8 EXL2 2 | 16K / 6.8 GB | 82
...6.7B Instruct 3.0bpw H6 EXL2 2 | 16K / 2.8 GB | 91
...coder S CL 7B 3.0bpw H6 EXL2 2 | 16K / 2.8 GB | 111
...coder S CL 7B 4.0bpw H6 EXL2 2 | 16K / 3.6 GB | 81
...coder S CL 7B 5.0bpw H6 EXL2 2 | 16K / 4.4 GB | 81
Note: a green score (e.g. "73.2") means that the model outperforms caisarl76/batch1_epochs1_lr1e-05_paged_adamw_32bit_cosine_length2048_warmup_0.05_max_grad1.0_grad_accu32.

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227