What open-source LLMs or SLMs are you in search of? 43,267 models in total.
Quantized models in GPTQ Format
GPTQ is a post-training quantization method for generative pre-trained transformer (GPT) models. It uses a one-shot weight quantization scheme based on approximate second-order information, allowing models with as many as 175 billion parameters to be compressed to 3 or 4 bits per weight with minimal loss of accuracy relative to the uncompressed model. GPTQ quantizes weights individually, converting each weight matrix's floating-point parameters into quantized integers while minimizing the resulting output error, and it supports 2-bit, 3-bit, and 4-bit formats. This makes it especially attractive for extremely large models where retraining or fine-tuning would be impractical, and it has been shown to speed up inference by roughly 3.25x on high-performance GPUs.
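To make the "floating-point parameters into quantized integers" step concrete, here is a minimal sketch of per-group 4-bit round-to-nearest weight quantization in NumPy. This is a simplified illustration only: real GPTQ additionally uses approximate second-order (Hessian) information to compensate quantization error column by column, which this sketch omits. All function and variable names are illustrative, not part of any GPTQ library.

```python
import numpy as np

def quantize_rows(w, bits=4, group_size=8):
    """Round-to-nearest per-group quantization of a weight matrix.

    Simplified sketch: GPTQ proper also applies second-order error
    compensation; only the integer encoding is shown here.
    """
    qmax = 2**bits - 1
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must divide into groups"
    g = w.reshape(rows, cols // group_size, group_size)
    wmin = g.min(axis=2, keepdims=True)
    wmax = g.max(axis=2, keepdims=True)
    scale = np.maximum(wmax - wmin, 1e-8) / qmax  # per-group scale
    zero = np.round(-wmin / scale)                # per-group zero point
    q = np.clip(np.round(g / scale) + zero, 0, qmax).astype(np.uint8)
    deq = (q.astype(np.float32) - zero) * scale   # reconstruct floats
    return q, deq.reshape(rows, cols)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, w_hat = quantize_rows(w)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Each group of 8 weights shares one scale and zero point, so every weight is stored as a 4-bit integer (0–15); the reconstruction error per weight is bounded by roughly half the group's scale.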

| Model Name | Maintainer | Size | Score | VRAM (GB) | Quantized | License | Context Len | Likes | Downloads | Modified | Languages | Architectures |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
— Large Language Model
— Adapter
— Code-Generating Model
— Listed on LMSYS Chatbot Arena ELO Rating
— Original Model
— Merged Model
— Instruction-Based Model
— Quantized Model
— Finetuned Model
— Mixture-Of-Experts
LLM Explorer "Score" is the dynamically calculated score depending on the various parameters. Read more...
Choose another global filter:
All Large Language Models
LMSYS ChatBot Arena ELO
OpenLLM LeaderBoard v1
OpenLLM LeaderBoard v2
Original & Foundation LLMs
OpenCompass LeaderBoard
Recently Added Models
Code Generating Models
Instruction-Based LLMs
LLMs Fit in 4GB RAM
LLMs Fit in 8GB RAM
LLMs Fit in 12GB RAM
LLMs Fit in 24GB RAM
LLMs Fit in 32GB RAM
GGUF Quantized Models
GPTQ Quantized Models
EXL2 Quantized Models
Fine-Tuned Models
LLMs for Commercial Use
TheBloke's Models
Context Size >16K Tokens
Mixture-Of-Experts Models
Apple's MLX LLMs
Small Language Models
Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227