What open-source LLMs or SLMs are you looking for? 39,237 models in total.
Quantized Models in AWQ Format
AWQ (Activation-aware Weight Quantization) is a technique for compressing and accelerating large language models (LLMs), built on the observation that not all weights are equally important. Rather than examining the weights alone, AWQ observes the model's activations during quantization and searches for optimal per-channel scaling factors that protect the most salient weights. Because it relies on neither backpropagation nor reconstruction, AWQ preserves an LLM's generalization ability across domains and modalities. It has been shown to outperform prior methods on a range of language-modeling and domain-specific benchmarks, achieving strong quantization quality for instruction-tuned and multimodal LMs. By substantially reducing memory requirements and latency, AWQ makes it feasible to deploy large models on a much wider range of devices.
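As a rough illustration of how the AWQ checkpoints listed on this page are typically produced and loaded, here is a minimal sketch using the open-source AutoAWQ library together with Hugging Face Transformers. The model ID, output path, and quantization settings are illustrative assumptions, not values taken from this page.

```python
# Minimal sketch: quantize a model to 4-bit AWQ with AutoAWQ, then reload
# the checkpoint through Hugging Face Transformers. The model ID, output
# path, and settings below are illustrative assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed example model
quant_path = "mistral-7b-instruct-awq"             # local output directory

# A common 4-bit AWQ configuration: per-group scaling with group size 128.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model, then quantize. AWQ runs a short calibration
# pass over sample activations to find per-channel scales for the salient
# weights; no backpropagation or reconstruction is involved.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized weights and tokenizer.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Reload for inference; Transformers picks up the AWQ quantization config
# automatically when the autoawq package is installed.
quantized = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
```

At 4 bits per weight, a 7B-parameter model shrinks from roughly 14 GB of FP16 weights to about 4 GB, which is why many of the AWQ models listed below fit in the VRAM of consumer GPUs.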
Each model entry lists: Model Name | Maintainer | Size | Score | VRAM (GB) | Quantized | License | Context Length | Likes | Downloads | Last Modified | Languages | Architectures
Badge legend:
— Large Language Model
— Adapter
— Code-Generating Model
— Listed on LMSys Arena Bot ELO Rating
— Original Model
— Merged Model
— Instruction-Based Model
— Quantized Model
— Finetuned Model
— Mixture-Of-Experts
The LLM Explorer "Score" is a composite rating calculated dynamically from multiple model parameters.
Choose another global filter:
All Large Language Models
LMSYS ChatBot Arena ELO
OpenLLM LeaderBoard v1
OpenLLM LeaderBoard v2
Original & Foundation LLMs
OpenCompass LeaderBoard
Recently Added Models
Code Generating Models
Instruction-Based LLMs
Uncensored LLMs
LLMs Fit in 4GB RAM
LLMs Fit in 8GB RAM
LLMs Fit in 12GB RAM
LLMs Fit in 24GB RAM
LLMs Fit in 32GB RAM
GGUF Quantized Models
GPTQ Quantized Models
EXL2 Quantized Models
Fine-Tuned Models
LLMs for Commercial Use
TheBloke's Models
Context Size >16K Tokens
Mixture-Of-Experts Models
Apple's MLX LLMs
Small Language Models
Original data from Hugging Face, OpenCompass, and various public Git repositories.
Release v20241124