What open-source LLMs or SLMs are you in search of? 34,447 models in total.

Quantized models in AWQ Format

AWQ (Activation-aware Weight Quantization) is a technique for compressing and accelerating large language models (LLMs) by quantizing their weights, based on the observation that not all weights are equally important. During quantization it takes the model's activations into account, searching for an optimal per-channel scaling that protects salient weights identified by observing the activations rather than the weights alone. Because AWQ does not rely on backpropagation or reconstruction, it preserves an LLM's generalization ability across domains and modalities. It has been shown to outperform prior work on various language-modeling and domain-specific benchmarks, achieving strong quantization performance for instruction-tuned and multi-modal LMs. AWQ significantly reduces memory requirements and latency, making it feasible to deploy large models on a wider range of devices.
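The core idea — amplify weight channels that see large activations before rounding, then fold the inverse scale back in — can be sketched in a few lines. This is a toy illustration under assumptions of our own (a fixed `alpha` exponent instead of AWQ's scale search, and simple symmetric round-to-nearest quantization); it is not the reference implementation:

```python
import numpy as np

def awq_style_quantize(W, X, n_bits=4, alpha=0.5):
    """Toy activation-aware quantization sketch (not the real AWQ search).

    W: (out_features, in_features) weight matrix
    X: (n_samples, in_features) calibration activations
    """
    # Channels with large average activations carry the "salient" weights.
    act_mag = np.abs(X).mean(axis=0)            # (in_features,)
    s = np.maximum(act_mag, 1e-8) ** alpha      # per-input-channel scale
    W_scaled = W * s                            # amplify salient channels
    # Symmetric round-to-nearest quantization to n_bits.
    qmax = 2 ** (n_bits - 1) - 1
    step = np.abs(W_scaled).max() / qmax
    W_q = np.clip(np.round(W_scaled / step), -qmax - 1, qmax)
    # Fold the inverse scale back so W_deq @ x approximates W @ x;
    # salient channels now suffer less rounding error.
    W_deq = (W_q * step) / s
    return W_deq

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
# Calibration activations whose channels have very different magnitudes.
X = rng.normal(size=(32, 16)) * np.linspace(0.1, 3.0, 16)
W_deq = awq_style_quantize(W, X)
err = np.abs(W_deq - W).mean()
```

Scaling a salient channel up by `s` means its rounding error is divided by `s` after dequantization, trading precision away from channels whose activations (and thus whose contribution to the output) are small.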
Below is the list of small and large language models.
[Model table — columns: Model Name · Maintainer · Size · Score · VRAM (GB) · Quantized · License · Context Len · Likes · Downloads · Modified · Languages · Architectures]
Model Type Markers
  • Large Language Model
  • Adapter
  • Code-Generating Model
  • Listed on LMSys Arena Bot ELO Rating
  • Original Model
  • Merged Model
  • Instruction-Based Model
  • Quantized Model
  • Finetuned Model
  • Mixture-Of-Experts
Table Headers Explained  
  • Name — The title and maintainer account associated with the model.
  • Params — The number of parameters used in the model.
  • Score — The model's score depending on the selected rating (default is the Open LLM Leaderboard on HuggingFace).
  • Likes — The number of "likes" given to the model by users.
  • VRAM — The approximate amount of memory, in GB, needed to load the model's weights. It is not the exact amount of RAM required for inference, but can be used as a reference.
  • Downloads — The total number of downloads for the model.
  • Quantized — Specifies whether the model is quantized.
  • CodeGen — Specifies whether the model can understand or generate source code.
  • License — The type of license associated with the model.
  • Languages — The list of languages supported by the model (where specified).
  • Maintainer — The author or maintainer of the model.
  • Architectures — The transformer architecture used in the model.
  • Context Len — The context length supported by the model.
  • Tags — The list of tags specified by the model's maintainer.
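As a back-of-the-envelope check on the VRAM column, weight memory scales with parameter count times bits per weight. The sketch below adds a flat 20% overhead factor, which is our own assumption (actual usage varies with context length, KV cache, and runtime), and the function name `estimate_vram_gb` is hypothetical:

```python
def estimate_vram_gb(n_params: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough GB needed to hold the weights: params x bytes per param,
    times an assumed ~20% overhead for buffers and runtime state."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B-parameter model in 4-bit AWQ: about 4.2 GB
print(estimate_vram_gb(7e9, 4))
```

The same model in 16-bit weights would need roughly four times as much, which is why 4-bit AWQ checkpoints fit in the "LLMs Fit in 8GB RAM" class of hardware.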

Original data from HuggingFace, OpenCompass and various public git repos.
Release v2024072501