What open-source LLMs or SLMs are you in search of? 34447 in total.

Quantized model

Quantization, in the context of Large Language Models (LLMs), is a technique for reducing model size and speeding up inference while preserving as much accuracy as possible. It converts the floating-point weights of a neural network into lower-bit representations, typically 2-bit, 3-bit, or 4-bit formats. Because each weight matrix's floating-point parameters are stored as quantized integers, the model needs less memory and less computation, which is particularly beneficial when deploying LLMs. Quantization methods are designed to minimize the resulting output error and can be applied post-training, so even models with billions of parameters can be compressed with minimal impact on accuracy. This makes quantization essential for deploying complex models in resource-constrained environments, and it can often speed up inference several-fold. Popular quantization formats include GPTQ, GGUF, GGML, EXL2, and AWQ.
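As a rough illustration of the idea described above, the following sketch applies symmetric round-to-nearest quantization to a single row of weights. It is a simplified teaching example, not the algorithm used by GPTQ, AWQ, or any other listed format; the function names `quantize_row` and `dequantize_row` and the sample weights are hypothetical.

```python
def quantize_row(weights, bits=4):
    """Map a row of floats to signed integers using one shared scale.

    Assumption: symmetric round-to-nearest with a per-row scale. Real
    formats use more elaborate, error-aware schemes.
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    # Round each weight to the nearest integer step and clamp to the range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return [v * scale for v in q]


row = [0.12, -0.53, 0.91, -0.07, 0.33]   # hypothetical weights
q, scale = quantize_row(row, bits=4)
restored = dequantize_row(q, scale)
max_err = max(abs(a - b) for a, b in zip(row, restored))

print("quantized integers:", q)
print("scale:", scale)
print("largest reconstruction error:", max_err)
```

Each weight is stored as a small integer plus one shared floating-point scale, which is where the storage saving comes from. The reconstruction error per weight is bounded by half the scale, which is why finer-grained scales (per group rather than per row) tend to preserve accuracy better.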
List of Small and Large Language Models
Model Name Maintainer Size Score VRAM (GB) Quantized License Context Len Likes Downloads Modified Languages Architectures
— Large Language Model
— Adapter
— Code-Generating Model
— Listed on LMSys Arena Bot ELO Rating
— Original Model
— Merged Model
— Instruction-Based Model
— Quantized Model
— Finetuned Model
— Mixture-Of-Experts
Table Headers Explained  
  • Name — The title and maintainer account associated with the model.
  • Params (Size) — The number of parameters in the model.
  • Score — The model's score under the selected rating (by default, the Open LLM Leaderboard on HuggingFace).
  • Likes — The number of "likes" given to the model by users.
  • VRAM — The amount of memory, in GB, required to load the model. This is not the actual amount of memory needed for inference, but it can be used as a reference.
  • Downloads — The total number of downloads for the model.
  • Quantized — Specifies whether the model is quantized.
  • CodeGen — Specifies whether the model can understand or generate source code.
  • License — The type of license associated with the model.
  • Languages — The list of languages supported by the model (where specified).
  • Maintainer — The author or maintainer of the model.
  • Architectures — The transformer architecture used in the model.
  • Context Len — The context length supported by the model.
  • Tags — The list of tags specified by the model's maintainer.
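
The VRAM figure described above can be approximated from the parameter count and the number of bits stored per weight. The sketch below is a back-of-the-envelope estimate only: the function name `estimate_load_gb` is hypothetical, the 7-billion-parameter model is an example, and the formula is an assumption that ignores quantization metadata (scales, zero points), activations, and the KV cache. It is not necessarily how the values in this list are computed.

```python
def estimate_load_gb(num_params, bits_per_weight):
    """Approximate memory needed to hold the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width and
    nothing else occupies memory.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Hypothetical 7-billion-parameter model at several bit widths.
for bits in (16, 8, 4):
    print(f"{bits}-bit: about {estimate_load_gb(7e9, bits):.1f} GB")
```

Actual inference needs more memory than this estimate, consistent with the note that the VRAM column is a reference value rather than the full requirement.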

Original data from HuggingFace, OpenCompass and various public git repos.
Release v2024072501