What open-source LLMs or SLMs are you looking for? 36,644 models in total.
Quantized model
Quantization, in the context of Large Language Models (LLMs), is a technique for reducing model size and speeding up inference while largely preserving output quality. It converts the floating-point weights of a neural network into lower-bit representations, typically 2-bit, 3-bit, or 4-bit formats. Replacing each weight matrix's floating-point parameters with quantized integers reduces both memory usage and computational cost, making storage and computation more efficient and deployment of LLMs more practical. Quantization methods are designed to minimize the resulting output error and can be applied post-training, so even models with billions of parameters can be compressed with minimal impact on accuracy. This makes quantization essential for running large models in resource-constrained environments, where it can accelerate inference several-fold. Popular quantization formats include GPTQ, GGUF, GGML, EXL2, and AWQ.
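The core idea above — mapping floating-point weights to low-bit integers via a scale factor, then dequantizing at inference time — can be sketched in a few lines. This is a minimal illustration of symmetric round-to-nearest post-training quantization with a single per-tensor scale; real formats such as GPTQ or GGUF use per-group scales and error-compensating rounding, which this sketch omits.

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int = 4):
    """Symmetric round-to-nearest quantization with one per-tensor scale.

    Illustrative sketch only: production formats (GPTQ, GGUF, AWQ, ...)
    use per-group scales and smarter rounding to reduce error.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax       # maps the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate floating-point weights for inference."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer of an LLM
w = np.array([[0.12, -0.50], [0.33, 0.07]], dtype=np.float32)
q, s = quantize_weights(w, bits=4)
w_hat = dequantize(q, s)
max_err = np.abs(w - w_hat).max()  # bounded by scale / 2 for in-range weights
```

Storing `q` (4-bit integers) plus one scale instead of 32-bit floats is what yields the roughly 8x size reduction that makes multi-billion-parameter models fit in consumer VRAM.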
Model Name | Maintainer | Size | Score | VRAM (GB) | Quantized | License | Context Len | Likes | Downloads | Modified | Languages | Architectures
---|---|---|---|---|---|---|---|---|---|---|---|---
— Large Language Model
— Adapter
— Code-Generating Model
— Listed on LMSys Arena Bot ELO Rating
— Original Model
— Merged Model
— Instruction-Based Model
— Quantized Model
— Finetuned Model
— Mixture-Of-Experts
Choose another global filter
All Large Language Models
LMSYS ChatBot Arena ELO
OpenLLM LeaderBoard v1
OpenLLM LeaderBoard v2
Original & Foundation LLMs
OpenCompass LeaderBoard
Recently Added Models
Code Generating Models
Instruction-Based LLMs
Uncensored LLMs
LLMs Fit in 4GB RAM
LLMs Fit in 8GB RAM
LLMs Fit in 12GB RAM
LLMs Fit in 24GB RAM
LLMs Fit in 32GB RAM
GGUF Quantized Models
GPTQ Quantized Models
EXL2 Quantized Models
Fine-Tuned Models
LLMs for Commercial Use
TheBloke's Models
Context Size >16K Tokens
Mixture-Of-Experts Models
Apple's MLX LLMs
Small Language Models
Original data from HuggingFace, OpenCompass and various public git repos.
Release v2024072803