Quantized LLMs

What open-source LLMs or SLMs are you in search of? 38149 in total.


Quantization, in the context of Large Language Models (LLMs), is a technique for reducing model size and speeding up inference while preserving as much of the model's accuracy as possible. It converts the floating-point weights of a neural network into lower-bit representations, typically 2-bit, 3-bit, or 4-bit formats. Because each weight matrix's floating-point parameters are stored as quantized integers, the model needs less memory and less memory bandwidth, which makes storage and computation more efficient. Quantization methods are designed to minimize the resulting output error and can be applied post-training, so even models with billions of parameters can be compressed with little loss of accuracy. This makes quantization essential for deploying large models in resource-constrained environments, where it can also speed up inference, often several-fold. Popular quantization formats include GPTQ, GGUF, GGML, EXL2, and AWQ.
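To make the idea concrete, here is a minimal sketch of the simplest form of weight quantization: symmetric round-to-nearest with a single scale per group of weights. It is an illustration only and is not how GPTQ, AWQ, EXL2, or GGUF work internally; those formats use more elaborate schemes. The function names, the sample weights, and the 7-billion-parameter figure are assumptions chosen for the example.

```python
def quantize_absmax(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns signed integers in [-qmax, qmax] plus the scale needed to
    map them back to approximate floating-point values.
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floating-point weights from quantized integers."""
    return [v * scale for v in q]


def weight_memory_gb(n_params, bits):
    """Rough size of the weights alone, ignoring scales and other overhead."""
    return n_params * bits / 8 / 1e9


# Hypothetical group of eight weights.
weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.02, 0.49]
q, scale = quantize_absmax(weights, bits=4)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized integers:", q)
print("largest rounding error:", max_error)
print("7B params at 16-bit (GB):", weight_memory_gb(7e9, 16))
print("7B params at 4-bit (GB):", weight_memory_gb(7e9, 4))
```

The rounding error for each weight is at most half the scale, which is why smaller groups (each with its own scale) usually lose less accuracy. The size estimate counts only the weights; actual VRAM use is higher because of stored scales, activations, and the attention cache.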


Model Name	Maintainer	Size	Score	VRAM (GB)	Quantized	License	Context Len	Likes	Downloads	Modified	Languages	Architectures

— Large Language Model
— Adapter
— Code-Generating Model
— Listed on LMSys Arena Bot ELO Rating
— Original Model
— Merged Model
— Instruction-Based Model
— Quantized Model
— Finetuned Model
— Mixture-Of-Experts

The LLM Explorer "Score" is a dynamically calculated score that depends on various parameters.



Original data from HuggingFace, OpenCompass and various public git repos.

Release v20241110
