What open-source LLMs or SLMs are you in search of? 43,267 models in total.
Quantized models in GPTQ Format
GPTQ is a post-training quantization method for generative pre-trained transformer (GPT) models. It uses a one-shot weight quantization scheme based on approximate second-order information, allowing models with as many as 175 billion parameters to be compressed to 3 or 4 bits per weight with minimal loss of accuracy relative to the uncompressed model. GPTQ quantizes weights individually, converting each weight matrix's floating-point parameters into quantized integers while minimizing the resulting output error, and it supports 2-bit, 3-bit, and 4-bit formats. This makes it especially attractive for extremely large models where retraining or fine-tuning would be impractical, and it has been shown to speed up inference by roughly 3.25x on high-performance GPUs.
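To make the "floating-point parameters into quantized integers" step concrete, here is a minimal sketch of per-group 4-bit round-to-nearest weight quantization in NumPy. This is a simplified illustration only: real GPTQ additionally uses approximate second-order (Hessian) information to compensate quantization error column by column, which this sketch omits. All function and variable names are illustrative, not part of any GPTQ library.

```python
import numpy as np

def quantize_rows(w, bits=4, group_size=8):
    """Round-to-nearest per-group quantization of a weight matrix.

    Simplified sketch: GPTQ proper also applies second-order error
    compensation; only the integer encoding is shown here.
    """
    qmax = 2**bits - 1
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must divide into groups"
    g = w.reshape(rows, cols // group_size, group_size)
    wmin = g.min(axis=2, keepdims=True)
    wmax = g.max(axis=2, keepdims=True)
    scale = np.maximum(wmax - wmin, 1e-8) / qmax  # per-group scale
    zero = np.round(-wmin / scale)                # per-group zero point
    q = np.clip(np.round(g / scale) + zero, 0, qmax).astype(np.uint8)
    deq = (q.astype(np.float32) - zero) * scale   # reconstruct floats
    return q, deq.reshape(rows, cols)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, w_hat = quantize_rows(w)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Each group of 8 weights shares one scale and zero point, so every weight is stored as a 4-bit integer (0–15); the reconstruction error per weight is bounded by roughly half the group's scale.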

| Model Name | Maintainer | Size | Score | VRAM (GB) | Quantized | License | Context Len | Likes | Downloads | Modified | Languages | Architectures |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
— Large Language Model
— Adapter
— Code-Generating Model
— Listed on LMSYS Chatbot Arena ELO Rating
— Original Model
— Merged Model
— Instruction-Based Model
— Quantized Model
— Finetuned Model
— Mixture-Of-Experts
LLM Explorer "Score" is the dynamically calculated score depending on the various parameters. Read more...
Choose another global filter:
All Large Language Models
LMSYS ChatBot Arena ELO
OpenLLM LeaderBoard v1
OpenLLM LeaderBoard v2
Original & Foundation LLMs
OpenCompass LeaderBoard
Recently Added Models
Code Generating Models
Instruction-Based LLMs
LLMs Fit in 4GB RAM
LLMs Fit in 8GB RAM
LLMs Fit in 12GB RAM
LLMs Fit in 24GB RAM
LLMs Fit in 32GB RAM
GGUF Quantized Models
GPTQ Quantized Models
EXL2 Quantized Models
Fine-Tuned Models
LLMs for Commercial Use
TheBloke's Models
Context Size >16K Tokens
Mixture-Of-Experts Models
Apple's MLX LLMs
Small Language Models
Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227