Llama 2 7B Chat Hf FP8 by neuralmagic

 ยป  All LLMs  ยป  neuralmagic  ยป  Llama 2 7B Chat Hf FP8   URL Share it on

  Autotrain compatible   Conversational   Endpoints compatible   Fp8   Llama   Region:us   Safetensors   Sharded   Tensorflow   Vllm

Llama 2 7B Chat Hf FP8 Benchmarks

nn.n% — How the model compares to the reference models: Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o") or GPT-4 ("gpt4").
Llama 2 7B Chat Hf FP8 (neuralmagic/Llama-2-7b-chat-hf-FP8)

Llama 2 7B Chat Hf FP8 Parameters and Internals

Model Type 
Chatbot, Text Generation
Use Cases 
Areas:
Commercial, Research
Applications:
Assistant-like chat
Primary Use Cases:
English Language Chatbots
Limitations:
Out-of-scope: Use in any manner that violates applicable laws or regulations, Use in languages other than English
Additional Notes 
Quantization reduces number of bits per parameter significantly enhancing resource efficiency.
Supported Languages 
English (Proficient)
Training Details 
Data Sources:
ultrachat calibration samples
Methodology:
FP8 Quantization using AutoFP8
Context Length:
4096
Hardware Used:
GPU (vLLM >= 0.5.0)
Model Architecture:
Llama-2-7b-chat-hf
Input Output 
Input Format:
Text
Accepted Modalities:
text
Output Format:
Text
Performance Tips:
Sufficient GPU memory and proper setup of vLLM backend enhances performance
Release Notes 
Version:
1.0
Date:
6/26/2024
Notes:
Quantized to FP8 for efficiency. Initial release of the model for assistant-like chat in English.
LLM NameLlama 2 7B Chat Hf FP8
Repository ๐Ÿค—https://huggingface.co/neuralmagic/Llama-2-7b-chat-hf-FP8 
Model Size7b
Required VRAM7 GB
Updated2025-02-05
Maintainerneuralmagic
Model Typellama
Model Files  5.0 GB: 1-of-2   2.0 GB: 2-of-2
Model ArchitectureLlamaForCausalLM
Licensellama2
Context Length4096
Model Max Length4096
Transformers Version4.41.2
Tokenizer ClassLlamaTokenizer
Vocabulary Size32000
Torch Data Typefloat16

Best Alternatives to Llama 2 7B Chat Hf FP8

Best Alternatives
Context / RAM
Downloads
Likes
...1M 1000000ctx AEZAKMI 3 1 17021024K / 13.5 GB531
... Qwen2.5llamaify 7B V23.1 200K195K / 15.2 GB49502
LlamaStock 8B128K / 16.1 GB151
SuperNeuralDreadDevil 8B128K / 16.1 GB311
Yarn Llama 2 7B 128K128K / 13.5 GB570539
LLaMA 7B PoSE YaRN 128K128K / 13.5 GB83
LLaMA 7B PoSE Linear 96K96K / 27 GB82
LLaMA 7B PoSE YaRN 96K96K / 13.5 GB81
Chat Llama2 7B 80K80K / 13.8 GB60
Llama2 7B 80K80K / 13.8 GB90
Note: green Score (e.g. "73.2") means that the model is better than neuralmagic/Llama-2-7b-chat-hf-FP8.

Rank the Llama 2 7B Chat Hf FP8 Capabilities

๐Ÿ†˜ Have you tried this model? Rate its performance. This feedback would greatly assist ML community in identifying the most suitable model for their needs. Your contribution really does make a difference! ๐ŸŒŸ

Instruction Following and Task Automation  
Factuality and Completeness of Knowledge  
Censorship and Alignment  
Data Analysis and Insight Generation  
Text Generation  
Text Summarization and Feature Extraction  
Code Generation  
Multi-Language Support and Translation  

What open-source LLMs or SLMs are you in search of? 42577 in total.

Our Social Media →  
Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227