Llama 3 70B Instruct Gradient 1048K AWQ By starsy: Benchmarks, Features and Detailed Analysis. Insights on Llama 3 70B Instruct Gradient 1048K AWQ.

Arxiv:2305.14233 Arxiv:2309.00071 Arxiv:2310.05209 Arxiv:2402.08268 4-bit Autotrain compatible Awq Conversational En Endpoints compatible Instruct Llama Llama-3 Meta Quantized Region:us Safetensors Sharded Tensorflow

Model Card on HF 🤗: https://huggingface.co/starsy/Llama-3-70B-Instruct-Gradient-1048k-AWQ

Llama 3 70B Instruct Gradient 1048K AWQ Benchmarks

ARC: 66.81 vs 96.7 (so35)^-30.9%

HellaSwag: 85.46 vs 95.3 (gpt4)^-10.3%

MMLU: 76.37 vs 88.3 (so35)^-13.5%

TruthfulQA: 53.73 vs 59 (gpt4)^-8.9%

WinoGrande: 82.64 vs 87.5 (gpt4)^-5.6%

GSM8K: 78.85 vs 96.4 (so35)^-18.2%

LLME Score: 0.1772

^nn.n% — How the model compares to the reference models: Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o") or GPT-4 ("gpt4").

What is the LLM Explorer Rank (Score)

Llama 3 70B Instruct Gradient 1048K AWQ (starsy/Llama-3-70B-Instruct-Gradient-1048k-AWQ)

Llama 3 70B Instruct Gradient 1048K AWQ Parameters and Internals

Model Type

text generation, instruction-tuned

Use Cases

Areas:

commercial applications, research

Applications:

assistant-like chat

Primary Use Cases:

Natural language generation tasks

Limitations:

Use only in accordance with laws

Considerations:

Safety testing and tuning recommended

Additional Notes

Uses proprietary EasyContext Blockwise RingAttention library for long context training.

Supported Languages

English (Native)

Training Details

Data Sources:

SlimPajama, UltraChat, publicly available instruction datasets

Data Volume:

15 trillion tokens

Methodology:

NTK-aware interpolation

Context Length:

1048000

Training Time:

100-516 minutes per context length

Hardware Used:

NVIDIA L40S, H100-80GB

Model Architecture:

auto-regressive transformer architecture

Safety Evaluation

Methodologies:

red-teaming, adversarial evaluations, CyberSecEval

Findings:

residual risks remain, over-refusal reduced

Risk Categories:

CBRNE, Cyber Security, Child Safety

Ethical Considerations:

Open approach for better, safer products

Responsible Ai Considerations

Fairness:

Access to many different backgrounds

Transparency:

Open source approach

Accountability:

Developers responsible for safe use

Mitigation Strategies:

Meta Llama Guard and Code Shield safeguards

Input Output

Input Format:

Text-only

Accepted Modalities:

text

Output Format:

Text and code

Performance Tips:

Not provided

Release Notes

Version:

April 18, 2024

Date:

April 18, 2024

Notes:

Initial release of Llama 3 model

LLM Name	Llama 3 70B Instruct Gradient 1048K AWQ
Repository 🤗	https://huggingface.co/starsy/Llama-3-70B-Instruct-Gradient-1048k-AWQ
Base Model(s)	... 3 70B Instruct Gradient 1048K gradientai/Llama-3-70B-Instruct-Gradient-1048k
Model Size	70b
Required VRAM	39.9 GB
Updated	2025-05-31
Maintainer	starsy
Model Type	llama
Instruction-Based	Yes
Model Files	5.0 GB: 1-of-9 4.9 GB: 2-of-9 4.9 GB: 3-of-9 4.9 GB: 4-of-9 4.9 GB: 5-of-9 4.9 GB: 6-of-9 4.9 GB: 7-of-9 3.4 GB: 8-of-9 2.1 GB: 9-of-9
Supported Languages	en
AWQ Quantization	Yes
Quantization Type	awq
Model Architecture	LlamaForCausalLM
License	llama3
Context Length	1048576
Model Max Length	1048576
Transformers Version	4.40.2
Tokenizer Class	PreTrainedTokenizerFast
Vocabulary Size	128256
Torch Data Type	float16

Best Alternatives to Llama 3 70B Instruct Gradient 1048K AWQ

Best Alternatives	Context / RAM	Downloads	Likes
...70B Instruct Gradient 262K AWQ	256K / 39.9 GB	10	0
Llama 3.3 70B Instruct AWQ	128K / 39.9 GB	93409	5
Llama 3.3 70B Instruct AWQ	128K / 39.9 GB	43621	32
...lama 3.3 70B Instruct AWQ INT4	128K / 39.9 GB	6980	24
... SauerkrautLM 70B Instruct AWQ	128K / 39.9 GB	14	4
Llama 3 70B Instruct AWQ	8K / 39.9 GB	23220	68
...ama 3 70B Instruct AWQ Smashed	8K / 39.9 GB	3245	9
...Typhoon V1.5x 70B Instruct AWQ	8K / 39.9 GB	293	2
Meta Llama 3 70B Instruct AWQ	8K / 39.9 GB	21	1
Llama 3 70B Instruct AWQ	8K / 39.9 GB	12	1

Note: green Score (e.g. "73.2") means that the model is better than starsy/Llama-3-70B-Instruct-Gradient-1048k-AWQ.

Rank the Llama 3 70B Instruct Gradient 1048K AWQ Capabilities

🆘 Have you tried this model? Rate its performance. This feedback would greatly assist ML community in identifying the most suitable model for their needs. Your contribution really does make a difference! 🌟

Instruction Following and Task Automation
Factuality and Completeness of Knowledge
Censorship and Alignment
Data Analysis and Insight Generation
Text Generation
Text Summarization and Feature Extraction
Code Generation
Multi-Language Support and Translation

What open-source LLMs or SLMs are you in search of? 47753 in total.

Email us: info@extractum.io. Our Privacy Policy | Terms and Conditions | Suggest an improvement.

Our Social Media →

Original data from HuggingFace, OpenCompass and various public git repos.

Release v20241227

Support LLM Explorer

Llama 3 70B Instruct Gradient 1048K AWQ by starsy

» All LLMs » starsy » Llama 3 70B Instruct Gradient 1048K AWQ URL Share it on

Llama 3 70B Instruct Gradient 1048K AWQ Benchmarks

Llama 3 70B Instruct Gradient 1048K AWQ Parameters and Internals

Best Alternatives to Llama 3 70B Instruct Gradient 1048K AWQ

Rank the Llama 3 70B Instruct Gradient 1048K AWQ Capabilities

What open-source LLMs or SLMs are you in search of? 47753 in total.