Phi 3 Medium 4K Instruct By microsoft: Benchmarks, Features and Detailed Analysis. Insights on Phi 3 Medium 4K Instruct.

Autotrain compatible Code Conversational Custom code Endpoints compatible Instruct Multilingual Phi3 Region:us Safetensors Sharded Tensorflow

Model Card on HF 🤗: https://huggingface.co/microsoft/Phi-3-medium-4k-instruct

Phi 3 Medium 4K Instruct Benchmarks

LMSys ELO: 1119 vs 1272 (so35)^-12%

IFEval: 64.23 vs 88 (so35)^-27%

ARC: 67.32 vs 96.7 (so35)^-30.4%

HellaSwag: 85.76 vs 95.3 (gpt4)^-10%

MMLU: 77.83 vs 88.3 (so35)^-11.9%

TruthfulQA: 57.71 vs 59 (gpt4)^-2.2%

WinoGrande: 72.69 vs 87.5 (gpt4)^-16.9%

GSM8K: 79.38 vs 96.4 (so35)^-17.7%

MATH Lvl 5: 19.56

LLME Score: 0.43725

^nn.n% — How the model compares to the reference models: Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o") or GPT-4 ("gpt4").

What is the LLM Explorer Rank (Score)

Phi 3 Medium 4K Instruct (microsoft/Phi-3-medium-4k-instruct)

Phi 3 Medium 4K Instruct Parameters and Internals

Model Type

text generation

Use Cases

Areas:

Commercial, Research

Applications:

General AI systems, Latency and memory-constrained environments

Primary Use Cases:

Code, math, and logical reasoning

Limitations:

Potential inaccuracy, bias, and harm in high-risk scenarios

Considerations:

Models are not evaluated for all downstream use cases, especially high-risk ones.

Additional Notes

Supports cross-platform capabilities through ONNX runtime across various devices.

Supported Languages

primary (English), additional (Multilingual (10% of training data))

Training Details

Data Sources:

Publicly available documents, Human-like synthetic data

Data Volume:

4.8 trillion tokens

Methodology:

Supervised fine-tuning and Direct Preference Optimization

Context Length:

4000

Training Time:

42 days

Hardware Used:

512 H100-80G GPUs

Model Architecture:

Dense decoder-only Transformer model

Safety Evaluation

Methodologies:

Supervised fine-tuning, Direct Preference Optimization

Risk Categories:

Misinformation, Bias, Offensive content

Ethical Considerations:

Models may produce unreliable, biased, or offensive outputs.

Responsible Ai Considerations

Fairness:

Models trained on English may over/under-represent groups, or reinforce demeaning stereotypes.

Transparency:

Transparency best practices and accountability need to be applied by developers.

Accountability:

Developers should ensure compliance with relevant laws and regulations.

Mitigation Strategies:

Responsible AI best practices should be followed including the use of safety classifiers.

Input Output

Input Format:

Chat format prompts using user-assistant dialogue.

Accepted Modalities:

text

Output Format:

Generated text in response to input

Performance Tips:

Include a BOS token at conversation start for reliable results.

LLM Name	Phi 3 Medium 4K Instruct
Repository 🤗	https://huggingface.co/microsoft/Phi-3-medium-4k-instruct
Model Size	14b
Required VRAM	28 GB
Updated	2025-05-31
Maintainer	microsoft
Model Type	phi3
Instruction-Based	Yes
Model Files	4.9 GB: 1-of-6 5.0 GB: 2-of-6 4.9 GB: 3-of-6 4.8 GB: 4-of-6 4.8 GB: 5-of-6 3.6 GB: 6-of-6
Model Architecture	Phi3ForCausalLM
License	mit
Context Length	4096
Model Max Length	4096
Transformers Version	4.39.3
Tokenizer Class	LlamaTokenizer
Padding Token	<\|endoftext\|>
Vocabulary Size	32064
Torch Data Type	bfloat16

Quantized Models of the Phi 3 Medium 4K Instruct

Model	Likes	Downloads	VRAM
Phi 3 Medium 4K Instruct 8bit	1	20	14 GB
Phi 3 Medium 4K Instruct GGUF	0	48	5 GB
Phi 3 Medium 4K Instruct GGUF	0	38	5 GB

Best Alternatives to Phi 3 Medium 4K Instruct

Best Alternatives	Context / RAM	Downloads	Likes
Phi 3 Medium 128K Instruct	128K / 28 GB	9712	381
...ess V2.5 Phi 3 Medium 128K 14B	128K / 28 GB	2272	4
Finetuned Phi3 Medium 128K	128K / 28 GB	20	0
Shisa V1 Phi3 14B	128K / 28 GB	15	2
Mahou 1.2 Phi 14B	128K / 28 GB	17	1
...colatine 14B Instruct DPO V1.3	16K / 29.4 GB	16	0
Phi4 Slerp2 14B	16K / 28 GB	13	0
...colatine 14B Instruct DPO V1.2	4K / 28 GB	5059	14
Ph3della5 14B	4K / 28 GB	713	0
...ium 4K Instruct Abliterated V3	4K / 28 GB	788	24

Rank the Phi 3 Medium 4K Instruct Capabilities

🆘 Have you tried this model? Rate its performance. This feedback would greatly assist ML community in identifying the most suitable model for their needs. Your contribution really does make a difference! 🌟

Instruction Following and Task Automation
Factuality and Completeness of Knowledge
Censorship and Alignment
Data Analysis and Insight Generation
Text Generation
Text Summarization and Feature Extraction
Code Generation
Multi-Language Support and Translation

What open-source LLMs or SLMs are you in search of? 47753 in total.

Email us: info@extractum.io. Our Privacy Policy | Terms and Conditions | Suggest an improvement.

Our Social Media →

Original data from HuggingFace, OpenCompass and various public git repos.

Release v20241227

Support LLM Explorer