Phi 3 Medium 128K Instruct By microsoft: Benchmarks, Features and Detailed Analysis. Insights on Phi 3 Medium 128K Instruct.

Autotrain compatible Code Conversational Custom code Endpoints compatible Instruct Multilingual Phi3 Region:us Safetensors Sharded Tensorflow

Model Card on HF 🤗: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct

Phi 3 Medium 128K Instruct Benchmarks

MMLU Pro: 41.24

GPQA: 11.52

MUSR: 11.35

BBH: 48.46

IFEval: 60.4 vs 88 (so35)^-31.4%

ARC: 66.47 vs 96.7 (so35)^-31.3%

HellaSwag: 84.91 vs 95.3 (gpt4)^-10.9%

MMLU: 76.75 vs 88.3 (so35)^-13.1%

TruthfulQA: 54.59 vs 59 (gpt4)^-7.5%

WinoGrande: 74.74 vs 87.5 (gpt4)^-14.6%

GSM8K: 80.52 vs 96.4 (so35)^-16.5%

MATH Lvl 5: 19.18

LLME Score: 0.34284

^nn.n% — How the model compares to the reference models: Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o") or GPT-4 ("gpt4").

What is the LLM Explorer Rank (Score)

Phi 3 Medium 128K Instruct (microsoft/Phi-3-medium-128k-instruct)

Phi 3 Medium 128K Instruct Parameters and Internals

Model Type

text-generation, code

Use Cases

Areas:

Commercial, Research

Applications:

Language and multimodal models, Generative AI

Primary Use Cases:

Memory/compute constrained environments for general AI system, Latency bound scenarios requiring strong reasoning, Research acceleration

Limitations:

Not evaluated for all downstream purposes, High risk scenarios may require debiasing techniques

Considerations:

Evaluate and mitigate accuracy, safety, and fairness before use in specific downstream scenarios.

Supported Languages

English (primary)

Training Details

Data Sources:

Publicly available documents, Newly created synthetic text data, High quality chat format supervised data

Data Volume:

4.8 trillion tokens

Methodology:

Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO)

Context Length:

128000

Training Time:

42 days

Hardware Used:

512 H100-80G GPUs

Model Architecture:

Dense decoder-only Transformer

Responsible Ai Considerations

Fairness:

Trained primarily on English text, under-representation of some English language varieties is possible.

Transparency:

Models may generate inappropriate or offensive content; transparent communication with users recommended.

Accountability:

Developers should adhere to applicable laws and verify outputs before use in high-risk scenarios.

Mitigation Strategies:

Built-in safety classifiers and custom solutions for high-risk contexts.

Input Output

Input Format:

Chat format prompts

Accepted Modalities:

Text

Output Format:

Generated text

Performance Tips:

Add a BOS token (`~~`) at the start of the conversation for reliable results.

Release Notes

Version:

May 21, 2024

Date:

2024-05-21

Notes:

Phi-3-Medium-128K-Instruct model released with 14B parameters and advanced capabilities.

LLM Name	Phi 3 Medium 128K Instruct
Repository 🤗	https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
Model Size	14b
Required VRAM	28 GB
Updated	2025-05-31
Maintainer	microsoft
Model Type	phi3
Instruction-Based	Yes
Model Files	4.9 GB: 1-of-6 5.0 GB: 2-of-6 4.9 GB: 3-of-6 4.8 GB: 4-of-6 4.8 GB: 5-of-6 3.6 GB: 6-of-6
Model Architecture	Phi3ForCausalLM
License	mit
Context Length	131072
Model Max Length	131072
Transformers Version	4.39.3
Tokenizer Class	LlamaTokenizer
Padding Token	<\|endoftext\|>
Vocabulary Size	32064
Torch Data Type	bfloat16

Quantized Models of the Phi 3 Medium 128K Instruct

Model	Likes	Downloads	VRAM
...hi 3 Medium 128K Instruct GGUF	2	195	5 GB
...hi 3 Medium 128K Instruct GGUF	0	86	5 GB

Best Alternatives to Phi 3 Medium 128K Instruct

Best Alternatives	Context / RAM	Downloads	Likes
...ess V2.5 Phi 3 Medium 128K 14B	128K / 28 GB	2272	4
Finetuned Phi3 Medium 128K	128K / 28 GB	20	0
Shisa V1 Phi3 14B	128K / 28 GB	15	2
Mahou 1.2 Phi 14B	128K / 28 GB	17	1
...colatine 14B Instruct DPO V1.3	16K / 29.4 GB	16	0
Phi4 Slerp2 14B	16K / 28 GB	13	0
Phi 3 Medium 4K Instruct	4K / 28 GB	22024	220
...colatine 14B Instruct DPO V1.2	4K / 28 GB	5059	14
Ph3della5 14B	4K / 28 GB	713	0
...ium 4K Instruct Abliterated V3	4K / 28 GB	788	24

Note: green Score (e.g. "73.2") means that the model is better than microsoft/Phi-3-medium-128k-instruct.

Rank the Phi 3 Medium 128K Instruct Capabilities

🆘 Have you tried this model? Rate its performance. This feedback would greatly assist ML community in identifying the most suitable model for their needs. Your contribution really does make a difference! 🌟

Instruction Following and Task Automation
Factuality and Completeness of Knowledge
Censorship and Alignment
Data Analysis and Insight Generation
Text Generation
Text Summarization and Feature Extraction
Code Generation
Multi-Language Support and Translation

What open-source LLMs or SLMs are you in search of? 47753 in total.

Email us: info@extractum.io. Our Privacy Policy | Terms and Conditions | Suggest an improvement.

Our Social Media →

Original data from HuggingFace, OpenCompass and various public git repos.

Release v20241227

Support LLM Explorer