Phi 3.5 MoE Instruct by microsoft

  Arxiv:2403.06412   Arxiv:2404.14219   Arxiv:2407.13833   Autotrain compatible   Code   Conversational   Custom code   Instruct   Moe   Multilingual   Phimoe   Region:us   Safetensors   Sharded   Tensorflow

Phi 3.5 MoE Instruct Benchmarks

nn.n% — How the model compares to the reference models: Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o") or GPT-4 ("gpt4").
Phi 3.5 MoE Instruct (microsoft/Phi-3.5-MoE-instruct)

Phi 3.5 MoE Instruct Parameters and Internals

Model Type 
text generation
Use Cases 
Areas:
commercial applications, research
Applications:
general-purpose AI systems, applications requiring memory/compute constrained environments, latency-bound scenarios, strong reasoning tasks (code, math, logic)
Primary Use Cases:
language and multimodal model research, building blocks for generative AI-powered features
Limitations:
not evaluated for all downstream purposes; accuracy, safety, and fairness mitigations are required before use in specific scenarios
Considerations:
Developers should adhere to laws and evaluate accuracy, safety, and fairness before using in high-risk scenarios.
Additional Notes 
Phi-3.5-MoE is designed for use in constrained environments and supports multilingual capabilities.
Supported Languages 
Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
Training Details 
Data Sources:
publicly available documents, high-quality educational data, synthetic "textbook-like" data, high quality chat format supervised data
Data Volume:
4.9 trillion tokens
Methodology:
supervised fine-tuning, proximal policy optimization, and direct preference optimization
Context Length:
128000
Training Time:
23 days
Hardware Used:
512 H100-80G GPUs
Model Architecture:
Mixture-of-Experts decoder-only Transformer model
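
In a mixture-of-experts Transformer, each feed-forward block is replaced by a pool of expert MLPs plus a router that sends every token to only a small subset of them, so total parameter count grows while per-token compute stays low. The sketch below is a minimal, generic top-k routing layer in PyTorch, for illustration only; the layer sizes, expert count, and top-k value are placeholder assumptions, not the actual PhiMoE implementation.

```python
# Minimal, generic sketch of top-k expert routing in a decoder block.
# Illustrative only -- not Microsoft's implementation; sizes are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # only the chosen experts run per token
            for e in idx[:, k].unique().tolist():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k : k + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, 512)
print(TopKMoE()(tokens).shape)                 # torch.Size([8, 512])
```
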
Safety Evaluation 
Methodologies:
red teaming, adversarial conversation simulations, multilingual safety evaluation benchmark datasets
Findings:
Safety post-training had a positive impact across multiple languages and risk categories, yielding higher refusal rates for undesirable outputs and better robustness to jailbreak techniques. Models may refuse to generate undesirable outputs in English even when the request is made in another language.
Risk Categories:
misinformation, offensive content, multilingual performance and safety gaps
Ethical Considerations:
Ensuring models do not perpetuate harmful stereotypes or generate inappropriate content.
Responsible Ai Considerations 
Fairness:
Model may under- or over-represent groups of people or reinforce negative stereotypes due to training data bias.
Transparency:
Developers should inform end-users they are interacting with an AI system.
Accountability:
Developers are responsible for testing for performance or safety gaps and implementing language-specific safeguards.
Mitigation Strategies:
Safety post-training, model fine-tuning, and adherence to legal regulations are recommended.
Input Output 
Input Format:
Chat format prompt
Accepted Modalities:
text
Output Format:
Generated text
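
Given the chat-format input and text output above, a minimal inference sketch with Hugging Face transformers could look like the following. It assumes transformers >= 4.43 (as listed below) and enough GPU memory for the bfloat16 weights, and it builds the prompt with the tokenizer's chat template rather than hand-written special tokens; it is not an official recipe.

```python
# Hedged sketch of chat-format inference with transformers.
# Adjust device_map / dtype to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-MoE-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the reported torch data type
    device_map="auto",
    trust_remote_code=True,       # the repo is tagged as shipping custom code
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain mixture-of-experts in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
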
LLM Name: Phi 3.5 MoE Instruct
Repository: 🤗 https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
Model Size: 41.9b
Required VRAM: 83.9 GB
Updated: 2025-02-14
Maintainer: microsoft
Model Type: phimoe
Instruction-Based: Yes
Model Files: 17 safetensors shards, 5.0 GB each (1-of-17 through 16-of-17) plus 3.9 GB (17-of-17)
Model Architecture: PhiMoEForCausalLM
License: mit
Context Length: 131072
Model Max Length: 131072
Transformers Version: 4.43.3
Tokenizer Class: LlamaTokenizer
Padding Token: <|endoftext|>
Vocabulary Size: 32064
Torch Data Type: bfloat16
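
The required-VRAM figure is consistent with storing the weights in bfloat16 (2 bytes per parameter); a rough back-of-the-envelope check, ignoring activations and KV cache:

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
params = 41.9e9        # total parameters, from the table above
bytes_per_param = 2    # bfloat16
print(f"~{params * bytes_per_param / 1e9:.1f} GB of weights")  # ~83.8 GB vs. the listed 83.9 GB
```
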

Rank the Phi 3.5 MoE Instruct Capabilities

🆘 Have you tried this model? Rate its performance. This feedback helps the ML community identify the most suitable model for their needs. Your contribution really does make a difference! 🌟

Instruction Following and Task Automation  
Factuality and Completeness of Knowledge  
Censorship and Alignment  
Data Analysis and Insight Generation  
Text Generation  
Text Summarization and Feature Extraction  
Code Generation  
Multi-Language Support and Translation  


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227