Mpt 7B 8K by mosaicml


Tags: Arxiv:1909.08053, Arxiv:2010.04245, Arxiv:2108.12409, Arxiv:2205.14135, Arxiv:2302.06675, Arxiv:2302.13971, Autotrain compatible, Composer, Custom code, Dataset:allenai/s2orc, Dataset:bigcode/the-stack, Dataset:c4, Dataset:mc4, Dataset:togethercomputer/redpa..., Ext 8k, Llm-foundry, Mosaicml, Mpt, Pytorch, Region:us, Sharded, Streamingdatasets
Model Card on HF 🤗: https://huggingface.co/mosaicml/mpt-7b-8k

Mpt 7B 8K Benchmarks

Mpt 7B 8K (mosaicml/mpt-7b-8k)

Mpt 7B 8K Parameters and Internals

Model Type 
decoder-style transformer, LLM
Use Cases 
Areas:
research, commercial applications
Applications:
text generation, long-form instruction following, dialogue generation
Primary Use Cases:
finetuning for specific applications
Limitations:
not intended for deployment without finetuning, can produce factually incorrect output
Considerations:
Efforts were made to clean the pretraining data; however, outputs may still be offensive or biased.
Additional Notes 
This model builds on MPT-7B, adding longer sequence handling and significant efficiency improvements.
Supported Languages 
English (proficient)
Training Details 
Data Sources:
mc4, c4, togethercomputer/RedPajama-Data-1T, bigcode/the-stack, allenai/s2orc
Data Volume:
1.5T tokens
Methodology:
MPT-7B-8k uses a modified transformer architecture optimized for efficient training and inference, with ALiBi (Attention with Linear Biases) to handle long inputs.
Context Length:
8192
Training Time:
9.5 days
Hardware Used:
440 A100-40GB GPUs
Model Architecture:
Decoder-only transformer with modifications such as FlashAttention, ALiBi, and the elimination of learned positional embeddings (see the loading sketch below).
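Because ALiBi replaces learned positional embeddings, the usable context window can in principle be raised above the 8,192-token training length at load time. A minimal sketch, assuming the standard Hugging Face transformers loading path with trust_remote_code=True and the max_seq_len field of the MPT custom config; 16384 is an illustrative value, not a tuned recommendation:

```python
import transformers

# MPT ships custom modeling code, so trust_remote_code=True is required.
config = transformers.AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-8k", trust_remote_code=True
)
# ALiBi lets the model extrapolate beyond its 8192-token training window;
# raising max_seq_len here follows the MPT custom config (illustrative value).
config.max_seq_len = 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-8k",
    config=config,
    trust_remote_code=True,
)
```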
Safety Evaluation 
Ethical Considerations:
MPT-7B-8k can produce factually incorrect, lewd, biased or offensive outputs. It should not be used for human-facing interactions without further guardrails and user consent.
Responsible AI Considerations 
Fairness:
Model may have biases inherited from training data.
Transparency:
Pretraining data was openly available and was preprocessed to remove unsuitable content.
Accountability:
Responsibility of MosaicML.
Mitigation Strategies:
Guardrails recommended before deployment.
Input Output 
Input Format:
Text sequences, up to 8k tokens
Accepted Modalities:
text
Output Format:
Generated text
Performance Tips:
Use an optimized attention implementation such as FlashAttention and run the model in bfloat16 precision on GPUs; see the sketch below.
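As a concrete illustration of these tips, the sketch below loads the model in bfloat16 and switches to a fused attention kernel. The attn_config["attn_impl"] = "triton" and init_device knobs follow the MPT custom config and assume the corresponding triton/FlashAttention dependencies and a CUDA GPU are available:

```python
import torch
import transformers

name = "mosaicml/mpt-7b-8k"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
# Fused FlashAttention-style kernel (assumes the triton dependency is installed);
# omit this line to keep the default 'torch' attention implementation.
config.attn_config["attn_impl"] = "triton"
config.init_device = "cuda:0"  # initialize weights directly on the GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # halves memory versus float32
    trust_remote_code=True,
)
```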
Release Notes 
Version:
1.0.0
Date:
2023-07-18
Notes:
Initial release of MPT-7B-8k.
LLM Name: Mpt 7B 8K
Repository 🤗: https://huggingface.co/mosaicml/mpt-7b-8k
Model Size: 7b
Required VRAM: 13.3 GB
Updated: 2024-12-22
Maintainer: mosaicml
Model Type: mpt
Model Files: 9.9 GB (1-of-2), 3.4 GB (2-of-2)
Context Length: 8k
Model Architecture: MPTForCausalLM
License: apache-2.0
Model Max Length: 8192
Transformers Version: 4.30.2
Tokenizer Class: GPTNeoXTokenizer
Vocabulary Size: 50432
Torch Data Type: bfloat16
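For reference, a minimal end-to-end generation sketch using these settings (bfloat16, the bundled GPT-NeoX-style tokenizer, and the custom MPTForCausalLM code); the prompt and sampling parameters are illustrative only:

```python
import torch
import transformers

name = "mosaicml/mpt-7b-8k"

# The repository bundles a GPT-NeoX-style tokenizer (vocabulary size 50432).
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
)

pipe = transformers.pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device="cuda:0"
)
print(pipe("Here is a recipe for vegan banana bread:\n",
           max_new_tokens=100, do_sample=True, temperature=0.8))
```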

Quantized Models of the Mpt 7B 8K

Model | Likes | Downloads | VRAM
Mpt 7B Q8 | 1 | 23 | 6 GB

Best Alternatives to Mpt 7B 8K

Best Alternatives | Context / RAM | Downloads | Likes
Mpt 7B | 0K / 13.3 GB | 31137 | 1164
Mpt 7B Chat | 0K / 13.3 GB | 20372 | 512
Mpt 7B Storywriter | 0K / 13.3 GB | 1817 | 824
Mpt 7B Instruct | 0K / 13.3 GB | 7997 | 468
Mpt 7B Int8 Ov | 0K / 0 GB | 12 | 0
Shears Mpt 7B 50 Base | 0K / 13.3 GB | 18 | 1
Sea Lion 7B Instruct | 0K / 15 GB | 487 | 23
Sea Lion 7B | 0K / 15 GB | 2402 | 36
Mpt 7B | 0K / 26.5 GB | 3669 | 1
Mpt 7B 8K Instruct | 0K / 13.3 GB | 1321 | 26

Original data from Hugging Face, OpenCompass, and various public Git repositories.
Release v20241217