Opt 125M by facebook


Tags: arXiv:2005.14165, arXiv:2205.01068, Autotrain compatible, En, Jax, Opt, Pytorch, Region:us, Tf
Model Card on HF: https://huggingface.co/facebook/opt-125m

Opt 125M Parameters and Internals

Model Type 
text generation
Use Cases 
Areas:
Research, Commercial Applications
Applications:
Text generation, Prompting for downstream tasks
Primary Use Cases:
Text generation, Prompting for evaluation of downstream tasks
Limitations:
Bias in training data; quality issues in generation diversity and hallucination.
Considerations:
Bias in training data can affect fine-tuned versions.
Additional Notes 
The model card discusses ethical considerations related to model biases due to the nature of the training data.
Training Details 
Data Sources:
BookCorpus, CC-Stories, The Pile, Pushshift.io Reddit, CCNewsV2
Data Volume:
180B tokens, 800GB data
Methodology:
Causal language modeling (CLM), with text tokenized using the GPT-2 byte-level BPE.
Context Length:
2048
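Training Time:
Roughly 33 days of continuous training (reported for the 175B model of the OPT suite)
Hardware Used:
992 80GB A100 GPUs (reported for the 175B model of the OPT suite)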
Model Architecture:
Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers.
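The 2048-token context length listed above bounds the prompt and the generated continuation together. A minimal sketch of one way to budget for that, assuming the prompt is already a list of token IDs; the helper name `fit_prompt` and the policy of dropping the oldest tokens are illustrative assumptions, not something the model card prescribes:

```python
CONTEXT_LENGTH = 2048  # context length from the listing above


def fit_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Trim a prompt so that prompt + generated tokens fit in the context window.

    Hypothetical helper: keeps the most recent tokens and drops the oldest ones.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens  # tokens left for the prompt
    return token_ids[-budget:]


# A 3000-token prompt with room reserved for 100 new tokens.
prompt = list(range(3000))
trimmed = fit_prompt(prompt, max_new_tokens=100)
print(len(trimmed))
```

Dropping from the left keeps the text closest to the point of generation; other policies (for example, summarizing or keeping the first tokens) may suit a given application better.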
Safety Evaluation 
Methodologies:
Evaluation using prompts similar to GPT-3
Findings:
The model is strongly biased and can have quality issues such as hallucination.
Risk Categories:
bias, toxicity
Ethical Considerations:
Training data contains unfiltered content from the internet, which leads to biases.
Responsible AI Considerations 
Fairness:
Acknowledges bias in training data.
Transparency:
Bias and safety acknowledged in official model card.
Accountability:
Encouraging responsible AI research by making models available for study.
Mitigation Strategies:
Sharing models to allow broader study and understanding of biases.
Input Output 
Input Format:
Text prompts
Accepted Modalities:
text
Output Format:
Generated text
Performance Tips:
Generation is deterministic by default; set `do_sample` to `True` to use top-k sampling for non-deterministic generation.
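To make the difference between deterministic decoding and top-k sampling concrete, here is a minimal, library-free sketch of the sampling step itself. It is an illustration of the general technique under simple assumptions (a plain list of logits, a hypothetical `top_k_sample` helper), not the code the Transformers library runs. In Transformers, the corresponding switches are the `do_sample` and `top_k` generation parameters.

```python
import math
import random


def greedy(logits):
    """Deterministic choice: always pick the highest-scoring token index."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k=50, temperature=1.0, rng=None):
    """Sample one token index from the k highest-scoring logits."""
    rng = rng or random.Random()
    k = max(1, min(k, len(logits)))
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 1.5, 0.3]  # made-up scores for a 5-token vocabulary
print(greedy(toy_logits))
print(top_k_sample(toy_logits, k=2, rng=random.Random(0)))
```

With `k=1` the sampler reduces to greedy decoding; larger `k` values admit lower-ranked tokens and increase output diversity.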
LLM Name: Opt 125M
Repository: https://huggingface.co/facebook/opt-125m
Model Size: 125m
Required VRAM: 0.3 GB
Updated: 2025-02-05
Maintainer: facebook
Model Type: opt
Model Files: 0.3 GB
Supported Languages: en
Model Architecture: OPTForCausalLM
License: other
Context Length: 2048
Model Max Length: 2048
Transformers Version: 4.21.0.dev0
Beginning of Sentence Token: </s>
End of Sentence Token: </s>
Unk Token: </s>
Vocabulary Size: 50272
Torch Data Type: float16
Activation Function: relu
Errors: replace
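The listed VRAM and file size can be sanity-checked from the parameter count and data type. The sketch below is a rough estimate for the weights only, assuming the nominal 125M parameter count and 2 bytes per float16 value; the helper name `weight_memory_gb` is illustrative, and activations, the KV cache, and framework overhead are not included.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="float16"):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


nominal_params = 125_000_000  # assumption: nominal size taken from the model name
print(f"float16: {weight_memory_gb(nominal_params):.2f} GB")
print(f"float32: {weight_memory_gb(nominal_params, 'float32'):.2f} GB")
```

The float16 estimate of about 0.25 GB is in the same range as the 0.3 GB listed above; the exact figure depends on the true parameter count and file format.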

Best Alternatives to Opt 125M

Best Alternatives | Context / RAM | Downloads | Likes
...125M Qcqa Ub 6 Best For Q Loss | 2K / 0.5 GB | 1209 | 0
...5M Qcqa Ub 6 Best For KV Cache | 2K / 0.5 GB | 1212 | 0
...25M Gqa Ub 6 Best For KV Cache | 2K / 0.5 GB | 1211 | 0
Galactica 125M Cot | 2K / 0.5 GB | 149 | 0
Galactica Ref | 2K / 0.5 GB | 143 | 0
Galactica 125M DPO Pos | 2K / 0.5 GB | 140 | 0
Galactica 125M DPO | 2K / 0.5 GB | 140 | 0
BertQA | 2K / 0.5 GB | 125 | 0
BertQA | 2K / 0.5 GB | 141 | 0
Opt 125M Quantized Brevitas | 2K /  GB | 5 | 0

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227