Gpt2 NoLN by apollo-research


Tags: arXiv:2409.13710, Autotrain compatible, Endpoints compatible, Gpt2, Region:us, Safetensors

Gpt2 NoLN Parameters and Internals

Model Type: GPT2LMHeadModel
Additional Notes: To fully remove all LayerNorms, replace the 'ln_1' and 'ln_2' modules in every block with identities, and absorb 'ln_f' into the unembedding by adjusting the unembedding matrix and bias.
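The ln_f step amounts to absorbing an elementwise affine map into the linear layer that follows it. The numpy sketch below checks that identity on toy data, assuming (hypothetically) that once normalization is disabled ln_f reduces to gamma * x + beta; the function name and the sizes are illustrative, not part of the released checkpoint or of any library. In Hugging Face's GPT-2, lm_head has no bias and shares its weight with the token embedding, so a real conversion would need a separate, untied unembedding that carries the folded bias.

```python
import numpy as np

def fold_affine_into_unembed(W_U, gamma, beta):
    """Hypothetical helper: fold an elementwise affine map into an unembedding.

    W_U has shape (vocab_size, d_model), like a PyTorch nn.Linear weight.
    Returns (W_folded, b_folded) such that
        W_U @ (gamma * x + beta) == W_folded @ x + b_folded
    ASSUMPTION: ln_f has already been reduced to x -> gamma * x + beta.
    """
    W_folded = W_U * gamma[np.newaxis, :]  # scale each input column by gamma
    b_folded = W_U @ beta                  # constant logit offset contributed by beta
    return W_folded, b_folded

# Toy sizes for illustration only (GPT-2 small uses d_model=768, vocab=50257).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
W_U = rng.normal(size=(vocab_size, d_model))
gamma = rng.normal(size=d_model)
beta = rng.normal(size=d_model)
x = rng.normal(size=d_model)  # a final residual-stream vector

logits_before = W_U @ (gamma * x + beta)
W_folded, b_folded = fold_affine_into_unembed(W_U, gamma, beta)
logits_after = W_folded @ x + b_folded

print(np.allclose(logits_before, logits_after))  # True
```

Because the map is linear in x, the same folding works for any input, which is why ln_f can disappear once its normalization step is gone.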
Training Details
Data Sources: OpenWebText
Data Volume: ~500M tokens
Methodology: Fine-tuning with gradual LayerNorm disabling
Context Length: 1024 tokens
Release Notes
Version: v2
Notes: Trained for 1000 iterations in a single training run.
Version: v1
Notes: Trained for 900 iterations, with multiple interruptions, LayerNorm modifications, and resumed steps.
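The listed figures imply a rough per-iteration batch size. A back-of-the-envelope sketch, assuming (the card does not say so) that the ~500M tokens belong to the v2 run and are spread evenly over its 1000 iterations of full 1024-token sequences:

```python
# Figures from this card: ~500M training tokens, 1000 iterations (v2),
# 1024-token context. ASSUMPTION: the token count refers to the v2 run and
# is spread evenly across its iterations.
total_tokens = 500_000_000
iterations = 1000
context_length = 1024

tokens_per_iteration = total_tokens / iterations
sequences_per_iteration = tokens_per_iteration / context_length

print(f"{tokens_per_iteration:,.0f} tokens per iteration")
print(f"~{sequences_per_iteration:.0f} full-length sequences per iteration")
```

This prints 500,000 tokens and roughly 488 sequences per iteration; treat it as an order-of-magnitude estimate, since the token count is approximate.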
LLM Name: Gpt2 NoLN
Repository: https://huggingface.co/apollo-research/gpt2_noLN
Model Size: 124.4M parameters
Required VRAM: 0.5 GB
Updated: 2025-02-22
Maintainer: apollo-research
Model Type: gpt2
Model Files: 0.5 GB
Model Architecture: GPT2LMHeadModel
Transformers Version: 4.42.4
Vocabulary Size: 50257
Torch Data Type: float32
Activation Function: gelu_new
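The 124.4M size and 0.5 GB figures can be reproduced by counting GPT-2 parameters and multiplying by 4 bytes per float32 value. This sketch assumes the standard GPT-2 small hyperparameters (12 layers, 768-dimensional residual stream, MLP width 3072), which the card does not list; it counts weights only, so running the model needs additional memory for activations.

```python
# Card values: vocabulary 50257, context 1024, float32 weights.
# ASSUMPTION: standard GPT-2 small shape (n_layer=12, n_embd=768, MLP = 4 * n_embd).
vocab_size = 50257
n_positions = 1024
n_embd = 768
n_layer = 12
d_mlp = 4 * n_embd

embeddings = vocab_size * n_embd + n_positions * n_embd  # wte + wpe
per_block = (
    2 * n_embd                           # ln_1 weight + bias
    + n_embd * 3 * n_embd + 3 * n_embd   # attn.c_attn (fused Q, K, V)
    + n_embd * n_embd + n_embd           # attn.c_proj
    + 2 * n_embd                         # ln_2 weight + bias
    + n_embd * d_mlp + d_mlp             # mlp.c_fc
    + d_mlp * n_embd + n_embd            # mlp.c_proj
)
final_ln = 2 * n_embd                    # ln_f weight + bias
# lm_head is tied to wte, so it adds no new parameters.
total_params = embeddings + n_layer * per_block + final_ln

bytes_fp32 = total_params * 4            # float32 = 4 bytes per parameter
print(f"{total_params:,} parameters")           # 124,439,808 parameters
print(f"{bytes_fp32 / 1e9:.2f} GB of weights")  # 0.50 GB of weights
```

The total includes the ln_1, ln_2 and ln_f parameters; replacing ln_1 and ln_2 with identities as described in the notes would remove 12 × 2 × 1536 = 36,864 of them.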

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227