OCRonos Vintage by PleIAs


Tags: Archives, AutoTrain compatible, en, Endpoints compatible, GPT-2, History, OCR, OCR-correction, Pre-train, Region: us, Safetensors, SLM, Text-correction
Model Card on HF 🤗: https://huggingface.co/PleIAs/OCRonos-Vintage

OCRonos Vintage Benchmarks

Scores (nn.n%) indicate how the model compares to the reference models: Anthropic Claude 3.5 Sonnet ("so35"), GPT-4o ("gpt4o"), or GPT-4 ("gpt4").
OCRonos Vintage (PleIAs/OCRonos-Vintage)

OCRonos Vintage Parameters and Internals

Model Type 
text-generation, OCR correction
Use Cases 
Areas:
cultural heritage archives
Applications:
OCR correction
Primary Use Cases:
Historical text correction
Limitations:
Not reliable for correcting text that involves modern concepts
Considerations:
Performs well for documents published between the mid-19th century and mid-20th century
Additional Notes 
Can also simulate historical text generation; pre-training on historical data limits its exposure to modern content.
Supported Languages 
en (OCR-correction quality comparable to GPT-4 or Llama on English-language archives)
Training Details 
Data Sources:
Library of Congress, Internet Archive, Hathi Trust
Data Volume:
18 billion tokens
Methodology:
Pre-trained with llm.c
Context Length:
1024
Training Time:
2.5 hours
Hardware Used:
4 H100s
Model Architecture:
GPT-2 architecture and tokenizer; a custom tokenizer is planned for better performance
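Since the context window is 1024 tokens and the correction is generated as new tokens, long documents have to be split before correction. A minimal sketch follows; the 400-token chunk size is an assumption chosen to leave headroom for the delimiters and the generated output, not a documented value.

```python
# Hypothetical chunking helper: split long OCR text so each prompt plus its
# generated correction fits inside the 1024-token context window.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("PleIAs/OCRonos-Vintage")

def chunk_ocr_text(text: str, max_input_tokens: int = 400) -> list[str]:
    """Split `text` into pieces of at most `max_input_tokens` GPT-2 tokens."""
    ids = tokenizer.encode(text)
    return [
        tokenizer.decode(ids[i : i + max_input_tokens])
        for i in range(0, len(ids), max_input_tokens)
    ]
```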
Input Output 
Input Format:
### Text ###
Accepted Modalities:
text
Output Format:
### Correction ###
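Put together, a correction pass can be scripted with the standard transformers API. In the sketch below, only the `### Text ###` / `### Correction ###` delimiters come from the format above; the sample OCR string and the decoding settings are illustrative assumptions.

```python
# Minimal sketch of an OCR-correction call with OCRonos-Vintage.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("PleIAs/OCRonos-Vintage")
model = GPT2LMHeadModel.from_pretrained("PleIAs/OCRonos-Vintage").to(device)

ocr_text = "Tne Natioual Galery of Arl was foundcd in 1937."  # made-up noisy OCR
prompt = f"### Text ###\n{ocr_text}\n\n### Correction ###\n"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(
    **inputs,
    max_new_tokens=256,                    # total context is 1024 tokens
    do_sample=False,                       # greedy decoding for corrections
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no dedicated pad token
)
corrected = tokenizer.decode(output[0], skip_special_tokens=True)
print(corrected.split("### Correction ###")[-1].strip())
```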
LLM Name: OCRonos Vintage
Repository 🤗: https://huggingface.co/PleIAs/OCRonos-Vintage
Model Size: 124.4M
Required VRAM: 0.2 GB
Updated: 2024-12-21
Maintainer: PleIAs
Model Type: gpt2
Model Files: 0.2 GB
Supported Languages: en
Model Architecture: GPT2LMHeadModel
License: apache-2.0
Model Max Length: 1024
Transformers Version: 4.43.3
Tokenizer Class: GPT2Tokenizer
Vocabulary Size: 50257
Torch Data Type: bfloat16
Activation Function: gelu_new
Errors: replace
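The architecture fields above can be checked against the repository's config without downloading the weights. Expected values (per the table) are noted in the comments; the repo's config.json is authoritative.

```python
# Read only the model config from the Hub and compare with the table above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("PleIAs/OCRonos-Vintage")
print(cfg.model_type)           # expected: "gpt2"
print(cfg.vocab_size)           # expected: 50257
print(cfg.n_positions)          # expected: 1024 (model max length)
print(cfg.activation_function)  # expected: "gelu_new"
print(cfg.torch_dtype)          # expected: torch.bfloat16
```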

Best Alternatives to OCRonos Vintage

Best Alternatives | Context / RAM | Downloads | Likes
PlayPart AI Personal Trainer | 0K / 0.5 GB | 301 | 0
Testmod | 0K / 0.5 GB | 370 | 0
Gpt2 Scratch | 0K / 0.5 GB | 370 | 0
Originos Icn Savant | 0K / 0.5 GB | 367 | 1
DialoGPT Small Garycoleman | 0K / 0.5 GB | 365 | 1
D2nwg Causal Gpt2 V1 | 0K / 0.2 GB | 24 | 0
Quble Test Model V1 Pretrain | 0K / 0.5 GB | 381 | 2
D2nwg Causal Gpt2 | 0K / 0.2 GB | 21 | 0
DialoGPT Medium Loki | 0K / 0.5 GB | 520 | 0
Ftgpt | 0K / 0.2 GB | 22 | 0
Note: a green score (e.g., "73.2") means the listed model is better than PleIAs/OCRonos-Vintage.

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241217