Git Base Textvqa by microsoft


Tags: arXiv:2205.14100, AutoTrain compatible, En, Git, PyTorch, Region:us, Safetensors, Vision, Visual-question-answering


Git Base Textvqa Parameters and Internals

Model Type 
Transformer, Visual Question Answering
Use Cases 
Areas:
Visual Question Answering, Image and Video Captioning, Image Classification
Applications:
Research, Commercial applications
Primary Use Cases:
Visual question answering on TextVQA dataset
Additional Notes 
The checkpoint described here is 'GIT-base', a smaller variant of the GIT model fine-tuned specifically for TextVQA.
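A minimal sketch of how this checkpoint might be queried through the Hugging Face transformers library, assuming transformers, torch, and Pillow are installed. The helper names prepend_cls and answer_question are hypothetical and not part of any library. Prepending the [CLS] token id to the question ids follows the usage pattern shown for GIT in the transformers documentation.

```python
def prepend_cls(question_ids, cls_token_id):
    """Return the question token ids with the [CLS] id placed in front."""
    return [cls_token_id] + list(question_ids)


def answer_question(image, question, model_id="microsoft/git-base-textvqa"):
    """Generate text for `question` about a PIL image (downloads the model)."""
    # Imported lazily so prepend_cls stays usable without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Image -> pixel tensor; question -> token ids without special tokens.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    question_ids = processor(text=question, add_special_tokens=False).input_ids
    input_ids = torch.tensor(
        prepend_cls(question_ids, processor.tokenizer.cls_token_id)
    ).unsqueeze(0)

    with torch.no_grad():
        generated = model.generate(
            pixel_values=pixel_values, input_ids=input_ids, max_length=50
        )
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```

Calling answer_question with an RGB PIL image and a question string should return the decoded sequence, which typically contains the question followed by the generated answer.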
Supported Languages 
English (Proficient)
Training Details 
Data Sources:
COCO, Conceptual Captions (CC3M), SBU, Visual Genome (VG), Conceptual 12M (CC12M), ALT200M, and an additional 0.6B image-text pairs
Data Volume:
10 million image-text pairs for the GIT-base variant
Methodology:
Teacher forcing
Model Architecture:
Transformer decoder conditioned on CLIP image tokens and text tokens.
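Teacher forcing means that during training the decoder is fed the ground-truth previous tokens instead of its own predictions, and is asked to predict the next token at each position. The sketch below shows only the one-step shift between decoder inputs and labels. It is an illustration under that assumption, not the GIT training code, and the function name and token ids are made up for the example.

```python
def teacher_forcing_pairs(token_ids):
    """Split a target sequence into (decoder_inputs, labels) shifted by one."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form an input/label pair")
    decoder_inputs = token_ids[:-1]  # what the decoder sees at each step
    labels = token_ids[1:]           # what it should predict at each step
    return decoder_inputs, labels


# Illustrative token ids only (not taken from the model's tokenizer output).
inputs, labels = teacher_forcing_pairs([101, 1037, 2417, 3902, 102])
for seen, target in zip(inputs, labels):
    print(f"input {seen} -> predict {target}")
```

In the GIT setting the decoder would additionally attend to the image tokens at every step; that conditioning is omitted here to keep the shift itself visible.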
LLM Name: Git Base Textvqa
Repository: https://huggingface.co/microsoft/git-base-textvqa
Model Name: microsoft/git-base-textvqa
Model Size: 177.2M parameters
Required VRAM: 0.7 GB
Updated: 2024-10-31
Maintainer: microsoft
Model Type: git
Model Files: 0.7 GB, 0.7 GB
Supported Languages: en
Model Architecture: GitForCausalLM
License: MIT
Context Length: 1024
Model Max Length: 1024
Tokenizer Class: BertTokenizer
Padding Token: [PAD]
Vocabulary Size: 30522
Torch Data Type: float32
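
The listed VRAM figure is consistent with the parameter count and data type above: 177.2 million float32 parameters at 4 bytes each come to roughly 0.7 GB. A small sketch of that arithmetic follows, assuming decimal gigabytes (1 GB = 10^9 bytes) and counting weights only, not activations or framework overhead. The function name is hypothetical.

```python
def weights_size_gb(num_params, bytes_per_param):
    """Approximate size of the model weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


fp32_gb = weights_size_gb(177.2e6, 4)  # float32 uses 4 bytes per parameter
print(round(fp32_gb, 1))  # → 0.7
```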

Quantized Models of the Git Base Textvqa

Model: ... Base Textvqa Bnb 4bit Smashed
Likes: 0
Downloads: 17
VRAM: 0 GB

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241227