Git Base by microsoft


Tags: arXiv:2205.14100, AutoTrain compatible, en, Endpoints compatible, git, image-captioning, image-to-text, PyTorch, region:us, safetensors, vision
Model card on Hugging Face: https://huggingface.co/microsoft/git-base


Git Base Parameters and Internals

Model Type 
image-to-text, vision
Use Cases 
Areas:
research, commercial applications
Applications:
image and video captioning, visual question answering, image classification
Primary Use Cases:
image captioning
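A minimal captioning sketch for the primary use case, assuming the Hugging Face Transformers and Pillow packages are installed. The function name `caption_image`, the `image_path` argument, and the `max_length` value are illustrative choices, not part of this listing. The imports sit inside the function so the snippet can be loaded without pulling in the heavy dependencies or downloading weights.

```python
def caption_image(image_path, model_name="microsoft/git-base", max_length=50):
    """Generate one caption for the image at `image_path` (hypothetical path)."""
    # Imported lazily so defining the function has no side effects.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    # The processor resizes and normalizes the image into a pixel tensor.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # With no text prompt, generation starts from the image tokens alone.
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling `caption_image("photo.jpg")` would download the checkpoint on first use and return a single English caption string.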
Additional Notes 
The model is trained with teacher forcing. It uses a bidirectional attention mask over the image tokens and a causal attention mask over the text tokens.
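The mixed attention pattern described here can be sketched in plain Python. This is an illustration, not the library's implementation: the function name is hypothetical, and the choice that image tokens do not attend to text tokens is an assumption that goes beyond what this listing states.

```python
def build_git_style_mask(num_image_tokens, num_text_tokens):
    """Return mask[q][k] == True when query position q may attend to key k.

    Positions 0..num_image_tokens-1 are image tokens; the rest are text tokens.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_image_tokens:
                # Image keys are visible to every query (bidirectional over images,
                # and text can always look at the full image).
                mask[q][k] = True
            elif q >= num_image_tokens:
                # Text query on a text key: causal, current and earlier tokens only.
                mask[q][k] = k <= q
            # Image query on a text key stays False (assumption, see lead-in).
    return mask


mask = build_git_style_mask(num_image_tokens=2, num_text_tokens=3)
for row in mask:
    print(" ".join("1" if allowed else "0" for allowed in row))
```

With two image tokens and three text tokens, the first two rows allow only the image columns, and the last three rows form a lower-triangular pattern over the text columns.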
Training Details 
Data Sources:
COCO, Conceptual Captions (CC3M), SBU, Visual Genome (VG), Conceptual Captions (CC12M), ALT200M, extra data following Hu et al. (2021a)
Data Volume:
10 million image-text pairs for GIT-base
Methodology:
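Teacher forcing on (image, text) pairs: the model learns to predict the next text token given the image tokens and the previous ground-truth text tokens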
Model Architecture:
Transformer decoder conditioned on CLIP image tokens and text tokens
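The teacher-forcing methodology listed above can be illustrated with a small sketch of how one caption yields training targets. The helper name and the example caption are hypothetical; real training operates on token IDs and also conditions every step on the image tokens, which are omitted here.

```python
def teacher_forcing_steps(caption_tokens):
    """Pair each ground-truth prefix with the token the model should predict next."""
    steps = []
    for i in range(1, len(caption_tokens)):
        prefix = caption_tokens[:i]   # ground-truth history fed to the decoder
        target = caption_tokens[i]    # token the loss is computed against
        steps.append((prefix, target))
    return steps


caption = ["[CLS]", "a", "cat", "on", "a", "mat", "[SEP]"]
for prefix, target in teacher_forcing_steps(caption):
    print(prefix, "->", target)
```

The point of the technique is that the prefix always comes from the reference caption, never from the model's own earlier predictions, so a mistake at one step does not corrupt the inputs for later steps during training.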
LLM Name: Git Base
Repository: https://huggingface.co/microsoft/git-base
Model Name: microsoft/git-base
Model Size: 176.6M parameters
Required VRAM: 0.7 GB
Updated: 2024-10-31
Maintainer: microsoft
Model Type: git
Model Files: 0.7 GB, 0.7 GB
Supported Languages: en
Model Architecture: GitForCausalLM
License: mit
Context Length: 1024
Model Max Length: 1024
Tokenizer Class: BertTokenizer
Padding Token: [PAD]
Vocabulary Size: 30522
Torch Data Type: float32
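
The listed model size, data type, and file size are consistent with each other, as a rough back-of-the-envelope check shows. This sketch assumes the 0.7 GB figure covers the weights only and uses decimal gigabytes; the listing does not say how Required VRAM was computed, and actual inference also needs memory for activations.

```python
def weights_size_gb(num_params, bytes_per_param):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# 176.6M parameters stored as float32 (4 bytes each), per the fields above.
size_gb = weights_size_gb(num_params=176.6e6, bytes_per_param=4)
print(f"{size_gb:.1f} GB")
```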

Best Alternatives to Git Base

Best Alternatives                      Context / RAM   Downloads   Likes
Isl Img2text                           1K / 0.7 GB     15          0
... Git Portuguese Neuro Simbolic      1K / 0.7 GB     14          0
Git Base Captioning                    1K / 0.7 GB     39          0
Git Base Pokemon                       1K / 0.7 GB     19          0
5 Epochs                               1K / 0.7 GB     14          0
5e 6 Non Peft                          1K / 0.7 GB     12          0
Git Base Pokemon                       1K / 0.7 GB     12          0
...odel Video Caption Finetuned 1      1K / 0.7 GB     14          1
...del Video Caption Finetuned 11      1K / 0.7 GB     6           0
Git Video Caption Finetuned 10         1K / 0.7 GB     6           0


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241217