Model Type | |
Use Cases |
Areas: | research, academic, non-commercial |
|
Primary Use Cases: | multi-image tasks, image reasoning, temporal understanding |
|
Limitations: | Not suitable for commercial use |
|
|
Additional Notes | Achieves state-of-the-art performance on multi-image skills, including co-reference, reasoning, comparison, and temporal understanding. |
|
Supported Languages | |
Training Details |
Data Sources: | TIGER-Lab/Mantis-Instruct |
|
Training Time: | |
Hardware Used: | |
Model Architecture: | Takes interleaved text and images as input, with multi-image capabilities |
|
|
Input Output |
Input Format: | Interleaved text and images |
|
|
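The "interleaved text and images" input format listed above can be pictured as an ordered sequence of text and image segments. The sketch below is only an illustration of that idea, not the model's actual preprocessing: the `Text` and `Image` classes, the `flatten_interleaved` helper, the file names, and the `<image>` placeholder string are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Text:
    content: str


@dataclass
class Image:
    path: str  # hypothetical file path, for illustration only


Segment = Union[Text, Image]

# Assumed placeholder string; the source does not state the real token.
IMAGE_PLACEHOLDER = "<image>"


def flatten_interleaved(segments: List[Segment]) -> Tuple[str, List[str]]:
    """Turn an interleaved sequence into a prompt string plus an ordered image list.

    Each image is replaced in the prompt by a placeholder, so the position of
    every image relative to the surrounding text is preserved.
    """
    parts: List[str] = []
    image_paths: List[str] = []
    for segment in segments:
        if isinstance(segment, Image):
            parts.append(IMAGE_PLACEHOLDER)
            image_paths.append(segment.path)
        else:
            parts.append(segment.content)
    return " ".join(parts), image_paths


# A two-image comparison request, with text and images interleaved.
segments = [
    Text("Compare"),
    Image("before.jpg"),
    Text("and"),
    Image("after.jpg"),
    Text("What changed?"),
]
prompt, image_paths = flatten_interleaved(segments)
```

Keeping the images in a separate ordered list while marking their positions in the text is one common way to hand multi-image input to a processor; the real model may expect a different layout.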