Yi VL 34B by 01-ai: Benchmarks, Features and Detailed Analysis

Tags: arXiv:2403.04652, conversational, image-text-to-text, llava, pytorch, region:us, sharded

Model Card on HF 🤗: https://huggingface.co/01-ai/Yi-VL-34B


Yi VL 34B Parameters and Internals

Model Type

Multimodal, Vision-Language

Use Cases

Areas:

Research, Commercial applications

Applications:

Visual question answering, Image comprehension

Primary Use Cases:

Multi-round text-image conversations

Limitations:

Supports text-image conversations but not image-to-video. May hallucinate content not present in images. Resolution limited to 448×448.

Considerations:

Evaluate potential risks before adopting.

Additional Notes

Supports image understanding at 448×448 resolution; ongoing limitations may affect certain applications.

Supported Languages

English (proficient), Chinese (proficient)

Training Details

Data Sources:

LAION-400M, CLLaVA, Flickr, VQAv2, RefCOCO, Visual7w, GQA, VizWiz VQA, TextCaps, OCR-VQA, Visual Genome, LAION GPT4V

Data Volume:

N/A

Methodology:

Three-stage training that aligns image and text representations between the ViT and the Yi LLM, using datasets such as LAION-400M and Visual Genome.

Training Time:

10 days for Yi-VL-34B, 3 days for Yi-VL-6B

Hardware Used:

128 NVIDIA A800 (80G) GPUs
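As a rough sense of scale, the training time and hardware figures above can be combined into GPU-hours. This is a back-of-the-envelope sketch, not a reported number: it assumes all 128 GPUs were used for the full duration of each run and that the 6B model was trained on the same cluster, neither of which the listing states. The helper name `gpu_hours` is hypothetical.

```python
# Back-of-the-envelope GPU-hour estimate from the listed figures.
# Assumption: all 128 A800 GPUs were busy for the whole run, for both models.
GPUS = 128
HOURS_PER_DAY = 24


def gpu_hours(days: int, gpus: int = GPUS) -> int:
    """Return total GPU-hours for a run lasting `days` on `gpus` devices."""
    return days * HOURS_PER_DAY * gpus


# Training times as listed: 10 days (34B) and 3 days (6B).
training_days = {"Yi-VL-34B": 10, "Yi-VL-6B": 3}
budget = {name: gpu_hours(days) for name, days in training_days.items()}

for name, hours in budget.items():
    print(f"{name}: {hours:,} GPU-hours")
# Yi-VL-34B: 30,720 GPU-hours
# Yi-VL-6B: 9,216 GPU-hours
```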

Model Architecture:

Vision Transformer (ViT) initialized with CLIP ViT-H/14, a projection module, and Yi LLMs.
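The "/14" in ViT-H/14 denotes a 14-pixel patch size, so the 448×448 input resolution implies a fixed grid of image patches. The sketch below works out that grid and compares it with the 4096-token context length listed for this model. It assumes each patch becomes exactly one visual token passed to the LLM, with no pooling and no extra class token; the listing does not confirm this, and `visual_token_count` is a hypothetical helper.

```python
# Patch-grid arithmetic for a ViT with square inputs and square patches.
# Assumption: one visual token per patch, no pooling, no class token.
def visual_token_count(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping patches in an image_size x image_size input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


IMAGE_SIZE = 448       # listed input resolution
PATCH_SIZE = 14        # from the "ViT-H/14" name
CONTEXT_LENGTH = 4096  # listed context length

tokens = visual_token_count(IMAGE_SIZE, PATCH_SIZE)
remaining = CONTEXT_LENGTH - tokens

print(f"patch grid: {IMAGE_SIZE // PATCH_SIZE} x {IMAGE_SIZE // PATCH_SIZE}")
print(f"visual tokens per image: {tokens}")
print(f"context left for text: {remaining}")
# patch grid: 32 x 32
# visual tokens per image: 1024
# context left for text: 3072
```

Under these assumptions a single image would take up a quarter of the context window, which is worth keeping in mind for multi-round conversations.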

Input Output

Input Format:

Text and images

Accepted Modalities:

text, image

Output Format:

Text

LLM Name	Yi VL 34B
Repository 🤗	https://huggingface.co/01-ai/Yi-VL-34B
Model Size	34B
Required VRAM	70.3 GB
Updated	2025-02-05
Maintainer	01-ai
Model Type	llava
Model Files	8 shards: 1-of-8 (10.0 GB), 2-of-8 (9.9 GB), 3-of-8 (9.8 GB), 4-of-8 (9.8 GB), 5-of-8 (9.8 GB), 6-of-8 (9.9 GB), 7-of-8 (10.0 GB), 8-of-8 (1.1 GB)
Model Architecture	LlavaLlamaForCausalLM
License	apache-2.0
Context Length	4096
Model Max Length	4096
Transformers Version	4.34.0
Tokenizer Class	LlamaTokenizer
Padding Token	<unk>
Vocabulary Size	64000
Torch Data Type	bfloat16
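The "Required VRAM" figure in the table appears to be the sum of the eight checkpoint shard sizes, which in turn is close to what a nominal 34-billion-parameter model needs at 2 bytes per parameter in bfloat16. The sketch below checks both. Treat it as an estimate for the weights only: the exact parameter count is assumed from the "34B" label, and activations and the KV cache at inference time need additional memory.

```python
# Compare the listed VRAM figure with the shard sizes and a bfloat16 estimate.
shard_sizes_gb = [10.0, 9.9, 9.8, 9.8, 9.8, 9.9, 10.0, 1.1]  # shards 1..8 as listed
listed_vram_gb = 70.3

total_gb = round(sum(shard_sizes_gb), 1)

# Assumption: "34B" is taken literally as 34e9 parameters.
NOMINAL_PARAMS = 34e9
BYTES_PER_PARAM = 2  # bfloat16 stores each value in 16 bits
nominal_weights_gb = NOMINAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"sum of shards:       {total_gb} GB")
print(f"listed VRAM:         {listed_vram_gb} GB")
print(f"nominal bf16 weights: {nominal_weights_gb} GB")
# sum of shards:       70.3 GB
# listed VRAM:         70.3 GB
# nominal bf16 weights: 68.0 GB
```

The gap between the nominal 68 GB and the 70.3 GB on disk is plausibly the vision encoder and projection module plus the difference between "34B" and the true parameter count, though the listing does not break this down.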

Best Alternatives to Yi VL 34B

Best Alternatives	Context / RAM	Downloads	Likes
Llava V1.6 34B	4K / 69.9 GB	13096	343
HuatuoGPT Vision 34B	4K / 69.9 GB	562	17
LLaVA NeXT Video 34B	4K / 69.9 GB	57	16
LLaVA NeXT Video 34B DPO	4K / 69.9 GB	28	10
Llava V1.6 34B Finetune	4K / 69.9 GB	9	4
Hf Llava V1.6 34B	4K / 69.9 GB	11	6


Original data from HuggingFace, OpenCompass and various public git repos.

Release v20241227
