Hymba 1.5B Base by nvidia


  Arxiv:2411.13676   Autotrain compatible   Conversational   Custom code   Hymba   Region:us   Safetensors
Model Card on HF 🤗: https://huggingface.co/nvidia/Hymba-1.5B-Base

Hymba 1.5B Base Benchmarks

Benchmark chart (scores in %) comparing Hymba 1.5B Base (nvidia/Hymba-1.5B-Base) against the reference models: Anthropic Sonnet 3.5 ("so35"), GPT-4o ("gpt4o"), and GPT-4 ("gpt4").

Hymba 1.5B Base Parameters and Internals

Model Type 
text-to-text
Use Cases 
Limitations:
The model may amplify biases and return toxic responses. It may generate inaccurate or socially undesirable text, and it is susceptible to jailbreak attacks.
Additional Notes 
Meta tokens, a set of learnable tokens, are prepended to every prompt to improve efficacy. The model shares the KV cache between every two layers and between heads within a single layer, and 90% of its attention layers use sliding window attention.
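For intuition, here is a minimal, illustrative sketch of the meta-token idea described above: a block of learnable embeddings concatenated in front of the prompt embeddings. The class name, the meta-token count, and the initialization are placeholders chosen for illustration (the hidden size of 1600 comes from the architecture description below); this is not NVIDIA's implementation.

```python
import torch
import torch.nn as nn

class MetaTokenPrepender(nn.Module):
    """Illustrative only: learnable meta tokens prepended to prompt embeddings."""
    def __init__(self, num_meta_tokens: int = 128, hidden_size: int = 1600):
        # num_meta_tokens is a placeholder value, not taken from the model card
        super().__init__()
        self.meta_tokens = nn.Parameter(torch.randn(num_meta_tokens, hidden_size) * 0.02)

    def forward(self, prompt_embeds: torch.Tensor) -> torch.Tensor:
        # prompt_embeds: (batch, seq_len, hidden_size)
        batch = prompt_embeds.size(0)
        meta = self.meta_tokens.unsqueeze(0).expand(batch, -1, -1)
        # prepend the same learnable block to every prompt in the batch
        return torch.cat([meta, prompt_embeds], dim=1)
```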
Training Details 
Training Time:
September 1, 2024 - November 10, 2024
Model Architecture:
Hymba-1.5B-Base has an embedding size of 1600, 25 attention heads, an MLP intermediate dimension of 5504, and 32 layers in total with 16 SSM states; 3 layers use full attention and the rest use sliding window attention. Each attention layer in Hymba combines standard attention heads and Mamba heads in parallel. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
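For quick reference, the figures from the description above (plus the context length and vocabulary size listed further down) are collected in a plain Python dict below. The key names are illustrative and are not claimed to match the keys in the model's actual configuration file.

```python
# Reference summary of the architecture described above; key names are illustrative.
hymba_1_5b_base_specs = {
    "hidden_size": 1600,              # model embedding size
    "num_attention_heads": 25,
    "intermediate_size": 5504,        # MLP intermediate dimension
    "num_hidden_layers": 32,
    "ssm_states": 16,
    "num_full_attention_layers": 3,   # remaining layers use sliding window attention
    "attention_style": "hybrid: standard attention heads + Mamba heads in parallel",
    "grouped_query_attention": True,
    "positional_embeddings": "RoPE",
    "context_length": 8192,
    "vocab_size": 32001,
}
```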
Responsible AI Considerations 
Accountability:
Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Mitigation Strategies:
Strong output validation controls are recommended to handle security and safety risks.
Input Output 
Performance Tips:
The batch size needs to be 1 during generation due to current implementation limitations.
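A minimal end-to-end sketch under the constraint above: one prompt at a time (batch size 1), no batched decoding. It assumes trust_remote_code=True is needed for the custom Hymba code, a CUDA device, and the bfloat16 weights listed below; the prompt and generation settings are arbitrary examples, not recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,   # matches the bfloat16 weights listed below
    trust_remote_code=True,
).to("cuda")

# Batch size must be 1: encode a single prompt, not a list of prompts.
prompt = "The Hymba architecture combines attention and state-space models by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```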
LLM Name: Hymba 1.5B Base
Repository 🤗: https://huggingface.co/nvidia/Hymba-1.5B-Base
Model Size: 1.5b
Required VRAM: 3 GB
Updated: 2024-12-06
Maintainer: nvidia
Model Type: hymba
Model Files: 3.0 GB
Model Architecture: HymbaForCausalLM
License: other
Context Length: 8192
Model Max Length: 8192
Transformers Version: 4.44.0
Tokenizer Class: LlamaTokenizer
Padding Token: [PAD]
Vocabulary Size: 32001
Torch Data Type: bfloat16
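A small, optional sketch for checking a few of the listed values at runtime without downloading the full weights. It assumes the config and tokenizer are fetchable from the repository, that trust_remote_code=True is required, and that the attribute names follow the standard transformers API.

```python
from transformers import AutoConfig, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"
config = AutoConfig.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

print(config.vocab_size)           # expected: 32001
print(tokenizer.model_max_length)  # expected: 8192
print(tokenizer.pad_token)         # expected: [PAD]
```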

Best Alternatives to Hymba 1.5B Base

Best Alternatives: Context / RAM / Downloads / Likes
Hymba 1.5B Instruct: 8K / 3 GB / 7632 / 186


Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241124