Model Type | Joint Attention, Mamba, Generative Text Model, Dense Model, Mixture-of-Experts
|
Use Cases |
Areas: | Research, Commercial applications |
|
Applications: | Fine-tuning for chat/instruct versions |
|
Primary Use Cases: | Foundation layer for training and developing custom solutions |
|
Limitations: | The model did not undergo any alignment for instruct/chat interactions.
|
Considerations: | Guardrails should be added for responsible and safe use. |
|
|
Additional Notes | Jamba is the first production-scale Mamba implementation. It is a state-of-the-art hybrid SSM-Transformer LLM.
|
Training Details |
Methodology: | Joint Attention and Mamba |
|
Context Length: | |
Model Architecture: | Hybrid SSM-Transformer LLM |
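
The hybrid design interleaves full-attention layers with Mamba-style state-space layers in a single decoder stack. The toy sketch below illustrates only that interleaving pattern; the block ratio, dimensions, and both stand-in mixer modules are illustrative assumptions and do not reproduce the actual Jamba layers.

```python
# Toy structural sketch of a hybrid SSM-Transformer decoder stack.
# Every module here is a simplified stand-in; the 1-attention-per-4-layers
# ratio is an arbitrary illustrative choice, not the Jamba layout.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Stand-in transformer block: causal self-attention + MLP with residuals."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        # Boolean causal mask: True marks positions a token may not attend to.
        mask = torch.triu(
            torch.ones(x.size(1), x.size(1), dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class SSMBlock(nn.Module):
    """Stand-in for a Mamba-style layer: a gated causal depthwise-conv mixer.
    This is NOT Mamba; it only plays the role of a linear-time sequence mixer."""

    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim, padding=kernel_size - 1)
        self.gate = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        # Depthwise conv over time; truncate the right side to keep it causal.
        mixed = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.out(mixed * torch.sigmoid(self.gate(h)))


class HybridStack(nn.Module):
    """One attention block every `attn_period` layers; the rest are SSM-style blocks."""

    def __init__(self, dim: int = 64, num_layers: int = 8, attn_period: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [
                AttentionBlock(dim) if i % attn_period == attn_period - 1 else SSMBlock(dim)
                for i in range(num_layers)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)       # (batch, sequence, hidden)
    print(HybridStack()(x).shape)    # torch.Size([2, 16, 64])
```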
|
|
Responsible AI Considerations
Mitigation Strategies: | The model does not ship with safety moderation mechanisms or guardrails; these should be added before deployment.
|
|
Input Output |
Input Format: | Text prompts; the BOS token should be included for evaluation (see the tokenization sketch below).
|
Accepted Modalities: | |
Output Format: | |
Performance Tips: | The model can be loaded in BF16/FP16 via `torch_dtype` for better performance. Use `attn_implementation` to enable FlashAttention2. Quantization with bitsandbytes is supported to fit longer sequences in memory.
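
A minimal loading sketch for the tips above, assuming a Hugging Face `transformers` checkpoint; the model id is a placeholder, and the FlashAttention2 and 8-bit paths require the `flash-attn` and `bitsandbytes` packages plus a supported GPU.

```python
# Sketch: BF16 + FlashAttention2 loading, and an 8-bit bitsandbytes alternative.
# "your-org/jamba-checkpoint" is a placeholder model id, not a real repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/jamba-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Half precision with FlashAttention2 for faster attention kernels.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# Alternative: 8-bit quantization with bitsandbytes to fit longer sequences in memory.
model_8bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```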
|
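For the Input Format note above, a small sketch of checking that the BOS token is actually present in the encoded prompt; some evaluation harnesses encode without special tokens, so the check is worth keeping in evaluation code. The model id is again a placeholder.

```python
# Sketch: verify the BOS token is present in an encoded evaluation prompt.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/jamba-checkpoint")  # placeholder id

prompt = "A hybrid SSM-Transformer language model"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Prepend the BOS id if the tokenizer or harness did not add it.
if tokenizer.bos_token_id is not None and input_ids[0, 0].item() != tokenizer.bos_token_id:
    bos = torch.tensor([[tokenizer.bos_token_id]], dtype=input_ids.dtype)
    input_ids = torch.cat([bos, input_ids], dim=1)
```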
|
Release Notes |
Version: | |
Notes: | Dense version without Mixture-of-Experts, obtained by extracting the weights of the first expert.
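
As a rough illustration of the extraction described above, the sketch below keeps only expert 0 from each MoE layer of a state dict. Every key pattern used here (`experts.0`, `router`, the dense target names) is a hypothetical stand-in, not the real checkpoint layout.

```python
# Sketch: derive a dense state dict by keeping only the first expert of each
# MoE layer. All key patterns below are hypothetical, for illustration only.
import torch


def extract_first_expert(moe_state_dict: dict) -> dict:
    dense_state_dict = {}
    for key, tensor in moe_state_dict.items():
        if ".router." in key:
            continue  # drop MoE routing parameters, which a dense model does not need
        if ".experts." in key:
            if ".experts.0." in key:
                # Rename expert-0 weights to the dense MLP position.
                dense_state_dict[key.replace(".experts.0.", ".mlp.")] = tensor
            # Experts 1..N-1 are dropped entirely.
        else:
            dense_state_dict[key] = tensor  # attention/Mamba/norm weights pass through
    return dense_state_dict


# Tiny toy example with hypothetical key names.
toy = {
    "layers.0.moe.router.weight": torch.randn(4, 8),
    "layers.0.moe.experts.0.up_proj.weight": torch.randn(16, 8),
    "layers.0.moe.experts.1.up_proj.weight": torch.randn(16, 8),
    "layers.0.self_attn.q_proj.weight": torch.randn(8, 8),
}
print(sorted(extract_first_expert(toy)))
# ['layers.0.moe.mlp.up_proj.weight', 'layers.0.self_attn.q_proj.weight']
```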
|
|
|