Model Type | Large language model based on the LLaMA-2 architecture |
Use Cases |
Areas: | Research, Commercial applications |
|
Limitations: | The models have not been tuned to ensure that their outputs align with human intent and safety considerations. |
|
|
Additional Notes | The models are built through continual pre-training and instruction tuning, with an emphasis on Japanese language capabilities.
|
Supported Languages | Japanese, English (the Swallow models have undergone continual pre-training with the addition of Japanese language data)
|
Training Details |
Data Sources: | Japanese Wikipedia, RefinedWeb, Swallow Corpus, The Pile |
|
Methodology: | Supervised fine-tuning (SFT) and instruction tuning using Anthropic HH-RLHF, databricks-dolly-15k, and the OpenAssistant Conversations Dataset.
|
Model Architecture: | Refer to the LLaMA-2 technical report for details on the model architecture.
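
Since the checkpoints follow the LLaMA-2 architecture, they should load with the standard Hugging Face transformers API. A minimal generation sketch, assuming the repo id tokyotech-llm/Swallow-7b-instruct-hf (the model name appears in the release notes below; the organization prefix is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id: the model name comes from the release notes below;
# the "tokyotech-llm" organization prefix is an assumption.
MODEL_ID = "tokyotech-llm/Swallow-7b-instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # place layers on available devices
)

# Japanese prompt: "Please briefly explain the history of Mount Fuji."
prompt = "富士山の歴史について簡単に説明してください。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that instruction-tuned variants typically expect a specific prompt template; a raw prompt is used here only to keep the sketch short.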
|
|
Input Output |
Accepted Modalities: | Text |
Output Format: | Text |
Performance Tips: | The models employ a tokenizer with a vocabulary broadened using Japanese data, which represents Japanese text in fewer tokens and therefore yields faster inference.
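
To gauge the effect of the broadened vocabulary, one can compare token counts for the same Japanese text against the base LLaMA-2 tokenizer. A minimal sketch, assuming hypothetical Hugging Face repo ids for both checkpoints:

```python
from transformers import AutoTokenizer

# Assumed repo ids -- substitute the checkpoints you actually use.
# (meta-llama/Llama-2-7b-hf is a gated repo and requires accepting its license.)
SWALLOW_ID = "tokyotech-llm/Swallow-7b-hf"
LLAMA2_ID = "meta-llama/Llama-2-7b-hf"

# Japanese text: "Once upon a time, an old man and an old woman lived in a certain place."
text = "むかしむかし、あるところにおじいさんとおばあさんが住んでいました。"

for name, repo in [("Swallow", SWALLOW_ID), ("LLaMA-2", LLAMA2_ID)]:
    tok = AutoTokenizer.from_pretrained(repo)
    n_tokens = len(tok.encode(text, add_special_tokens=False))
    print(f"{name}: {n_tokens} tokens")
```

Fewer tokens for the same text means shorter input and output sequences, i.e. less compute per sentence and faster inference.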
|
|
Release Notes |
Version: | |
Date: | |
Notes: | Release of Swallow-7b-instruct-v0.1, Swallow-13b-instruct-v0.1, and Swallow-70b-instruct-v0.1. |
|
Version: | |
Date: | |
Notes: | Release of Swallow-7b-plus-hf, trained on twice as many Japanese tokens as Swallow-7b-hf.
|
Version: | |
Date: | |
Notes: | Release of Swallow-13b-NVE-hf. |
|
Version: | |
Date: | |
Notes: | Release of Swallow-7b-NVE-hf, Swallow-7b-NVE-instruct-hf, Swallow-70b-NVE-hf, and Swallow-70b-NVE-instruct-hf.
|
Version: | |
Date: | |
Notes: | Release of Swallow-7b-hf, Swallow-7b-instruct-hf, Swallow-13b-hf, Swallow-13b-instruct-hf, Swallow-70b-hf, and Swallow-70b-instruct-hf.
|
|
|