Model Type | Chat model, Text generation |
|
Use Cases |
Areas: | Chat applications, Creative content generation |
|
Applications: | Commercial applications, Research, Educational tools |
|
Primary Use Cases: | Chatbots, Virtual assistants, Story generation |
|
Limitations: | Potential for hallucination, May produce inconsistent outputs |
|
Considerations: | Adjust generation parameters for desired output qualities. |
|
|
Additional Notes | The models do not use Llama's weights; Yi's own datasets and training infrastructure underscore its independent development. |
|
Supported Languages | English (Fluent), Chinese (Fluent) |
|
Training Details |
Data Sources: | Multilingual corpora |

Data Volume: | 3T tokens |
Methodology: | Transformer-based architecture |
|
Context Length: | |
Training Time: | |
Hardware Used: | NVIDIA A800 (80GB) and RTX 4090 GPUs |
|
Model Architecture: | Based on Llama's architecture |
|
|
Responsible AI Considerations |
Fairness: | Addressed during model development. |
|
Transparency: | Standard Transformer architecture; detailed in tech report. |
|
Accountability: | |
Mitigation Strategies: | Supervised fine-tuning (SFT) to improve response accuracy. |
|
|
Input Output |
Input Format: | Interactive prompt conversation |
|
Accepted Modalities: | Text |
Output Format: | Text responses or follow-ups |
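The interactive prompt format above is typically expressed as a role-tagged message list. The sketch below is a generic illustration, not this model's exact chat template (which adds model-specific special tokens); the `to_prompt` helper is hypothetical.

```python
# Hypothetical example of the role-based message format commonly
# passed to a chat model; the model's real template may differ.
messages = [
    {"role": "user", "content": "Write a short bedtime story."},
    {"role": "assistant", "content": "Once upon a time..."},
    {"role": "user", "content": "Make it about a dragon."},
]

def to_prompt(messages):
    """Flatten messages into a plain-text prompt.

    Real chat templates insert model-specific special tokens
    between turns; this is only a readable approximation.
    """
    body = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return body + "\nassistant:"
```

In practice the serving library applies the model's own template, so user code usually only builds the `messages` list.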
|
Performance Tips: | Calibrate temperature, top_p, top_k settings for desired response diversity. |
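To make the tip concrete, here is a minimal, self-contained sketch of how temperature, top_k, and top_p jointly filter a token distribution before sampling. The `sample_next_token` helper is illustrative only; real inference stacks implement the same filtering internally.

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.9, seed=None):
    """Sample a token id from raw logits using the three common knobs.

    Hypothetical helper for illustration; serving frameworks apply
    the same temperature -> top-k -> top-p pipeline internally.
    """
    # Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [x / temperature for x in logits]
    # Softmax (with max-subtraction for numerical stability).
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Top-k: keep only the k most likely tokens.
    probs.sort(key=lambda pair: pair[1], reverse=True)
    probs = probs[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize the surviving tokens and sample one.
    z = sum(p for _, p in kept)
    rng = random.Random(seed)
    r = rng.random() * z
    acc = 0.0
    for tok, p in kept:
        acc += p
        if acc >= r:
            return tok
    return kept[-1][0]
```

Lower temperature with small top_k gives deterministic, focused replies; higher temperature with larger top_p increases diversity, which suits creative generation.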
|
|
Release Notes |
Version: | |
Date: | |
Notes: | Initial open-source release of the chat model, supporting both 4-bit and 8-bit quantization. |
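For intuition about what 4-bit and 8-bit quantization means, here is a minimal sketch of symmetric round-to-nearest int8 quantization of a weight vector. It is illustrative only, with hypothetical helper names; production 4/8-bit inference uses more elaborate per-block schemes.

```python
def quantize_int8(weights):
    """Symmetric round-to-nearest 8-bit quantization of one weight vector.

    Illustrative sketch only: real low-bit inference kernels quantize
    per block/channel and fuse dequantization into the matmul.
    """
    # One scale maps the largest magnitude onto the int8 range [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]
```

Storing int8 values plus one float scale cuts weight memory roughly 4x versus float32, at the cost of small rounding error; 4-bit schemes push the same idea further with finer-grained scales.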
|
Version: | |
Date: | |
Notes: | Improved performance in coding, math, and reasoning with larger context capabilities. |
|
|
|