Model Type: text-to-text, text-to-code

| Use Cases | |
| --- | --- |
| Areas | research, commercial applications |
| Applications | code completion, code generation, code conversation, code education (a code-generation sketch follows below) |
| Primary Use Cases | interactive code learning, syntax correction, coding practice |
| Limitations | Like all large language models (LLMs), the model's capabilities are limited by its training data. |
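
As a brief illustration of the code-generation application listed above, here is a minimal sketch using the Hugging Face `transformers` library; the checkpoint name is a hypothetical placeholder, not this model's actual identifier.

```python
# Minimal sketch: natural-language prompt in, generated code out.
# The checkpoint name below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-code-model"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the echoed prompt.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```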
Additional Notes: TPU hardware was used for training. Training emphasized fitness for real-world applications, using structured examples built with heuristic techniques.

Supported Languages:

| Training Details | |
| --- | --- |
| Data Sources | publicly available code repositories, open source mathematics datasets, synthetically generated code |
| Data Volume | 500 to 1000 billion tokens |
| Methodology | |
| Hardware Used | TPU (per the training notes above) |

| Safety Evaluation | |
| --- | --- |
| Methodologies | red-teaming, structured evaluations |
| Findings | Results were within acceptable thresholds for meeting internal policies. |
| Risk Categories | content safety, representational harms, child safety |

| Responsible AI Considerations | |
| --- | --- |
| Fairness | Evaluated with structured evaluations and internal red-teaming. |
| Accountability | |

| Input / Output | |
| --- | --- |
| Input Format | code prefix and/or suffix (for fill-in-the-middle), or a natural-language prompt |
| Output Format | fill-in-the-middle code completion; code and natural language |
| Performance Tips | Provide a list of terminators to the `generate` function to ensure generation stops at the first delimiter (see the sketch below). |
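
The input and output formats and the terminator tip above combine as follows; this is a minimal sketch assuming a recent Hugging Face `transformers` release (v4.39 or later, which adds the `stop_strings` argument to `generate`). The checkpoint name and the `<|fim_*|>` sentinel tokens are illustrative placeholders, not confirmed identifiers for this model.

```python
# Minimal sketch: fill-in-the-middle completion that halts at the first
# matching terminator. Checkpoint name and <|fim_*|> sentinels are
# hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-code-model"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# A code prefix and suffix surround the gap the model should fill.
prompt = (
    "<|fim_prefix|>def mean(values):\n"
    "    <|fim_suffix|>\n"
    "    return total / len(values)\n"
    "<|fim_middle|>"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    # Terminators: generation stops at the first of these delimiters
    # instead of running past the completed span.
    stop_strings=["<|fim_prefix|>", "<|fim_suffix|>", "<|file_separator|>"],
    tokenizer=tokenizer,  # required when `stop_strings` is used
)

# The matched stop string, if generated, remains at the end of the text.
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=False))
```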
| Release Notes | |
| --- | --- |
| Version | |
| Date | |
| Notes | Performance metrics and comparisons provided. |