Model Type | text-to-code, text-to-text, code completion, code generation, instruction following, chat |
|
Use Cases |
Areas: | Research, Development, Education |
|
Applications: | Code Completion, Code Generation, Code Conversation, Code Education |
|
Primary Use Cases: | IDE code completion, Code chat applications, Interactive learning |
|
Limitations: | Training data limitations, Inherent limitations of LLMs |
|
|
Additional Notes | Model implements rigorous safety filtering. |
|
Supported Languages | Primary Language (English), Code Languages (C++, C#, Go, Java, JavaScript, Kotlin, Python, Rust) |
|
Training Details |
Data Sources: | Public code repositories, Open source mathematics datasets, Synthetically generated code |
|
Data Volume: | 500 billion to 1 trillion tokens |
|
Methodology: | |
Hardware Used: | |
|
Safety Evaluation |
Methodologies: | structured evaluations, internal red-teaming |
|
Risk Categories: | cyber-offence capabilities, representational harms, large-scale harms |
|
|
Input and Output |
Input Format: | Natural language prompt, or code prefix and/or suffix |
|
Accepted Modalities: | |
Output Format: | Code and natural language |
|
|
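The Input Format row lists a code prefix/suffix as an accepted input, which suggests infilling-style completion. The following is a minimal sketch of how such an input might be assembled into a single prompt string. The sentinel strings (`<PREFIX>`, `<SUFFIX>`, `<MIDDLE>`), the prefix-suffix-middle ordering, and the helper name `build_infill_prompt` are all hypothetical placeholders, not taken from this model card; the real markers depend on the model's tokenizer and should be checked against its documentation.

```python
# Hypothetical sentinel strings (assumption): the actual tokens are
# model-specific and are not stated in this model card.
PREFIX_TOKEN = "<PREFIX>"
SUFFIX_TOKEN = "<SUFFIX>"
MIDDLE_TOKEN = "<MIDDLE>"


def build_infill_prompt(prefix: str, suffix: str = "") -> str:
    """Join the code before and after the cursor into one prompt string.

    Uses a prefix-suffix-middle layout (an assumed convention): the model
    is expected to generate the missing middle after MIDDLE_TOKEN.
    """
    return f"{PREFIX_TOKEN}{prefix}{SUFFIX_TOKEN}{suffix}{MIDDLE_TOKEN}"


if __name__ == "__main__":
    before_cursor = "def add(a, b):\n    return "
    after_cursor = "\n\nprint(add(1, 2))\n"
    # The resulting string would be passed to the model as its text input.
    print(build_infill_prompt(before_cursor, after_cursor))
```

With an empty suffix the same helper covers plain left-to-right completion, which is why the suffix argument is optional in this sketch.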