Model Type: text-to-text, text-to-code, decoder-only

Use Cases
  Areas: code completion, code generation, code conversation, code education
  Applications: IDE integration, conversational interfaces
  Primary Use Cases: question answering over code fragments, natural-language-to-code generation
  Limitations:
  Considerations: Users should follow the safety and ethical guidelines outlined by Google AI.

Additional Notes: CodeGemma models build on the Gemma model family, with specialized further training for text-to-code tasks.

Supported Languages:
Training Details
  Data Sources: publicly available code repositories, open-source mathematics datasets, synthetically generated code
  Data Volume: 500 to 1,000 billion tokens
  Methodology: further-trained Gemma variants, including FIM (fill-in-the-middle) training
  Hardware Used:
  Model Architecture: CodeGemma models are built on top of Gemma

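The FIM objective noted above trains the model to complete a gap between a code prefix and a code suffix. As a minimal sketch of how such a prompt is assembled (the sentinel token strings follow the prefix-suffix-middle convention commonly documented for CodeGemma; treat them as assumptions and check your tokenizer's special tokens):

```python
# Sketch: building a fill-in-the-middle (FIM) prompt string.
# Sentinel tokens below are assumed PSM (prefix-suffix-middle) markers;
# verify them against the actual tokenizer's special-token list.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the cursor so the model
    generates the missing middle after the final sentinel."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

# Example: ask the model to fill in a function body.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
```

The generated text that follows the `<|fim_middle|>` sentinel is the model's proposed infill for the gap.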
Safety Evaluation
  Methodologies: human evaluation, cyber-offense testing, red-teaming
  Findings: within acceptable thresholds for meeting internal policies
  Risk Categories: child safety, content safety, representational harms, memorization, large-scale harms
  Ethical Considerations: evaluated in line with the Google AI Principles

Responsible AI Considerations
  Mitigation Strategies: implemented rigorous safety filtering and evaluation processes

Input / Output
  Input Format: code prefix/suffix (for fill-in-the-middle) or natural-language text
  Accepted Modalities:
  Output Format: code and natural-language text
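For the natural-language input path, instruction-tuned variants expect the conversation to be wrapped in turn markers. A minimal sketch of that formatting (the `<start_of_turn>`/`<end_of_turn>` markers follow Gemma's documented chat format; treat them as assumptions, and prefer the tokenizer's own chat-template utility when one is available):

```python
# Sketch: wrapping a natural-language request in assumed Gemma-style
# chat turn markers so an instruction-tuned variant can respond.
def build_chat_prompt(user_message: str) -> str:
    """Open a user turn, close it, then open the model turn so
    generation continues as the model's reply."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_chat_prompt("Write a Python function that reverses a string.")
```

Everything the model generates after the trailing `<start_of_turn>model` marker is its answer, typically a mix of code and explanatory text as described under Output Format.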