The First AI Community Feedback on Gemma 2: Google’s New Open LLMs

Last week, Google released Gemma 2, the latest addition to its family of state-of-the-art open LLMs. It comes in two sizes: 9 billion and 27 billion parameters, with both base (pre-trained) and instruction-tuned versions.

Gemma 2 has the same permissive license as the first iteration, allowing redistribution, fine-tuning, commercial use, and derivative works.
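Both checkpoints are published on the Hugging Face Hub, so trying them locally is straightforward. Below is a minimal sketch of loading the instruction-tuned 9B variant with the transformers library; the model ID google/gemma-2-9b-it, the bf16 setting, and the memory note are assumptions based on the usual Hub conventions, so check the model card for your own setup.

```python
# Minimal sketch: load the instruction-tuned Gemma-2-9B checkpoint with transformers.
# Assumes a recent transformers release with Gemma 2 support (4.42 or later) and a GPU
# with roughly 20 GB of memory for bf16 weights; adjust dtype/device_map as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # instruction-tuned 9B; the 27B variant follows the same pattern

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Write a short story about a lighthouse keeper."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same snippet works for the 27B model by swapping the model ID, at the cost of roughly three times the memory or an additional quantization step.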

As usual, we have collected early feedback from members of the AI community who have already tried the models and formed their first impressions.

Opinions are split. Some users are not satisfied with the results they got, while others found the new models very impressive. For example, Gemma-2-9B not only matched but sometimes outperformed Llama-3-8B. One user reported that Gemma-2-27B performs well for its size and has even replaced Llama-3-70B for many of their tasks. Additionally, Gemma-2-27B was surprisingly good at multilingual tasks, particularly in Korean. We discuss its language capabilities in more detail below.

Creative Writing

Both sizes of the Gemma 2 model have been highlighted for their creative writing capabilities. AI enthusiasts have shared detailed feedback on their experiences:

Several users reported that Gemma-2-27B surpasses Qwen2-72B in creative writing tasks. They observed that Gemma 2 has a better grasp of popular fiction, producing more accurate and coherent content, whereas Qwen2 often invents details or gets them wrong. This stronger command of fictional material also translates into better performance in writing and roleplaying scenarios.

Another user suggested that, for the first time, Llama-3-70B faces decent competition: Gemma-2-27B seems quite smart and hallucinates less. On aistudio.google.com, it was noted to outperform Llama-3-70B on complex prompts that mix foreign-language text, Markdown, and JSON, reportedly making it the only open LLM that handles such prompts effectively.

Gemma-2-9B has also impressed users in creative writing. On the EQ-Bench creative writing leaderboard, the 9B model sits between Claude 3.5 Sonnet and GPT-4o, which is remarkable for a model of its size:

(Source: EQ-Bench creative writing leaderboard)

Language Capabilities

Gemma 2 models have demonstrated strong multilingual capabilities, impressing users with their performance in a variety of languages, including some less commonly tested ones.

The Gemma-2-27B model excels in multilingual tasks, particularly in non-English languages. For example, it performs exceptionally well in Ukrainian, significantly outshining Llama-3-70B. Its performance in Greek has also been called impressive, with one user describing it as the first model to handle the language well.

In practical applications, the 27B model has been used to translate JSON from English to Dutch, yielding mostly accurate results with only occasional typos.
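For anyone who wants to reproduce this kind of workflow, here is a minimal sketch of prompting the model to translate the string values of a JSON payload while keeping keys and structure intact. The prompt wording and the translate_json helper are illustrative assumptions, not the exact setup the original users described.

```python
import json

def build_translation_prompt(payload: dict, target_language: str = "Dutch") -> str:
    # Ask the model to translate only the string values and to reply with JSON only,
    # so the output can be parsed programmatically.
    return (
        f"Translate the string values in the following JSON from English to {target_language}. "
        "Keep all keys and the structure unchanged, and reply with valid JSON only.\n\n"
        + json.dumps(payload, ensure_ascii=False, indent=2)
    )

def translate_json(generate_fn, payload: dict) -> dict:
    """generate_fn is any callable that sends a prompt to Gemma-2-27B and returns its text reply."""
    reply = generate_fn(build_translation_prompt(payload))
    return json.loads(reply)  # raises a ValueError if the model did not return valid JSON
```

Users reported only rare typos with this kind of prompt, but checking that the parsed output still contains the original keys is a sensible safeguard.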

Tests in Russian produced coherent results, and users confirmed effective performance in Slovenian, suggesting that Gemma 2 handles Slavic languages particularly well.

The 9B model has also been praised for its multilingual abilities. For instance, a fine-tuned version of Gemma-2-9B was exceptional in French, surpassing even Llama-3-70B. Both the 9B and 27B models were noted to be very effective in Korean: the 27B model not only produces grammatically correct text but also demonstrates strong semantic understanding and accurately interprets user requests. With further tuning and a larger context size, the 27B model could become the best open-source model for Korean.

The 9B model performed admirably in Korean tasks, often exceeding expectations for its size.

Overall, Gemma 2 models are proving to be robust in handling a wide range of languages, providing accurate and contextually appropriate translations and text generation.
