Google's Gemma 2 2B: Early User Tests Show Efficient Performance on Mobile Devices
02/08/2024 10:56:26

Google's Gemma 2 2B model is demonstrating strong performance on a range of mobile devices, according to recent user feedback. Here's what early testers are reporting:
On a Motorola g84 smartphone, both Q4- and Q8-quantized versions of the model sustain more than 4 tokens per second of output while using minimal memory in the Layla frontend, with an initial load time of 15-20 seconds for a simple creative writing task. An ARM-optimized build by ThomasBaruzier pushes this further, reaching 5.5-6.1 tokens per second and loading in under ten seconds.
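A back-of-the-envelope calculation shows why memory use stays "minimal." The figures below are assumptions rather than measurements from the tests above: Gemma 2 2B has roughly 2.6 billion parameters, and llama.cpp-style Q4 and Q8 quantizations land near ~4.8 and ~8.5 effective bits per weight.

```swift
import Foundation

/// Rough estimate of a quantized model's weight footprint in GiB.
/// Both inputs are assumptions, not measured values.
func weightFootprintGiB(paramCount: Double, bitsPerWeight: Double) -> Double {
    (paramCount * bitsPerWeight / 8.0) / 1_073_741_824.0
}

let params = 2.6e9 // assumed parameter count for Gemma 2 2B
print(String(format: "Q4 ≈ %.2f GiB", weightFootprintGiB(paramCount: params, bitsPerWeight: 4.8)))
print(String(format: "Q8 ≈ %.2f GiB", weightFootprintGiB(paramCount: params, bitsPerWeight: 8.5)))
// Prints roughly 1.45 GiB for Q4 and 2.57 GiB for Q8 — both well within
// the RAM budget of a 6-8 GB phone, leaving room for the OS and KV cache.
```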
The tester on the Motorola device noted that the model responds well to temperature adjustments and shows a diverse vocabulary. It handles 8-16k-token contexts on phones with 6-8 GB of RAM, with a slight slowdown at larger context sizes. While the model occasionally breaks stories into chapters and shows some logical inconsistencies, these issues appear less often than in other small models.
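Those 8-16k context figures also square with a rough KV-cache estimate. The sketch below assumes Gemma 2 2B's published configuration (26 layers, 4 KV heads, head dimension 256) and a dense fp16 cache; Gemma 2 actually interleaves sliding-window attention, so real usage should come in somewhat lower.

```swift
import Foundation

/// Approximate fp16 KV-cache size in GiB for a given context length.
/// The default layer/head values are assumed from Gemma 2 2B's config;
/// verify against the actual model before relying on these numbers.
func kvCacheGiB(contextTokens: Int, layers: Int = 26,
                kvHeads: Int = 4, headDim: Int = 256,
                bytesPerValue: Int = 2) -> Double {
    // Keys and values are both cached, hence the leading factor of 2.
    let bytes = 2 * layers * kvHeads * headDim * bytesPerValue * contextTokens
    return Double(bytes) / 1_073_741_824.0
}

print(String(format: "8k context  ≈ %.2f GiB", kvCacheGiB(contextTokens: 8_192)))
print(String(format: "16k context ≈ %.2f GiB", kvCacheGiB(contextTokens: 16_384)))
// Roughly 0.81 and 1.63 GiB — which is why 16k is tight but workable
// on an 8 GB phone once the quantized weights are loaded.
```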
On an iPhone 15 Pro, another user ran a quantized Gemma 2 2B efficiently using MLX Swift. They noted that the model scores comparably to GPT-3.5 Turbo and Mixtral 8x7B on LMSys.org benchmarks, which is noteworthy for a model that fits on a smartphone. The code and documentation for this implementation are available on GitHub for anyone who wants to replicate or build on the work.
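For anyone curious about the iPhone route, here is a minimal sketch of the call flow using the MLXLLM package from the ml-explore/mlx-swift-examples repo. The identifiers and the model id below are assumptions based on that repo, and its API has changed between releases, so treat this as the shape of the code rather than something to copy-paste; check the repo's current README.

```swift
// Sketch only: API names (LLMModelFactory, ModelConfiguration, UserInput,
// GenerateParameters, generate) are assumed from mlx-swift-examples and
// may differ in your installed version. Run inside an async context.
import MLXLLM
import MLXLMCommon

let configuration = ModelConfiguration(id: "mlx-community/gemma-2-2b-it-4bit") // assumed model id
let container = try await LLMModelFactory.shared.loadContainer(configuration: configuration)

let output = try await container.perform { context in
    // Tokenize and format the prompt for the model's chat template.
    let input = try await context.processor.prepare(
        input: UserInput(prompt: "Summarize why small on-device models matter."))
    // Stream tokens; returning .more keeps generation going, .stop ends it.
    let result = try MLXLMCommon.generate(
        input: input,
        parameters: GenerateParameters(temperature: 0.7),
        context: context
    ) { _ in .more }
    return result.output
}
print(output)
```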
Looks like Gemma 2 2B is giving smartphones a brain boost! Just don't be surprised if your budget phone starts finishing your sentences or asks for a raise 😁. Remember, with great AI comes great responsibility... and possibly a very confused autocorrect. 🤖📱