User Feedback on Qwen 2.5 Models: Impressive Performance with Lower Computational Resources

Qwen 2.5, a series of language models released by Alibaba, has garnered significant attention from the AI community. Users report impressive performance across a wide range of tasks, often comparing the models favorably to both open-source and proprietary alternatives. Many note that they deliver exceptional results relative to their size, with some observing that the 32B model outperforms larger models such as Llama 3.1 70B on various benchmarks. The models also run effectively on consumer-grade hardware, including a single NVIDIA GPU such as the RTX 3090.

Many users express high satisfaction, comparing Qwen 2.5 favorably to paid models like Claude and ChatGPT. Some report switching from paid services to Qwen 2.5 for various tasks. Users appreciate the balance of performance and accessibility, especially for those with consumer-grade hardware.

Here's a summary of user feedback and experiences with the Qwen 2.5 models:

Coding Capabilities

  • Qwen 2.5 32B: Excellent performance in code generation, debugging, and refactoring across multiple languages (Python, JavaScript, TypeScript, ReactJS).
  • Qwen 2.5 32B: Strong instruction-following for coding tasks.
  • Qwen 2.5 32B: Good at JSON output generation.
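To make the JSON-output point concrete, here is a small, hypothetical helper (not part of any Qwen tooling) showing how model-generated JSON is typically consumed: it strips an optional Markdown code fence from a reply and parses what remains.

```python
import json
import re

def parse_model_json(reply: str):
    """Parse a JSON object from a model reply, tolerating ```json fences.

    Raises json.JSONDecodeError if the reply is not valid JSON
    even after stripping fences.
    """
    text = reply.strip()
    # Models often wrap JSON in a Markdown code fence; strip it if present.
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fence:
        text = fence.group(1)
    return json.loads(text)

# Example: a fenced reply such as a coding assistant might produce.
reply = '```json\n{"language": "python", "lines_changed": 3}\n```'
print(parse_model_json(reply))  # → {'language': 'python', 'lines_changed': 3}
```

A fallback like this is useful with any local model, since even instruction-tuned models occasionally wrap structured output in fences.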

Creative Tasks

Impressive results in storytelling, with some users preferring Qwen 2.5 72B over GPT-4 for output quality despite slower generation.

Versatility

Performs well in various tasks including summarization, translation, and text-to-SQL conversion.

Some users reported satisfactory performance in specific tasks:

  • Translation (English to Italian) was noted to be better than Google Translate, though not perfect.
  • Text-to-SQL conversion was reported to perform comparably to Llama 3.1. (The feedback did not specify which Qwen 2.5 model sizes were used for these tasks.)

Instruction Following

Particularly good at following precise text manipulation instructions, even in smaller model sizes.


Model Variants and Performance

Qwen 2.5 72B

  • Considered comparable to Claude and GPT-4 in many tasks.
  • Provides more comprehensive and detailed responses.
  • Some users report canceling their subscriptions to paid services because of its performance.

Qwen 2.5 32B 

  • Strong performance in coding tasks, often replacing the need for ChatGPT.
  • Good balance of speed and capability for many users.
  • Sometimes provides more concise answers compared to larger models.

Qwen 2.5 7B

  • Capable of handling many tasks but may struggle with more complex queries.
  • Useful for users with limited computational resources.

Qwen 2.5 1.5B

  • Surprisingly capable for its size, especially in small code rewrites and syntax reminders.

Technical Details

Quantization

  • Q4_K_S quantization (44GB) achieves about 16.7 tokens/second on dual RTX 3090s.
  • Q4_0 quantization (41GB) reaches approximately 18 tokens/second on the same setup.
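The reported file sizes are consistent with back-of-the-envelope quantization math. A rough sketch follows; the bits-per-weight figures are approximate effective rates for llama.cpp quant formats, not official values.

```python
# Rough on-disk size estimate for a quantized 72B-parameter model.
# Effective bits-per-weight is approximate (Q4_0 ~ 4.5 bpw,
# Q4_K_S ~ 4.9 bpw); exact figures vary by model architecture.
PARAMS = 72e9

def quantized_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate file size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

print(f"Q4_0   ~ {quantized_size_gb(PARAMS, 4.5):.1f} GB")  # ~40.5 GB, near the reported 41 GB
print(f"Q4_K_S ~ {quantized_size_gb(PARAMS, 4.9):.1f} GB")  # ~44.1 GB, near the reported 44 GB
```

The same arithmetic explains why a 72B model at ~4-5 bits per weight needs two 24 GB GPUs, while 32B quants fit on a single RTX 3090.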

Integration

  • Successfully used with llama.cpp, LM Studio API, VSCodium, and continue.dev.
  • Compatible with Intel OpenVINO for CPU optimization.
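Both llama.cpp's server and LM Studio expose an OpenAI-compatible HTTP API, which is also what editor integrations like continue.dev talk to. Below is a minimal standard-library sketch of such a request; the port and model name are assumptions (LM Studio defaults to port 1234, llama.cpp's server to 8080), and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed local endpoint and model name; adjust to your own setup.
URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt: str, model: str = "qwen2.5-32b-instruct"):
    """Build an OpenAI-compatible chat-completion request (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        URL, data=data, headers={"Content-Type": "application/json"}
    )

req = build_request("Refactor this function to use a list comprehension.")
print(req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` while the local server is running with a Qwen model loaded.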


Areas for Improvement

  • Some users report underperformance in non-English languages, particularly German.
  • Handling of sensitive subjects can be inconsistent.
  • Generally slower than some paid models, though quality often compensates.
  • 32B model occasionally responds in Chinese when confused.
Original data from Hugging Face, OpenCompass, and various public Git repositories.
Release v20241110