Top-Trending LLMs Over the Last Week. Week #15.

Welcome back to our ongoing series where we spotlight the Large and Small Language Models that are defining the current landscape of artificial intelligence. As of April 9, 2024, we're excited to bring you this week's roundup of the LLMs that have stood out in the AI community. Our list is curated based on the number of downloads and likes these models have garnered on platforms such as Hugging Face and LLM Explorer.

Dominating the chart is C4AI Command R+ from CohereForAI, achieving nearly 90,000 downloads to date! 🔥 This model has significantly outpaced its competitors, but what exactly sets it apart as the frontrunner? Continue reading to find out.

1. C4AI Command R+ by CohereForAI is a 104-billion-parameter research release offering advanced features such as Retrieval Augmented Generation (RAG) and tool use for complex tasks. It is multilingual, supporting 10 languages, and designed for a range of uses including reasoning and summarization. Available in both non-quantized and quantized versions, it suits AI professionals working on advanced projects. The model supports a 128K context length and is straightforward to pick up for experimentation and integration into diverse applications (see the sketch below).
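If you want to try it locally, here is a minimal sketch of chatting with the open weights through Hugging Face transformers and the tokenizer's built-in chat template. The model ID and generation settings are illustrative, and the full-precision weights need substantial GPU memory, so the quantized repository covered later in this list may be the more practical choice.

```python
# Minimal sketch: chatting with C4AI Command R+ via Hugging Face transformers.
# Assumes a recent transformers release with Cohere model support, the accelerate
# package for device_map, and enough GPU memory; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build the prompt with the model's built-in chat template.
messages = [{"role": "user", "content": "Summarize the benefits of RAG in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```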

User Feedback

The feedback indicates that C4AI Command R+ stands alongside top models such as Mistral Large, Claude 3, and GPT-4, with the added advantage of open-access weights. Initial tests show the model performs well in basic chat interactions. Some users note that it tends to repeat structure rather than content, which can be useful for specific applications like Retrieval Augmented Generation. Its capability in roleplay scenarios, particularly in producing coherent extended roleplay (ERP), sets it apart from existing models. The model's style, shaped by a diverse dataset, along with its reliable outputs, suggests it could handle a range of NLP tasks, improve dataset quality, and support AI-generated content.

A key feature is the model's embedded Retrieval Augmented Generation (RAG) and its proficiency across multiple languages, making it a strong candidate for multilingual applications. The implementation of Grouped Query Attention (GQA) to reduce memory usage significantly enhances its accessibility and cost-effectiveness for both large-scale and personal use. The model's broad language capabilities and affordability relative to other models increase its appeal. With licensing terms conducive to research and competitive pricing, the model presents a valuable resource for enterprise applications and is accessible to home users with the necessary hardware setup. 

2. JetMoE 8B by jetmoe. The temptation to test the JetMoE-8B model is strong, especially given its claim of matching LLaMA2's performance for a training cost under $0.1 million! 😀

This model challenges the norm by achieving competitive results without the hefty financial outlay typically associated with competitive LLMs, even outperforming Meta AI's LLaMA2-7B despite a far smaller budget. As an open-source project, JetMoE-8B is especially valuable for academic work: it does not require proprietary data or extensive resources for training, and it can be fine-tuned on standard GPUs. With 2.2 billion active parameters for efficient computation, it presents a viable, budget-friendly option among similar models.

User Feedback

The feedback acknowledges the model's notable size and features, such as its sparsified attention alongside sparsified FFNs and a reduced embedding size of 2048, which set it apart from models like Mixtral 8x. Commenters also raise a point about the cost of LLM development, noting that often only compute costs are discussed while significant investments in research and infrastructure are omitted. There is optimism that LLMs will become more broadly accessible as costs decrease, indicating a move towards democratization, along with keen interest in models with other parameter configurations, particularly one with 4 billion active and 32 billion total parameters. The introduction of smaller, CPU-compatible MoE models is viewed positively, reflecting a trend towards more accessible yet powerful AI tools.

3. Qwen1.5 32B Chat GPTQ Int4 by Qwen belongs to the Qwen1.5 series, a beta of Qwen2 that introduces enhancements over its predecessor across multiple model sizes from 0.5B to 72B parameters, including a specialized MoE model. The series offers improved performance, extended multilingual support, and up to 32K tokens of context length. Built on the Transformer architecture with features like SwiGLU activation, the models are designed for both general and chat-specific tasks. Although certain advanced features are temporarily excluded, Qwen1.5 has undergone extensive pretraining and finetuning to optimize its performance; this particular checkpoint packages the 32B chat model as a 4-bit GPTQ quantization (see the sketch below).
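For readers who want to reproduce the quantized setup, here is a minimal, hedged sketch of loading this pre-quantized checkpoint with transformers. It assumes the auto-gptq and optimum packages are installed so the Int4 weights load directly; the prompt and generation settings are illustrative.

```python
# Minimal sketch: running the GPTQ Int4 checkpoint of Qwen1.5-32B-Chat.
# Assumes auto-gptq and optimum are installed so transformers can load the
# pre-quantized weights onto available GPUs via device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-32B-Chat-GPTQ-Int4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 3/4 + 1/6? Show your work."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Print only the assistant's reply, not the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```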

User Feedback

The model is recognized for handling complex tasks effectively, especially in math and physics, demonstrating a more nuanced understanding than similar models like Yi-34B and Mixtral. It is noted for its mathematical comprehension and fluency with arithmetic questions, including fractions. However, there are observations of occasional inaccuracies or unexpected outputs, such as random language switches. Users have experimented with the model on various tasks, including complex classification challenges, and found promising results that align closely with their needs, particularly in context management for tasks outside the common LLM application scopes of role-playing or writing assistance.

Practical tests have included coding challenges, such as writing scripts for converting PowerPoint presentations to HTML while maintaining multimedia elements, where the model demonstrated competent performance. Despite these strengths, there are noted areas for improvement, including minimizing hallucinations and enhancing language consistency, with anticipation that future iterations, such as Qwen 2, might address these issues.

4. Gemma 1.1 7B It by Google updates the Gemma model series with improved performance in quality, coding, factuality, and multi-turn conversations, utilizing a new training method. It's designed for text generation in resource-limited settings, supporting tasks like question answering and summarization. This version simplifies running on various hardware with detailed usage instructions and supports both 8-bit and 4-bit quantization for efficiency. It's trained on a diverse 6 trillion token dataset to handle a broad range of topics and languages. 
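As a quick illustration of the 4-bit path mentioned above, the sketch below loads the instruction-tuned checkpoint with a bitsandbytes quantization config. It assumes the bitsandbytes and accelerate packages are installed and that you have accepted the model's terms on Hugging Face; swapping load_in_4bit for load_in_8bit gives the 8-bit variant.

```python
# Minimal sketch: loading Gemma 1.1 7B (instruction-tuned) in 4-bit precision
# with bitsandbytes to fit resource-limited hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-1.1-7b-it"
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

# Gemma's chat template expects a plain user turn (no system role).
messages = [{"role": "user", "content": "Summarize the water cycle in three sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=150)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```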

User Feedback

The AI community's feedback reflects a growing interest in leveraging smaller LLMs for tailored applications, recognizing the benefits of models like Gemma for multilingual support and appropriate content in business contexts. There's a noticeable shift towards finetuning these models for specialized tasks, with Gemma's adaptation for tool use as a prime example. Additionally, there's an interest in developing "expert systems" for specific domains, suggesting a strategic pivot towards AI's targeted use rather than broad, general-purpose applications. However, concerns about the quality and censorship of early model releases are raised, with users questioning the impact of censorship on creative writing and other uses.

5. C4AI Command R+ 4-bit by Cohere For AI is a 4-bit quantized version of the 104-billion-parameter model, retaining its Retrieval Augmented Generation (RAG) and multi-step tool use capabilities, which makes it adept at handling complex tasks. It supports ten languages, including English, French, and Japanese, catering to a variety of uses such as reasoning and summarization.

User Feedback

Users are keen on the quantization approach and await further developments for better performance. There's interest in its compatibility with computing setups like VRAM/CUDA and CPU, and its efficiency in NSFW content generation, particularly for role-playing, is noted. However, concerns arise around technical challenges, such as the need for updates to enhance compatibility and the impact of newly applied safety filters on content freedom. 

6. Turkcell-LLM-7b-v1 by TURKCELL. This is a Turkish-specific large language model based on Mistral with 7 billion parameters, developed to understand and generate Turkish content more accurately. The model was trained on a dataset of 5 billion Turkish tokens and fine-tuned with custom Turkish instruction sets using the DoRA and LoRA methods for improved performance. Its extended Turkish tokenizer makes it suitable for a wide range of natural language processing tasks in Turkish. Usage involves loading the model and tokenizer from the TURKCELL repository, with examples provided for initiating conversational tasks (a sketch follows below). This model represents a significant step towards enhancing Turkish language processing applications.
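A minimal sketch of that conversational usage is shown below. The repository path and the presence of a chat template are assumptions based on the usage described above; adjust them to match the actual model card.

```python
# Minimal sketch: a Turkish conversational prompt with Turkcell-LLM-7b-v1.
# Repo path and chat-template availability are assumptions; generation settings
# are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TURKCELL/Turkcell-LLM-7b-v1"  # assumed repository path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A single Turkish user turn: "Could you write a short paragraph about Istanbul?"
messages = [{"role": "user", "content": "İstanbul hakkında kısa bir paragraf yazar mısın?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```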

7. Bielik-7B-Instruct-v0.1 by speakleash. The Bielik-7B-Instruct-v0.1 model, a collaborative effort between SpeakLeash and ACK Cyfronet AGH, is a Polish language model fine-tuned on specially curated Polish text corpora. Developed on Poland's large-scale computing infrastructure, including the Helios supercomputer, it aims for higher performance on linguistic tasks in Polish. It introduces training innovations such as weighted token-level loss and an adaptive learning rate for efficiency. Quantized versions are available for different computing needs. The model has shown promising evaluation results, especially on the RAG Reader task, but carries limitations such as potential incorrect outputs, and it is released under a CC BY-NC 4.0 license for non-commercial use.

8. Jambatypus-v0.1 by Maxime Labonne, an AI and machine learning (ML) researcher and well-known contributor to the open AI community, is a language model fine-tuned on the Open-Platypus-Chat dataset using QLoRA, building on ai21labs' Jamba-v0.1. Training ran on two A100 80 GB GPUs using the LazyAxolotl - Jamba notebook. The release includes an FP16-precision adapter and a merged model, recommended for use with the ChatML template (see the sketch below). Training used a multi-GPU setup with hyperparameters focused on learning rate, batch sizes, optimizer settings, and a cosine learning rate scheduler over a single epoch, and showed a progressive reduction in training and validation loss. Usage instructions, including a Gradio chat interface, are detailed in the model card, emphasizing a lightweight integration process for users.
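The sketch below shows what prompting the merged weights with a ChatML-style conversation might look like. The repository name is an assumption, and Jamba-based models require a recent transformers release (optionally with the mamba-ssm and causal-conv1d kernels for faster inference).

```python
# Minimal sketch: prompting the merged Jambatypus weights with the recommended
# ChatML format. Repo name is assumed; if the tokenizer ships a chat template,
# apply_chat_template renders the <|im_start|>/<|im_end|> markers automatically.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/Jambatypus-v0.1"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what QLoRA fine-tuning is in one paragraph."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```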

9. Qwen1.5-32B by Qwen is the base 32B model of the Qwen1.5 series, a beta of Qwen2 that brings several improvements, including eight sizes from 0.5B to 72B parameters, significant chat model enhancements, and multilingual support. It is designed with updated architecture features such as SwiGLU activation and modern attention mechanisms, handling up to 32K context length for diverse applications. Although this beta does not yet fully feature GQA across the series or the mixture of sliding-window and full attention, Qwen1.5 remains adaptable for further development. It integrates with the latest Hugging Face transformers, offering a solid base for customization and application-specific fine-tuning.

10. Qwen1.5 32B Chat by Qwen not only inherits the advancements of the Qwen1.5-32B base model but also places specific emphasis on "significant performance improvement in human preference for chat models." Its training process included supervised fine-tuning and direct preference optimization, reflecting a deliberate focus on enhancing user satisfaction in chat settings.

Keep an eye out for our update next week, where we'll continue to spotlight the leading LLMs making an impact.

We encourage you to contribute your insights and evaluations of these models on our platform, aiding the community in navigating the fast-evolving landscape of LLMs.

See you next week!


Original data from HuggingFace, OpenCompass and various public git repos.