Top-Trending LLMs Over the Last Week. Week #17.
23/04/2024 10:01:59
Last week, the AI community was buzzing with the launch of Llama 3, available in 8B and 70B sizes. These models have outperformed many open-source chat models on standard benchmarks.
We also joined in on the LLM excitement 😊 with our post "Llama3 License Explained," which covers the license's permissiveness and the restrictions on using the models.
As expected, the Llama family models took the top spots in the trending list this past week. (Just a reminder, all models are ranked based on how many times they were downloaded and liked, according to data from Hugging Face and LLM Explorer).
Meta Llama 3 8B and Meta Llama 3 70B
Llama 3 is offered in both pre-trained and instruction-tuned variants. The Llama 3 models input only text and are designed to generate text and code. It uses an auto-regressive language model framework based on an optimized transformer architecture. The instruction-tuned versions incorporate supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to enhance usefulness and safety in line with human preferences.
For training, Llama 3 was pre-trained on over 15 trillion tokens of publicly available data, complemented by more than 10 million human-annotated examples used for fine-tuning. Importantly, neither the pretraining nor the fine-tuning datasets include Meta user data.
The 70B model has been well-received for its speed and interaction quality, outperforming other large models tested by users. Some users have noted censorship, which, while typical, may lead others to prefer models with less censorship. Overall, initial impressions are favorable, and users are encouraged by the model's performance in their specific interactions.
Mixtral 8x22B Instruct v0.1 by mistralai
License: apache-2.0
The Mixtral 8x22B Instruct v0.1 is a fine-tuned variant of the Mixtral-8x22B model, optimized for instruction-based tasks. It integrates special tokens for function calling and utilizes the MistralTokenizer for encoding chat completions. This model runs on CUDA devices using the AutoModelForCausalLM from the transformers library to generate responses. Its tokenizer supports structured input queries, enhancing its application in environments that require precise and context-aware interactions. This model is especially suitable for AI and ML professionals needing to execute specific instructions efficiently.
WizardLM-2 8x22B by alpindale
License: apache-2.0
WizardLM-2 is a new series of state-of-the-art large language models developed by WizardLM@Microsoft AI. This series includes three models: WizardLM-2 8x22B, 70B, and 7B. The 8x22B model is the most advanced, offering highly competitive performance and outperforming existing opensource models.
These models are multilingual and utilize a Mixture of Experts (MoE) approach based on the Mixtral-8x22B-v0.1 base model. They have been evaluated using the MT-Bench framework, demonstrating top-tier performance across various model sizes. Detailed evaluations show that the WizardLM-2 models perform competitively with other leading proprietary and opensource models.
User Feedback
There was a discussion on Reddit where the user shared their experience comparing the following models: WizardLM-2-8x22b, Llama-3-70b-instruct, Command-R+, and Mixtral-8x22b-instruct. The user conducted a detailed comparison using custom benchmarks focused on inferential thinking, knowledge questions, and high school level mathematics. The evaluation aimed to assess the models beyond standard benchmarks, which the user believes might have been compromised due to their widespread use in training datasets.
The results indicated that WizardLM-2-8x22b was the most effective for the user's specific needs, excelling in knowledge-based questions and complex problem-solving in inferential thinking and mathematics. Llama-3-70b-instruct also performed well but was slightly less effective than WizardLM-2-8x22b across all tested areas.
Stay tuned for next week's update, where we'll continue to highlight the top Large Language Models making a difference!
Recent Blog Posts
-
2024-08-03