LLM Explorer Update: New Ranking System and Features

We've just rolled out a major update to the LLM Explorer. This version incorporates user feedback and introduces several new features.

The biggest change is the new LLM Explorer Rank (Score), a composite metric that weighs several factors when evaluating a language model (a toy sketch of such a score follows the list):

  • How popular the model is
  • How new it is
  • What experts think of it
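
For illustration only, here is a minimal sketch of how a multi-factor composite score could be computed. The factor names, weights, and normalization below are placeholder assumptions, not the actual LLM Explorer Rank formula:

```python
# Toy sketch of a multi-factor ranking score. The factor names and weights
# are illustrative placeholders, not the actual LLM Explorer Rank formula.

def composite_rank(popularity: float, recency: float, expert_score: float,
                   weights: tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    """Blend normalized factors (each in [0, 1]) into a single score."""
    w_pop, w_rec, w_exp = weights
    return w_pop * popularity + w_rec * recency + w_exp * expert_score

# Example: a popular, reasonably new model with strong expert reviews.
print(f"{composite_rank(popularity=0.9, recency=0.7, expert_score=0.8):.2f}")  # 0.82
```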

LLM Explorer Rank (Score)

The ranking is designed to compare different types of models on an equal footing, including quantized ones, giving both researchers and developers a clearer picture of how well a model actually performs. We created this ranking to address shortcomings of older evaluation methods:

  • Popular models aren't always the best performers
  • Some benchmarks cover only a small number of models
  • Many evaluations focus on just one or two skills
  • Quantized models are often left out

You can find more details about the LLM Explorer Rank here.

We've also added a new feature: a slider that lets you filter LLMs by their VRAM requirements, making it easier to find models that fit your hardware. A rough way to estimate a model's VRAM footprint is sketched below.

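As a rough guide to what the slider is filtering on, the sketch below estimates VRAM from parameter count and precision. It counts only parameter memory plus a fixed overhead factor; the bytes-per-parameter table and the 20% overhead are assumptions, and real usage also depends on context length, KV cache, and the inference runtime:

```python
# Back-of-the-envelope VRAM estimate (weights only, plus a fixed overhead).
# The bytes-per-parameter table and 20% overhead are assumptions, not the
# tool's actual calculation.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str = "fp16",
                     overhead: float = 1.2) -> float:
    """Approximate VRAM in GB for a model of the given size and precision."""
    return params_billions * BYTES_PER_PARAM[precision] * overhead

# Example: a 7B model at 4-bit precision needs roughly 4 GB of VRAM.
print(f"{estimate_vram_gb(7, 'int4'):.1f} GB")  # 4.2 GB
```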

We've also updated the benchmark references for models with available benchmarks: we now use Claude 3.5 Sonnet as the primary reference, followed by GPT-4 Turbo and GPT-4. The sketch below shows one way such references can be used.

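As a hypothetical illustration of how reference models can serve as yardsticks, the sketch below expresses a model's benchmark score as a fraction of each reference's score. The benchmark numbers are placeholders, not real results from the LLM Explorer:

```python
# Hypothetical example of scoring against reference models. The benchmark
# numbers are placeholders, not real results.

REFERENCE_SCORES = {
    "claude-3.5-sonnet": 88.0,  # primary reference
    "gpt-4-turbo": 86.0,
    "gpt-4": 84.0,
}

def relative_to_references(model_score: float) -> dict[str, float]:
    """Express a model's score as a fraction of each reference model's score."""
    return {name: round(model_score / ref, 3) for name, ref in REFERENCE_SCORES.items()}

print(relative_to_references(79.2))
# {'claude-3.5-sonnet': 0.9, 'gpt-4-turbo': 0.921, 'gpt-4': 0.943}
```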

These changes aim to improve your experience with the LLM Explorer. We welcome your feedback on the new features!

Original data from HuggingFace, OpenCompass and various public git repos.
Release v20241110