Open Source LLMs in the Context of Translation

A week ago, the Intento team (a machine translation and multilingual generative AI platform for global enterprise companies) published its 8th annual State of Machine Translation report. The report analyzes 52 MT engines and LLMs across 11 language pairs and 9 content domains; the systems were accessed between March 25 and May 14, 2024. Of particular interest is how open-source LLMs perform on translation.

Several open-source LLMs, including TowerInstruct, RakutenAI 7B, Neurotõlge, Aya-101, Command R, and Mixtral 8x7B, show promising capabilities. The TowerInstruct models by Unbabel, built on Llama-2, are designed specifically for translation and perform well in multilingual settings. RakutenAI 7B excels in Japanese and English translation, while Neurotõlge supports both high-resource and low-resource languages, particularly Finno-Ugric ones. Aya-101 by Cohere covers 101 languages with a focus on lower-resourced ones, and Command R is strong in reasoning and multilingual generation. Mixtral 8x7B by Mistral AI is noted for high-quality output across a range of languages. However, these models generally fall into the second tier compared to commercial engines because their multilingual coverage is more limited.
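For readers who want to try one of these models locally, here is a minimal sketch of querying a TowerInstruct checkpoint through the Hugging Face transformers pipeline. The model ID, prompt wording, and generation settings are illustrative assumptions on my part, not details taken from the report.

```python
# Minimal sketch: translating one segment with a TowerInstruct checkpoint.
# The model ID "Unbabel/TowerInstruct-7B-v0.2" and the example sentence are
# assumptions; adjust to whichever checkpoint and language pair you need.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Unbabel/TowerInstruct-7B-v0.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# TowerInstruct is instruction-tuned, so the request is formatted with the
# tokenizer's chat template rather than sent as raw text.
messages = [
    {
        "role": "user",
        "content": (
            "Translate the following text from English into German.\n"
            "English: The new firmware update improves battery life.\n"
            "German:"
        ),
    }
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = pipe(prompt, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"])
```

Greedy decoding (do_sample=False) is the usual choice here, since translation favors deterministic output over creative variation.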

Several Open Source Models Deliver Impressive Results

Despite this potential, open-source LLMs come with notable trade-offs. They are generally 10-100 times less expensive than traditional machine translation (MT) systems, but also 50-1000 times slower, which limits their suitability for real-time applications. Customization through fine-tuning, prompt engineering, and the use of translation memories can enhance their performance, as the sketch below illustrates.
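To make the translation-memory idea concrete, the following sketch builds a prompt that injects fuzzy TM matches as few-shot examples before the segment to translate. The tiny in-memory TM, the similarity threshold, and the prompt wording are all hypothetical; a production setup would typically draw on a TMX file or a vector index rather than difflib.

```python
from difflib import SequenceMatcher

# Tiny illustrative translation memory: (source, target) pairs.
TM = [
    ("The battery lasts up to 12 hours.", "La batería dura hasta 12 horas."),
    ("Press the power button to restart.", "Pulsa el botón de encendido para reiniciar."),
]

def tm_matches(segment: str, tm, threshold: float = 0.6):
    """Return TM entries whose source text is similar to the new segment."""
    scored = [(SequenceMatcher(None, segment, src).ratio(), src, tgt) for src, tgt in tm]
    return [(src, tgt) for score, src, tgt in scored if score >= threshold]

def build_prompt(segment: str, src_lang: str, tgt_lang: str, tm) -> str:
    """Assemble a translation prompt that shows fuzzy TM matches as examples."""
    examples = "\n".join(
        f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in tm_matches(segment, tm)
    )
    return (
        f"Translate from {src_lang} into {tgt_lang}. "
        "Follow the terminology and style of the examples.\n\n"
        f"{examples}\n\n{src_lang}: {segment}\n{tgt_lang}:"
    )

print(build_prompt("The battery lasts up to 10 hours.", "English", "Spanish", TM))
```

The resulting prompt can be passed to any of the models above; the TM matches steer the LLM toward approved terminology without any fine-tuning.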

Open-source models like TowerInstruct 7B and Command R approach top-tier commercial engine performance but often struggle with complex translations, particularly in languages like Arabic.

Overall, while open-source LLMs are cost-effective and show impressive results in certain contexts, they still lag behind commercial models in multilingual capabilities and real-time translation performance.
