MaLA-500: Massive Language Adaptation of Large Language Models

Peiqin Lin*, Shaoxiong Ji*, Jörg Tiedemann, André F. T. Martins, Hinrich Schütze
1Center for Information and Language Processing, LMU Munich 2Munich Center for Machine Learning
3University of Helsinki 4Instituto Superior Técnico (Lisbon ELLIS Unit) 5Instituto de Telecomunicações 6Unbabel
linpq@cis.lmu.de, shaoxiong.ji@helsinki.fi

*Indicates Equal Contribution

Abstract

Large language models (LLMs) have advanced the state of the art in natural language processing. However, their predominant design for English or a limited set of languages creates a substantial gap in their effectiveness for low-resource languages. To bridge this gap, we introduce MaLA-500, a novel large language model designed to cover an extensive range of 534 languages. To train MaLA-500, we employ vocabulary extension and continued pretraining on LLaMA 2 with Glot500-c. Our intrinsic evaluation demonstrates that MaLA-500 is better at predicting text in low-resource languages than existing multilingual LLMs. Moreover, the extrinsic evaluation of in-context learning shows that MaLA-500 outperforms previous LLMs on SIB200 and Taxi1500 by a significant margin, i.e., by 11.68% and 4.82% macro-average accuracy across languages, respectively.
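
The adaptation recipe named in the abstract, vocabulary extension followed by continued pretraining on LLaMA 2, can be sketched with the Hugging Face transformers library. This is a minimal illustration and not the authors' training code: the checkpoint name, the toy token list, and the library choice are assumptions, and in practice the new subwords would come from a tokenizer trained on the multilingual corpus (e.g., Glot500-c) rather than a hand-written list.

# Minimal sketch of vocabulary extension before continued pretraining.
# Assumptions: the base checkpoint and the token list below are illustrative only.
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder subword tokens standing in for pieces learned from
# low-resource-language corpora; not the actual MaLA-500 vocabulary.
new_tokens = ["▁ɛ", "▁ŋa", "▁tlh"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding (and tied output) matrix so the new token ids have rows.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")

The newly added embedding rows are randomly initialized, so it is the continued pretraining step on the multilingual corpus that gives the extended vocabulary useful representations for low-resource languages.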

BibTeX


@article{lin2024mala,
    title={MaLA-500: Massive Language Adaptation of Large Language Models},
    author={Lin, Peiqin and Ji, Shaoxiong and Tiedemann, J{\"o}rg and Martins, Andr{\'e} FT and Sch{\"u}tze, Hinrich},
    journal={arXiv preprint arXiv:2401.13303},
    year={2024}
}