Welcome to MaLA-LM (Massive Language Adaptation of Large Language Models)! 🌍
MaLA-LM focuses on adapting large language models to support hundreds of languages, including many underrepresented ones. Our models are multilingual, scalable, and optimized for diverse linguistic tasks. We work on data construction (e.g., the MaLA corpus and PolyWrite), continual pretraining (e.g., EMMA-500, MaLA-500, and MixCPT), instruction fine-tuning (e.g., monolingual vs. multilingual Alpaca and Lucky 52), and evaluation (e.g., GlotEval).
Featured 🗣️
Check out our multilingual LLM collections, with models trained to handle 500+ languages, ideal for global, multilingual applications.
Dive into the Hugging Face collections: EMMA-500 | MaLA corpus | MaLA-500
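For a quick start, the models can be loaded with the Hugging Face transformers library. Below is a minimal sketch, assuming the model ID MaLA-LM/emma-500-llama2-7b; check the collection pages for the exact IDs and hardware requirements of each release.

```python
# Minimal sketch: loading a MaLA-LM model with Hugging Face transformers.
# The model ID below is an assumption; see the collection pages for exact IDs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaLA-LM/emma-500-llama2-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",          # spread layers across available devices
)

# Generate a short continuation in any of the supported languages.
prompt = "Tervetuloa!"  # "Welcome!" in Finnish
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```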
Continual Pretraining 📜
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
MaLA-500: Massive Language Adaptation of Large Language Models
Evaluation 🛠️
GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models