UCT MzansiLM: AI Model for 11 SA Languages

4d ago · Source: IOL

Summary

Researchers at the University of Cape Town (UCT) have developed MzansiLM, an artificial intelligence language model designed to recognize and understand all 11 official written languages of South Africa. The initiative aims to help bridge the digital divide, because many South African languages have been underrepresented in AI. The team, led by Anri Lombard and Dr. Jan Buys, will present their work at a conference in Spain this month.

The team built two resources. MzansiText is a dataset that reflects South Africa's linguistic diversity, and MzansiLM is a language model trained on it. The work matters because many South African languages are considered "low resource": there is little text available to train AI systems on them. Nine of the 11 official languages fall into this category.

MzansiLM is the first publicly available model to support all 11 official written languages within a single framework, with the goal of including languages that are often neglected. Despite its modest size of 125 million parameters, it has performed strongly, outperforming some larger models on benchmarks for several South African languages. This brings AI tools within reach of more South Africans, regardless of the language they speak.

Read the full article on IOL

This is an AI-generated audio summary. Always check the original source for complete reporting.
