How AI and Wikipedia have sent vulnerable languages into a doom spiral

Summary

The article reports that many small-language editions of Wikipedia have been flooded with poor, machine-translated entries. For low‑resource languages such as Greenlandic, Inuktitut, Fulfulde, Igbo and Hawaiian, automated translations often produce grammatical nonsense and factual errors. Because AI models learn from vast amounts of web text, including Wikipedia, this bad content can be absorbed into translation systems and create a reinforcing feedback loop: poor Wikipedia pages produce poor AI translations, which in turn lead to more poor pages.

Volunteers who actually speak these languages are scarce, so error-filled pages frequently go uncorrected. Some communities (Inari Saami is given as a positive example) have successfully curated high-quality content and integrated Wikipedia into language-preservation efforts, but many others face a bleak outlook. The Greenlandic Wikipedia was recently closed after admins found rampant machine-generated nonsense; similar damage threatens other endangered languages used online.

Key Points

  • Machine translation has enabled non-speakers to bulk-produce articles in small-language Wikipedias, often with serious errors.
  • Wikipedia can be the primary online corpus for under-resourced languages, so poor pages can poison AI training data.
  • The resulting feedback loop (‘garbage in, garbage out’) makes automated translation worse for vulnerable languages over time.
  • Community capacity matters: Inari Saami shows how curated, high-quality content can support revitalisation.
  • Tools like Wikipedia’s Content Translation rely on external machine translators and are often unsuitable without native-speaker editing.
  • Errors on Wikipedia can lead to harmful downstream uses, such as misleading language-learning books or tools.
  • Wikimedia Foundation largely leaves content moderation to local communities; editions without active speakers are particularly vulnerable.

Content summary

The piece mixes reporting and interviews: a Greenlandic Wikipedian who deleted much of his edition after finding pervasive machine-translated nonsense; volunteers from Nigeria, Canada and Hawai‘i describing damage and the labour required to fix it; and experts warning that AI systems train on such data and will amplify errors. It contrasts failing editions with the Inari Saami community’s deliberate, quality-first approach to using Wikipedia as a digital repository for their language.

The article explains technical reasons why some languages are especially at risk (small online corpora, agglutinative structures, close similarity to other tongues) and summarises how Wikimedia’s existing tools and policies struggle to prevent large-scale, automated damage when there are no active native-speaker editors.

Context and relevance

This is a clear intersection of two big trends: increasingly capable but imperfect AI language tools, and the reliance of those tools on freely available online text. For anyone interested in language preservation, AI ethics, digital humanities or platform governance, the story shows how automated systems can unintentionally erode the very cultures they claim to support. It also flags a policy gap: without proactive support for small-language communities, digital platforms risk accelerating language loss.

Author style

Punchy — the reporting is urgent and concrete. The article combines human stories and technical explanation to make the consequences feel immediate: this is not abstract model failure, it is cultural damage unfolding in real time.

Why should I read this?

Because it shows how a neat tech hack — translating content quickly — is actually wrecking fragile language ecosystems. If you care about AI doing less harm (or about saving minority languages), this is the exact kind of real-world failure you need to know about. It’s short on jargon and heavy on examples, so you’ll get the key stakes fast.

Source

Source: https://www.technologyreview.com/2025/09/25/1124005/ai-wikipedia-vulnerable-languages-doom-spiral/
