Wikipedia's largest non-English version was created by a bot. Generative AI poses new problems

Wikipedia, the free online encyclopedia made by volunteers, now has nearly 7 million articles in English alone. (Getty Images: zmeel)

By Ahmed Salah, CEO
April 20, 2025

With nearly 7 million articles, the English-language edition of Wikipedia is by many measures the largest encyclopedia in the world.

The second-largest edition of Wikipedia boasts just over 6 million articles. It isn't French, or Spanish, or Chinese Wikipedia.

It's Cebuano: a language spoken mostly in the southern Philippines.

But Cebuano Wikipedia didn't grow with the help of thousands of volunteer editors, as its English counterpart did. Most of the articles come from one person: Swedish linguist Sverker Johansson.

Dr Johansson designed a program, dubbed "lsjbot", which generated millions of articles in several languages, but particularly Cebuano.

The bot also laid bare a debate that Wikipedia has been grappling with since its inception, and which artificial intelligence (AI) is making ever more pressing.

How lsjbot 'writes' articles

Programs that automate parts of Wikipedia are nearly as old as the website itself.

These bots crawl the site, doing jobs such as fixing dead links, but many generate articles only a sentence or two long.

Lsjbot creator Sverker Johansson. (Supplied: Lorelie Johansson)

It was these article-producing bots that Dr Johansson encountered in the early 2010s, when he was writing and editing articles himself.

"I started thinking: I can do that. I can do better," he says.

Lsjbot generates articles by taking information from online databases, mostly on biology and geography, and fitting the data into a set number of pre-written sentences.

"The core language model is a few hundred sentence templates, and then the bot will check which information is available," Dr Johansson says.

An article about an animal, for instance, might start with the sentence "The X is a Y that belongs to the Z family", with lsjbot filling in the blanks: lion, mammal, cat.
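
The template approach Dr Johansson describes can be illustrated with a short Python sketch. This is not lsjbot's actual code: the three English templates, the field names and the `write_article` helper are all assumptions made up for illustration, and the lion record is sample data rather than a real database entry. The sketch only shows the general idea of checking which information is available and using the sentences whose blanks can all be filled.

```python
from string import Formatter

# Hypothetical sentence templates. The real bot is described as having
# "a few hundred", written in Cebuano and other languages.
TEMPLATES = [
    "The {name} is a {kind} that belongs to the {family} family.",
    "It is found in {region}.",
    "Its conservation status is {status}.",
]


def fields_in(template):
    """Return the set of blank names (e.g. 'name', 'kind') in a template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def write_article(record):
    """Fill in every template whose blanks are all present in the record."""
    sentences = []
    for template in TEMPLATES:
        # Skip a sentence if any piece of data it needs is missing or empty.
        if all(record.get(field) for field in fields_in(template)):
            sentences.append(template.format(**record))
    return " ".join(sentences)


# Sample record standing in for a database entry (illustrative data only).
lion = {"name": "lion", "kind": "mammal", "family": "cat",
        "region": "Africa and India"}

print(write_article(lion))
```

Because the sample record has no `status` value, the third sentence is left out, which mirrors the bot checking which information is available before writing.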

While lsjbot could work in any language, most of its output has been in Cebuano. It has so far generated a couple of million articles on plants and animals, 4 million articles on geography, and a few articles in smaller categories such as chemical elements.

Dr Johansson chose to focus on Cebuano because it's his wife's native language. She helped him write the sentence templates.

"I'm not really fluent in her language, but I wanted to help anyway, and I figured this is a way I can do it," he says.

He has also run the bot in Waray, another language from the Philippines, and his native Swedish.

The controversy around lsjbot

Lsjbot caused huge ripples through the Philippine Wikipedia community, and not all of them good.

Volunteers who create and maintain Wikipedia, called Wikipedians, found many of the Cebuano-language pages had grammatical and sometimes factual errors, thanks to imperfect translations.

The sheer number of articles was another problem. With so few editors in the community, it was difficult to maintain or improve the articles' quality.

In 2018, there was even a proposal to delete the entire Cebuano Wikipedia, including the small fraction of human-generated articles. It was rejected and strongly opposed by a Philippine Wikimedia community.

Irvin Sto. Tomas at the 2025 Wikisource Conference in Indonesia. (Wikimedia Commons: Memora Productions, CC BY-SA 4.0)

Irvin Sto. Tomas, a member of that community, says a small group of local Wikipedians has been trying to improve the quality of the Cebuano pages, including working with Dr Johansson on lsjbot.

"Unfortunately, there is so much to be done that volunteer editors alone cannot do," Mr Sto. Tomas, who works with other Philippine-language Wikipedias, says.

Josh Lim, who also edits non-Cebuano Philippine Wikipedias, says automated bots caused reputation problems long before lsjbot.

An early version of Cebuano Wikipedia was composed mostly of thousands of articles on French communes, created by another bot. Articles on topics relevant to Cebuano speakers were sparse.

"You know how embarrassing that is?" Mr Lim says.

He also believes the huge number of bot-generated articles caused a "race to the bottom" among Philippine-language Wikipedias, with editors valuing quantity over quality.

Tagalog, the most widely spoken language in the Philippines, saw its Wikipedia grow in size as part of this race. At its largest, it boasted about 80,000 articles.

It now has half the number of pages it used to. Mr Lim and his fellow editors have been culling short articles for ease of maintenance.

Nevertheless, he believes Dr Johansson's intentions were good.

"I think that what he did was right in the service of Wikipedia," Mr Lim says.

Josh Lim at the 2024 Wikimania conference in Poland. (Wikimedia Commons: Niccolò Caranti, CC BY-SA 4.0)

He also thinks Wikipedians should not reject bots and automation tools outright.

"It speaks to this capacity, or lack thereof, on the part of Wikipedians to assess and embrace change."

The Swedish Wikipedia community, meanwhile, first agreed to, and then pulled back from, lsjbot's use.

Lsjbot has been largely inactive since 2021. Dr Johansson says that the debate around its use was part of his reason for retiring it.

Native languages devalued

Another reason Dr Johansson retired lsjbot was that it wasn't achieving one of the aims he'd hoped it might: bringing a "critical mass" of readers and editors to Cebuano Wikipedia, catalysing a richer encyclopedia.

According to Wikimedia Statistics, Cebuano Wikipedia currently reels in tens of thousands of page views from the Philippines each month.

English Wikipedia, meanwhile, gets more than 100 million page views from the Philippines per month.

Mr Lim says this is a broader problem with non-English Wikipedias. The Tagalog Wikipedia, for instance, receives about 2 million local hits per month.

"It's just general colonial experience — our native languages have been devalued," Mr Lim says.

"That, in turn, impacts the amount of information that is available in those languages."

This "devaluing" appears all over the internet. An early version of Google Translate, for instance, translated a number of scientific terms into profanities in Filipino, apparently lacking better data to do it accurately.

But Mr Lim says an "upswell of linguistic pride" is drawing more Filipinos to use their native languages, and he believes sites like Wikipedia can help.

"Wikipedia is part of that solution to allow people to express themselves, and to express complicated thoughts, in their native language."

Cebuano Wikipedia has had other uses — sometimes from unusual corners.
