Sverker Johansson has created more than three million Wikipedia articles, around one tenth of the entire content of the site. How, and why, does he do it?
(First published in N by Norwegian magazine, September 2014. Illustration by Thomas Burden)
Despite his position as the world’s most prolific author, Sverker Johansson hasn’t lost his Swedish sense of self-deprecation. “Well, three million articles is hard to beat but, you know, a lot of them are quite boring. If you read 100 entries about beetles in a row, you’ll probably fall asleep.”
Johansson, the director of research and education at Dalarna University, is the man behind more than three million Wikipedia pages, most in Swedish but also in Cebuano and Waray-Waray Filipino. That means he’s created around 10 per cent of Wikipedia’s roughly 30 million articles, which come in 287 languages, 4.5 million of them in English. He is almost singlehandedly responsible for making Swedish the world’s second most used language on the website, and Waray-Waray the tenth.
Johansson has specialised in cataloguing birds, plants, animals and Filipino towns (his wife is from the Philippines), creating more than 50 pages on Afranthidium bees alone, from Afranthidium abdominale to Afranthidium villosomarginatum via Afranthidium repetitum, which might be the most apt given that the pages don’t exactly make for scintillating reading.
This is partly because Johansson isn’t sitting at a computer churning out articles. He’s created a software program called a bot, which essentially uses an algorithm to scan online databases and fill in blanks about a particular subject, creating short Wikipedia entries that typically aren’t longer than a few sentences. As Johansson describes it, “It’s like a fill-in-the-blanks form: ‘Blank is a kind of blank discovered by blank in blank.’”
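To make the fill-in-the-blanks idea concrete, here is a minimal sketch in Python. It is an illustration only, not Lsjbot's actual code: the template wording follows Johansson's quote, while the field names, the `make_stub` helper and the sample record are all invented placeholders.

```python
from string import Template

# Hypothetical template mirroring the quoted form:
# "Blank is a kind of blank discovered by blank in blank."
STUB = Template("$name is a kind of $kind discovered by $discoverer in $year.")


def make_stub(record):
    """Fill the template from one database record (a plain dict here)."""
    # substitute() raises KeyError if a field is missing, so an
    # incomplete record never turns into a half-filled sentence.
    return STUB.substitute(record)


# Invented placeholder record, not real taxonomic data.
record = {
    "name": "Examplea fictitia",
    "kind": "bee",
    "discoverer": "A. Researcher",
    "year": "1900",
}

print(make_stub(record))
```

A real bot would pull each record from an online database rather than a hard-coded dictionary, but the principle is the same: one sentence pattern, many records.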
There are hundreds of bots active on Wikipedia, with names like Wizzo-Bot, Cheers!-bot and SmackBot, some of them doing work maintaining the site’s millions of pages – but none comes close to Johansson’s Lsjbot, which might not have the catchiest name but is by far the most prolific in existence.
Johansson doesn’t make a penny from creating up to 10,000 articles a day, with his bot running constantly and churning out an article every 5-10 seconds (it could produce them quicker, but there are rules on bot creation speeds so that Wikipedia’s servers don’t crash). When we speak, Lsjbot is in the background creating pages about potatoes for Swedish Wikipedia and mosses for Cebuano Wikipedia.
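A quick back-of-the-envelope check shows how those figures fit together, assuming the bot really does run around the clock at a steady pace. The `articles_per_day` helper is a name made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def articles_per_day(seconds_per_article):
    """Whole articles finished in 24 hours at a constant interval."""
    return SECONDS_PER_DAY // seconds_per_article


print(articles_per_day(5))   # fast end of the quoted range
print(articles_per_day(10))  # slow end of the quoted range
```

At one article every 10 seconds the total is 8,640 a day, and at one every 5 seconds it is 17,280, so the quoted ceiling of 10,000 a day corresponds to an average pace nearer the slow end of the range.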
So the question is: why do it? “I’d been using Wikipedia for years when I started looking into how it’s written and why people write. Wikipedia’s vision is to make all human knowledge available to everyone – it’s a vision that resonates with me.”
Johansson is every bit the intellectual polymath, with degrees in particle physics, engineering, economics and linguistics. “From the age of five, I was curious about everything, whether it was the universe or extinct animals. I kept asking ‘Why?’, and I guess I’ve never stopped.”
He’s been in academia his whole life, mostly at smaller universities “because you can be eclectic”. He’s written books about the origins of language, physics and astronomy, and is currently working on a paper about Neanderthal speech (“The evidence is that they did have a language, but we don’t know what it was like”).
“It’s a bit unusual to study the way I do,” he admits. “It doesn’t fit the mould, and it doesn’t help the academic career to publish in such disparate fields.”
It was perhaps inevitable that Wikipedia would appeal at some point, and in 2007 he wrote his first entry, on the origins of language, for Swedish Wikipedia. He’d handwritten more than 100 entries, on everything from Blondie’s Call Me to writer Kate Mosse, when he started considering a bot in 2011. “I’d seen articles about animals on Dutch Wikipedia that were created with a bot. I thought, I can do that, but better.”
So he created Lsjbot for a small project about birds, after a “long discussion” with the Swedish Wikipedia community. “There are thousands who contribute now and then, but the core of the community is a few hundred really active creators. Unofficially, if you’re going to do a project like that, it needs to be passed by the community.” He also needed to get the project passed by a wider group called the Bot Approvals Group.
After around two months of work on the software, done on evenings and weekends, Lsjbot’s first major project catalogued 8,000 of the roughly 10,000 species of birds (the rest had been covered), starting with the pied thrush (Zoothera wardii), which is a passerine or perching bird found mostly in India and Sri Lanka. The Lsjbot scoured databases such as that of the International Union for Conservation of Nature to collect information on the bird species, though Johansson says the first project “didn’t use a very good method”.
He’s since improved the Lsjbot to document everything from authors to lakes and obscure towns. He says the accuracy is “pretty good, though there’s no such thing as 100 per cent accuracy. You’re never going to get a software program that’s completely free of bugs, but the accuracy of the Lsjbot is generally better than handwritten articles.”
As for the general mutterings that bot-created pages are dull and unhelpful, Johansson says, “The point is not to be fun. It gives you specific information that you can then use how you want – the articles don’t pretend to tell the whole story, and I’m not making any claims to be a great creative author.”
Still, there’s a certain amount of power in holding the keys to so much potential information – something Johansson wants to use positively. “There are a lot of imbalances in the information out there,” he says. “Males get more coverage than females on Wikipedia; a village in Europe will get more coverage than a major city in China. You become very aware that most of it is written by young, white male nerds – the wars in Tolkien’s novels are better covered than the war in Vietnam.”
Hence, he plans to redress things – one possible project is authors in Asian countries, though he’s still figuring out how to set the Lsjbot’s parameters. “It could maybe just be authors who’ve published a certain number of books; it could exclude self-published works. I’m not sure yet.”
Either way, he has no plans to stop and is enjoying his new-found recognition after the Wall Street Journal wrote a piece about him in July. “No one had heard of me before that, but now I’m doing interviews every day,” he says. “It’s nice – and it’s good for people to know that I’m more interesting than some of my Wikipedia articles.”