Tackling The Digital Language Divide

Simon Davies

In the early years of the World Wide Web, the language of the Internet was almost exclusively English. By the mid-1990s, it was estimated that English made up four-fifths of all online content. However, despite once dominating the Web, English now represents just one language in an online linguistic elite.

English’s relative share of cyberspace has shrunk to less than 40% of online content, while Arabic, Portuguese, and Malay have all pushed into the top 10 languages online. Some of these have ballooned at great speed: the number of Chinese-speaking Internet users, for example, grew by 2,227.9% between 2000 and 2016, and Chinese speakers now make up more than a fifth of all Internet users globally.

Still, of the 6,000-7,000 or so languages in use today, it’s estimated only four percent are represented online. It would be nearly impossible to get every single one of those languages online, given that only about half of the world’s languages can be written. However, since the UN declared access to the Internet a basic human right, it has become essential that everyone has the ability to use and understand the Internet. It is clear that being able to access the Internet is not enough on its own to put everyone on an equal digital footing; we must all be able to access that Internet in our own language too.

Why does the language you speak online matter?

As translation expert Global Voices points out, the language we speak shapes our worldview: “Some say language may well shape the way we think, and thus the way we see the world.” Therefore the language you speak likely guides how you consume online content and how you behave in online communities.

“The Web does not just connect machines, it connects people,” said Internet pioneer Tim Berners-Lee, but how far does this connectivity go? On Twitter, although English is the most common language, an estimated half of all tweets are in other languages. Japanese, Portuguese, Spanish, and Indonesian tweeters are the most active. Analysis of their behaviour unsurprisingly shows that users tend to confine their follows, tweets, and retweets to those who speak the same language. While, theoretically, Twitter is a platform for global conversations, the reality of these interactions is more discrete, confined to a relatively small group of people.

Meanwhile, there are fears the Internet may be to blame for the death of languages, with a language driven to extinction every fortnight. Indeed, according to Ethnologue, which catalogues all the world’s known living languages, 1,519 languages are at risk of extinction, with a further 915 said to be dying.

The extent to which the Internet is responsible for this trend is disputed, but there is no doubt that languages face high barriers to entry online. Businesses only support so many languages when they create online translation tools, spell checkers, and digital software. In his report on Digital Language Death, researcher András Kornai predicts that 95% of all languages in use today will never gain traction online.

Inequality of information

The language you speak affects your experience of the Internet. It even determines how much – if any – information you can access in different languages on Wikipedia. Of the 290 official language editions, English is by some distance the largest edition in terms of users, followed by German and then French. On the other side of the spectrum, there is a near absence of any content in many African and Asian languages. Far from infinite, the Internet is only as big as your language.

Daniel Prado, a researcher on linguistic diversity, commented on the issue of equality and languages online in 2012: “[Google] recognises 30 European languages [but] only one African language and no indigenous American or Pacific languages.” While Google states one of its key goals is to expand the number of languages you can use on its search engine, it is still the case that you can conduct a Google search in just 348 or so different languages.

Widely spoken languages are being neglected online, too. A 2016 report on the presence of Arabic online found that in Egypt, where Arabic is the predominant language, a third of the 50 most visited websites were either not available in Arabic or did not include Arabic as their primary language.

A study by Mark Graham and Matthew Zook, “Augmented Realities and Uneven Geographies: Exploring the Geolinguistic Contours of the Web,” shows the inequality of representation that emerges when you conduct an online search in different languages. They performed a simple “businesses near me” search in the city of Jerusalem as their example, finding that a search in the dominant language, English, for the term “restaurant” returned the greatest geographical spread. When they compared how the city’s restaurants mapped out for searches in Hebrew and Arabic, a very different picture emerged, with Google sending Arabic speakers to one part of the city and Hebrew speakers to another. This risks reinforcing social segregation in the city.

The trouble with localisation

“Tweet,” “tuít,” and “giolc” were the three suggestions for a Gaelic iteration of the word “tweet” that Twitter’s Irish translators debated in 2012. The choice between the Anglicized spelling, a Gaelic spelling, or the use of the Gaelic word for “tweeting like a bird” stalled the project for an entire year.

This is the problem with localising websites: rendering the original page content in word-for-word translation isn’t enough – Web developers also need to ensure that the content is relevant, understandable, and in line with the user’s cultural context.

There are inevitably huge challenges, particularly when many smaller languages take only an oral form or have no standardised orthography, but even in cases where translations are readily available, there are further issues associated with the impact on design. The onscreen dimensions of straplines and headings change when an idea is expressed in a second language, while longer paragraphs can vary significantly as language often expands or contracts in translation. Arabic, Hebrew, and Syriac translations require a complete flip from right to left.
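The two design issues above, text direction and text expansion, can be checked programmatically before a translated string ever reaches a page layout. What follows is a minimal sketch in Python, not a production localisation tool. The function names `text_direction` and `expansion_ratio` are hypothetical, and taking direction from the first strongly directional character is a simplification, not a full implementation of the Unicode bidirectional algorithm. It relies only on the standard library’s `unicodedata.bidirectional`.

```python
import unicodedata


def text_direction(text: str) -> str:
    """Guess layout direction from the first strongly directional character.

    Simplified heuristic (an assumption of this sketch): Unicode bidi
    classes "R" and "AL" cover right-to-left scripts such as Hebrew,
    Arabic, and Syriac; class "L" covers left-to-right scripts.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    # No strong character found (e.g. digits or punctuation only).
    return "ltr"


def expansion_ratio(source: str, translated: str) -> float:
    """Return how much longer (or shorter) a translation is, by character count."""
    if not source:
        raise ValueError("source text must not be empty")
    return len(translated) / len(source)


if __name__ == "__main__":
    # A short button label that grows in German and flips direction in Hebrew.
    print(text_direction("Save"), expansion_ratio("Save", "Speichern"))
    print(text_direction("שלום"))
```

A designer could use a ratio like this to flag headings and straplines that are likely to overflow their containers, and the direction check to decide when a page needs a mirrored layout. Character count is only a rough proxy for onscreen width, which also depends on the font and script.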

Still, website localisation is hugely important for businesses and services that operate online. According to research firm Common Sense Advisory (CSA), Web users are four times more likely to purchase from a company that communicates in their own language. The ability to communicate a brand message online in the mother tongue of the target audience is essential to a company’s success.

Bridging the digital language divide

Translation technologies offer one solution to online language divides, while also opening up new opportunities for global businesses. Last year Microsoft launched a Skype translator, although it is currently available in only a few languages, and Twitter has paired up with Bing to offer users translation services. Facebook now has a multilingual composer tool that allows users to post in several languages at once.

To an extent, digital technology can also help efforts to rejuvenate languages. In 2014, Facebook added 20 new languages to its site and has launched several more each year since, bringing it to nearly 100 languages. The site also opens up languages for community-based translation. This option is currently available for about 50 languages. There’s a similar tool for Wikipedia, dubbed an “incubator,” to encourage projects in new languages.

“Many speakers are using technology to do really interesting things that were not imaginable a generation back,” says Mark Turin, an anthropologist and linguist at Yale University. A recent proliferation of websites devoted to a single language or languages of a specific region is also uniting speakers in the digital universe. Meanwhile, the Internet is providing multimedia teaching tools, including the Digital Himalayas project, the Arctic Languages Vitality project, and National Geographic’s Enduring Voices Project. The future of the Internet could be limitless as more languages find a life online.

About Simon Davies

Simon Davies is a London-based freelance writer with an interest in startup culture, issues, and solutions. He explores new markets and disruptive technologies and communicates those recent developments to a wide, public audience. Simon is also a contributor at socialbarrel.com, socialnomics.net, and tech.co. Follow Simon @simontheodavies on Twitter.