
Disease of Familiarity, the Flaw of Wikipedia

Originally written as an answer to the question “What are some major flaws in Wikipedia?” on Quora. Republished here with some changes.

Wikipedia has a whole lot of flaws, and its basic meta-flaw is the disease of familiarity.

It does not mean what you think it means. The disease of familiarity is knowing so much about something that you don’t understand what it is like to not understand it.

I recognized this phenomenon in 2011 or so, and called it The Software Localization Paradox. I later realized that it has a lot of other aspects beyond software localization, so I thought a lot about it and struggled for years with giving it a name. I learned about the term “disease of familiarity” from Richard Saul Wurman, best known as the creator of the TED conference (see a note about it at the end of this post). Some other names for this phenomenon are “curse of knowledge” and “mind blindness”. See also Is there a name for “knowing so much about something that you don’t understand what is it like not to know it”?

Unfortunately, none of these terms is very famous, and their meaning is not obvious without some explanation. What’s even worse, the phenomenon is in general hard to explain because of its very nature. But I’ll try to give a few examples.


Wikipedia doesn’t make it easy for people to understand its jargon.

Wikipedia calls itself “The Free Encyclopedia”; what does it mean that it’s “free”? I wrote Wikipedia:The Free Encyclopedia, one of the essays on this topic (there are others), but it’s not official or authoritative. More importantly, the fact that this essay exists doesn’t mean that everybody who starts writing for Wikipedia reads it and understands the ideology behind it and its implications. An important implication of the ideology of the Free Culture movement, of which Wikipedia is a part, is that some images and pieces of text can be copied from other sites into Wikipedia, and some cannot. The main reason for this is copyright law. People often copy text or images that are not compatible with the policies, and since these policies are heavily enforced by experienced Wikipedia editors, this causes misunderstandings. Wikipedia’s interface could communicate these policies better, but experienced Wikipedians, who already know them, rarely think about this problem. Disease of familiarity.

Wikipedia calls itself “a wiki”. A lot of people think that it’s just a meaningless catchy brand name, like “Kodak”. Some others think that it refers to the markup language in which the site is written. Yet others think that it’s an acronym that means “what I know is”. None of these interpretations is correct. The actual meaning of “wiki” is “a website that anyone can edit”. The people who are experienced with editing Wikipedia know this, and assume that everybody else does, but the truth is that a lot of new people don’t understand it and are afraid of editing pages that others have written, or freak out when somebody edits what they have written. Disease of familiarity.

The most common built-in way for Wikipedians to communicate with each other is the talk page. Only Wikipedia and other sites that use the MediaWiki software use the term “talk page”. Other sites call such a thing “forum”, “comments”, or “discussion”. (To make things more confusing, Wikipedia itself occasionally calls it “discussion”.) Furthermore, talk pages, which started on Wikipedia in 2001, before commenting systems like Disqus, phpBB, Facebook, or Reddit were common, work in a very weird way: you need to manually indent each of your posts, you need to manually sign your name, and you need to use a lot of obscure markup and templates (“what are templates?!”, every new user must wonder). Experienced editors are so accustomed to doing this that they assume that everybody knows how to do it. Disease of familiarity.

A lot of pages in Wikipedia in English and in many other languages have infoboxes. For example, in articles about cities and towns there’s an infobox that shows a photo, the name of the mayor, the population, etc. When you’re writing an article about your town, you’ll want to insert an infobox. Which button do you use to do this? There’s no “Infobox” button, and even if there were, you wouldn’t know that you need to look for it because “Infobox” is a word in Wikipedia’s internal jargon. What you actually have to do is Insert → Template → type “Infobox settlement”, and fill in a form. Every step here is non-intuitive, especially the part where you have to type the template’s name. How are you supposed to know it? Also, these steps are how it works on the English Wikipedia, and in other languages it works differently. Disease of familiarity.

And this brings us to the next big topic: Language.

You see, when I talk about Wikipedia, I talk about Wikipedia in all languages at once. Otherwise, I talk about the English Wikipedia, the Japanese Wikipedia, the Arabic Wikipedia, and so on. Most people are not like me: when they talk about Wikipedia, they talk about the one in the language in which they read most often. Quite often it’s not their first language; for example, a whole lot of people read the Wikipedia in English even though English is their second language and they don’t even know that there is a Wikipedia in their own language. When these people say “Wikipedia” they actually mean “the English Wikipedia”.

There’s nothing wrong with this in itself. It’s usually natural to read in the language that you know best and not to care very much about other languages.

But here’s where it gets complicated: Technically, there are editions of Wikipedia in about 300 languages. This number is pretty meaningless, however: There are about 7,000 languages in the world, so not the whole world is covered, and only in 100 languages or so is there a Wikipedia with some continuous writing activity. In the other 200 the activity is only sporadic, or there is no activity at all: somebody just started writing something in that language, and a domain was created, but then the first people who started it lost interest and nobody else came to continue their work.

This is pretty sad because it’s frequently forgotten that a whole lot of people cannot read what they want in Wikipedia because they don’t know a language in which there is an article about what they want to learn. If you are reading this post, you have the privilege of knowing English, and it’s hard for you to imagine how a person who doesn’t know English feels. Disease of familiarity: You think you can tell everybody “if you want to know something, read about it in Wikipedia”, but you cannot actually tell this to most people because most people don’t know English.

The missed opportunity becomes even more horrific when you realize that the people who would have the most appropriate skills for breaking out of this paradox are the people who are least likely to notice it, and the people who are hurt by it the most are the least capable of fixing it themselves. Think about it:

  • If you know, for example, Russian and English, and you need to read about a topic on which there is an article in the English Wikipedia, but not in Russian, you can read the English Wikipedia, and it’s possible that you won’t even notice that an article in Russian doesn’t exist. Unless you exercise mindfulness about the issue, you won’t empathize with people who don’t know English. To break out of this cycle, one can practice the following:
    • Always look for articles in Russian first.
    • Dedicate some time every week to translating articles. (See How does Wikipedia handle page translation?)
    • When you talk to people in your language, don’t assume that they know English.
  • If you don’t know English, you are just stuck without an article, and there’s not much you can do. It’s possible that you don’t even know that the article you need exists in another language. And maybe you cannot even read the user manual that teaches you how to edit. What can you do?
    • Try to be bold and ask your friends who do know English to translate it for you and publish the translation for the benefit of all the people who speak your language.
    • (Of course, there’s the solution of learning English, but we can’t assume that it works. Evidently, there are billions of people who don’t know English, and they won’t all learn English any time soon.)

(In case it isn’t clear, you can replace “English” and “Russian” in the example above with any other pair of languages.)

It’s particularly painful in countries where English, French, or Portuguese is the dominant language of government and education, even though a lot of the people, often the majority, don’t actually know it. This is true for many countries in Africa, as well as for the Philippines, and to a certain extent also for India and Pakistan.

People who know English have a very useful aid for their school studies in the form of Wikipedia. People who don’t know English are left behind: the teachers don’t have Wikipedia to get help with planning the lessons and the students don’t have Wikipedia to get help with homework. The people who know English and study in English-medium schools have these things and don’t even notice how the other people—often their friends!—are left behind. Disease of familiarity.

Finally, most of the people who write in the 70 or so most successful Wikipedias don’t quite realize that the reason the Wikipedia in their language is successful is that before they had a Wikipedia, they had had another printed or digital encyclopedia, possibly more than one; and they had public libraries, and schools, and universities, and all those other things, which allowed them to imagine quite easily what a free encyclopedia would look like. A lot of languages have never had these things, and a Wikipedia would be the first major collection of educational materials in them. This would be pretty awesome, but it develops very slowly. People who write in the successful Wikipedia projects don’t realize that they just had to take the same concepts they already knew well and rebuild them in cyberspace, without having to jump through any conceptual or epistemological hoops.

Disease of familiarity.


It’s hard to explain this.

I unfortunately suspect that very few, if any, people will understand this boring, long, and conceptually difficult post. If you disagree, please comment. If you think that you understand what I’m trying to say, but you have a simpler or shorter way to say it, please comment or suggest an edit (and tell your friends). If you have more examples of the disease of familiarity in Wikipedia and elsewhere, please speak up.

Thank you.


(As promised above, a note about Richard Saul Wurman. I heard him introduce the “disease of familiarity” concept in an interview with Debbie Millman on her podcast Design Matters, at about 23 minutes in. That interview was one of this podcast’s weirdest episodes: you can clearly hear that he’s making Millman uncomfortable, and she also mentioned it on Twitter. This, in turn, makes me uncomfortable to discuss something I learned from that interview, but I am just unable to find any better terminology for the phenomenon in question. If you have suggestions, please send them my way.)


Disclaimer: I’m a contractor working with the Wikimedia Foundation, but this post, as well as all my other posts on the topic of Wikimedia, Wikipedia, and related projects, are my own opinions and do not represent the Wikimedia Foundation.


Twitter Must Make it Easy to Mass-Report Spam Bots

I found a network of Russian female bots. Twitter spam bots.

They are not actually female. They just have Russian female names and female photos.

Most of those that I found were created in September 2016, although some were created at other times.

They all have similar taglines:

  • “In my opinion, everything is wonderful. I wonder what else” (“По-моему всё прекрасно. Интересно что ещё”)
  • “Right now absolutely everything is excellent. I wonder how else” (“Сейчас вообще всё отлично. Интересно как там ещё”)
  • “It looks like absolutely everything is wonderful. I’ll see what will happen next” (“Вроде вообще всё прекрасно. Посмотрю что будет дальше”)

… And so forth, with minor variations, which are very easy to detect for a human who knows Russian, although I’m less sure about software. (This reminds me of how I was interviewed for several natural language processing positions around 2011. All of them were about optimizing site text for Google ads, and all of them specifically targeted only English. When you only target English, other languages are used to spam you.)

Their usernames are all almost random and end with two digits: flowoghub90, viotrondo86, chirowsga88 (although “90” seems to be the most frequent pair of digits). As their location, they all indicate one of the large cities of Russia: Moscow, Krasnoyarsk, Perm, Saint-Petersburg, Rostov-on-Don, etc.
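For what it's worth, the two signals described above (a near-random username that ends with two digits, and a tagline that closely resembles the known ones) could be combined in software roughly as follows. This is only a minimal sketch in Python and not something Twitter is known to use: the function names, the username pattern, and the 0.6 similarity threshold are all assumptions made up for illustration, and the only data in it are the taglines and usernames quoted in this post.

```python
import re
from difflib import SequenceMatcher

# Taglines quoted in this post, used as known examples of the pattern.
KNOWN_TAGLINES = [
    "По-моему всё прекрасно. Интересно что ещё",
    "Сейчас вообще всё отлично. Интересно как там ещё",
    "Вроде вообще всё прекрасно. Посмотрю что будет дальше",
]

# Assumed pattern: lowercase letters followed by exactly two digits,
# as in "flowoghub90". Plenty of real people have such usernames too.
USERNAME_PATTERN = re.compile(r"^[a-z]+\d{2}$")


def tagline_similarity(tagline: str) -> float:
    """Return the highest similarity ratio (0.0 to 1.0) against the known taglines."""
    return max(
        SequenceMatcher(None, tagline.lower(), known.lower()).ratio()
        for known in KNOWN_TAGLINES
    )


def looks_like_network_bot(username: str, tagline: str, threshold: float = 0.6) -> bool:
    """Flag an account only when both the username and the tagline fit the pattern.

    The threshold is a hypothetical value, not a tuned one.
    """
    username_fits = bool(USERNAME_PATTERN.match(username))
    return username_fits and tagline_similarity(tagline) >= threshold
```

Neither signal means much alone, which is why the sketch requires both. Even then it would only produce candidates for a human to review, not a verdict.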

All of them post nothing but retweets of other accounts popular in Russia:

Curiously, all their names are typical only of ethnic Russians. Names of real women from Russia would be much more varied. There would be a lot of typical Armenian, Ukrainian, Jewish, Georgian, and Tatar names that reflect Russia’s diversity: Melikyan, Petrenko, Rivkind, Gamkrelidze, Khamitova. But these spam bot accounts only have names such as Kuznetsova, Romanova, Ershova, Medvedeva, Kiseleva. If you aren’t familiar with Russian culture, let me make a comparison to the U.S.: It’s like having a lot of people named Smith, Harris, Anderson, and Roberts, and nobody named Gonzalez, Khan, O’Connor, Rosenberg, or Kim. Maybe the spammers wanted to be more mainstream than mainstream, or maybe it is just overt racism.

I found them when I noticed that a lot of unfamiliar accounts with Russian female names were retweeting something by Pavel Durov in which I was mentioned. Durov is the founder of VK and Telegram, and I guess that he can be classified under “major internet businesses” in the list above. I noticed the similar taglines of the “women”, and immediately understood they are all spam bots.

These accounts are active. Some of them retweeted stuff while I was writing this post. I also keep getting retweet notifications, more than two weeks after Durov’s original tweet was posted.

When I look at any of these accounts, Twitter suggests similar ones to me, and they are all in the same network: Russian female names, similar “everything is wonderful” taglines, similar content. So Twitter’s software understands that they are similar, but doesn’t understand that they are spam bots that should be banned outright. I also noticed that some of them are still suggested to me after I blocked them, which goes against the whole point of blocking.

I don’t know how many of them there are in this network. Likely thousands. I reported thirty or so, and I wonder whether it has any effect.

I also don’t know what their purpose is. Boost the popularity of other Russian accounts? But those that they retweet are popular already. Waste the time of people who try to use Twitter productively? Maybe; at least that’s the effect in my case. Function as bot followers in “pay to follow” networks? Possibly, but they have existed for a year, and they don’t follow that many people.

I’m probably not discovering anything very new in this post. But if I’m not, it makes me wonder all the more why this problem hasn’t already been addressed somehow. At the very least, it should be possible to report them more efficiently, with one click or tap. And Twitter should also provide a form for mass-reporting; currently, Twitter’s guides about spam only suggest this: “The most effective way to report spam is to go directly to the offending account profile, click the drop-down menu in the upper right corner, and select “report account as spam” from the list.” It’s OK for one account, but it requires five clicks, and it doesn’t scale for something as systematic as what I am describing in this post.

I do hope that somebody from Twitter will read this and do something about it. This is obvious systematic abuse, and I have no better way to report it.

