
Twitter Must Make it Easy to Mass-Report Spam Bots

I found a network of Russian female bots. Twitter spam bots.

They are not actually female. They just have Russian female names and female photos.

Most of those that I found were created in September 2016, although some were created at other times.

They all have similar taglines:

  • “In my opinion, everything is wonderful. I wonder what else” (“По-моему всё прекрасно. Интересно что ещё”)
  • “Right now absolutely everything is excellent. I wonder how else” (“Сейчас вообще всё отлично. Интересно как там ещё”)
  • “It looks like absolutely everything is wonderful. I’ll see what will happen next” (“Вроде вообще всё прекрасно. Посмотрю что будет дальше”)

… And so forth, with minor variations, which are very easy to detect for a human who knows Russian, although I’m less sure about software. (This reminds me of how I was interviewed for several natural language processing positions around 2011. All of them were about optimizing site text for Google ads, and all of them specifically targeted only English. When you only target English, other languages are used to spam you.)

Their usernames are all almost random and end with two digits: flowoghub90, viotrondo86, chirowsga88 (although “90” seems to be the most frequent ending). As their location, they all give one of the large cities of Russia: Moscow, Krasnoyarsk, Perm, Saint-Petersburg, Rostov-on-Don, etc.
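The traits described so far are regular enough to check mechanically. Here is a minimal Python sketch, assuming only the examples quoted in this post; the function name `suspicion_score`, the two regular expressions, and the city list are my own illustration, not anything Twitter uses. A real person can easily match one trait (say, a username ending in a birth year), so the score is at most a hint that an account deserves a closer look.

```python
import re
import unicodedata

# Hypothetical patterns, derived only from the examples quoted in this post.
USERNAME_RE = re.compile(r"^[a-z]+\d{2}$")  # letters followed by exactly two digits
TAGLINE_RE = re.compile(r"всё\s+(прекрасно|отлично)\.", re.IGNORECASE)
CITIES = {"Moscow", "Krasnoyarsk", "Perm", "Saint-Petersburg", "Rostov-on-Don"}


def suspicion_score(username, tagline, location):
    """Count how many of the three traits a profile shows (0 to 3)."""
    # Normalize so that a decomposed "ё" still matches the pattern.
    tagline = unicodedata.normalize("NFC", tagline)
    score = 0
    if USERNAME_RE.match(username):
        score += 1
    if TAGLINE_RE.search(tagline):
        score += 1
    if location in CITIES:
        score += 1
    return score
```

All three taglines quoted above match the tagline pattern, and all three usernames match the username pattern, but variations I haven't seen would need a broader rule.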

All of them post nothing but retweets of other accounts popular in Russia:

Curiously, all their names are typical only of ethnic Russians. Names of real women from Russia would be much more varied: there would be a lot of typical Armenian, Ukrainian, Jewish, Georgian, and Tatar names that reflect Russia’s diversity, such as Melikyan, Petrenko, Rivkind, Georgadze, Khamitova. But these spam bot accounts only have names such as Kuznetsova, Romanova, Ershova, Medvedeva, Kiseleva. If you aren’t familiar with Russian culture, let me make a comparison to the U.S.: It’s like having a lot of people named Smith, Harris, Anderson, and Roberts, and nobody named Gonzalez, Khan, O’Connor, Rosenberg, or Kim. Maybe the spammers wanted to be more mainstream than mainstream, and maybe it is just overt racism.

I found them when I noticed that a lot of unfamiliar accounts with Russian female names were retweeting something by Pavel Durov in which I was mentioned. Durov is the founder of VK and Telegram, and I guess that he can be classified under “major internet businesses” in the list above. I noticed the similar taglines of the “women”, and immediately understood they are all spam bots.

These accounts are active. Some of them retweeted stuff while I was writing this post. I also keep getting retweet notifications, more than two weeks after Durov’s original tweet was posted.

When I look at any of these accounts, Twitter suggests similar ones to me, and they are all in the same network: Russian female names, similar “everything is wonderful” taglines, similar content. So Twitter’s software understands that they are similar, but doesn’t understand that they are spam bots that should be banned outright. I also noticed that some of them are still suggested to me after I blocked them, which defeats the whole point of blocking.

I don’t know how many of them there are in this network. Likely thousands. I reported thirty or so, and I wonder whether it achieves anything.

I also don’t know what their purpose is. Boost the popularity of other Russian accounts? But those that they retweet are popular already. Waste the time of people who try to use Twitter productively? Maybe; at least that’s the effect in my case. Function as bot followers in “pay to follow” networks? Possibly, but they have existed for a year, and they don’t follow that many people.

I’m probably not discovering anything very new in this post. But if I’m not, that makes me wonder all the more why this problem hasn’t been addressed already. At the very least it should be possible to report them more efficiently, with one click or tap. And Twitter should also provide a form for mass-reporting; currently, Twitter’s guides about spam only suggest this: “The most effective way to report spam is to go directly to the offending account profile, click the drop-down menu in the upper right corner, and select “report account as spam” from the list.” That’s OK for one account, but it requires five clicks, and it doesn’t scale to something as systematic as what I am describing in this post.

I do hope that somebody from Twitter will read this and do something about it. This is obvious systematic abuse, and I have no better way to report it.


The Curious Problem of Belarusian and Igbo in Twitter and Bing Translation

Twitter sometimes offers machine translation for tweets that are not written in the language that I chose in my preferences. Usually I have Hebrew chosen, but for writing this post I temporarily switched to English.

Here’s an example where it works pretty well. I see a tweet written in French, and a little “Translate from French” link:

Emmanuel Macron on Twitter.png

The translation is not perfect English, but it’s good enough; I never expect machine translation to have perfect grammar, vocabulary, and word order.

Now, out of curiosity I happen to follow a lot of people and organizations who tweet in the Belarusian language. It’s the official language of the country of Belarus, and it’s very closely related to Russian and Ukrainian. All three languages have similar grammar and share a lot of basic vocabulary, and all are written in the Cyrillic alphabet. However, the actual spelling rules are very different in each of them, and they use slightly different variants of Cyrillic: only Russian uses the letter ⟨ъ⟩; only Belarusian uses ⟨ў⟩; only Ukrainian uses ⟨є⟩.

Despite this, Bing gets totally confused when it sees tweets in the Belarusian language. Here’s an example from the Euroradio account:

Еўрарадыё   euroradio    Twitter double.png

Both tweets are written in Belarusian. Both of them have the letter ⟨ў⟩, which is used only in Belarusian, and never in Ukrainian or Russian. The letter ⟨ў⟩ is also used in Uzbek, but Uzbek never uses the letter ⟨і⟩. If a text uses both ⟨ў⟩ and ⟨і⟩, you can be certain that it’s written in Belarusian.
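The letter rules above are simple enough to write down as code. This is a rough Python sketch under the assumption that the only candidates are Russian, Ukrainian, and Belarusian (plus the Uzbek caveat); the function name `guess_east_slavic` is hypothetical, and a real language identifier would use far more evidence than single letters.

```python
def guess_east_slavic(text):
    """Guess "be", "uk", or "ru" from distinctive letters, or None if unsure."""
    t = text.lower()
    has_short_u = "\u045e" in t   # ў
    has_dotted_i = "\u0456" in t  # і
    if has_short_u:
        # ў together with і means Belarusian; ў alone could also be Uzbek.
        return "be" if has_dotted_i else None
    if "\u0454" in t:  # є, used only in Ukrainian among the three
        return "uk"
    if "\u044a" in t:  # ъ, used only in Russian among the three
        return "ru"
    return None
```

A short text may contain none of these letters, in which case the function returns `None` rather than guessing.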

And yet, Twitter’s machine translation offers to translate the top tweet from Ukrainian, and the bottom one from Russian!

An even stranger thing happens when you actually try to translate it:

Еўрарадыё   euroradio    Twitter single Russian.png

Notice two weird things here:

  1. After clicking, “Ukrainian” turned into “Russian”!
  2. Since the text is actually written in Belarusian, trying to translate it as if it were Russian is futile. The actual output is mostly a transliteration of the Belarusian text, and it’s completely useless. You can see that the letter ⟨ў⟩ could not be transliterated.

Something similar happens with the Igbo language, spoken by more than 20 million people in Nigeria and other places in Western Africa:

 4  Tweets with replies by Ntụ Agbasa   blossomozurumba    Twitter.png

This is written in Igbo by Blossom Ozurumba, a Nigerian Wikipedia editor, whom I have the pleasure of knowing in real life. Twitter identifies this as Vietnamese—a language of South-East Asia.

The reason for this might be that both Vietnamese and Igbo happen to be written in the Latin alphabet with the addition of diacritical marks, one of the most common of which is the dot below, as in the word ibụọla in this Igbo tweet and the words chọn lọc in Vietnamese. However, other than this incidental and superficial similarity, the languages are completely unrelated. Identifying the language of a text by this feature alone is unreliable.
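To illustrate how little the dot below says about the language, here is a small Python sketch. It is only my guess at the kind of feature an identifier might over-weight, not a description of what Twitter or Bing actually do; the helper name `has_dot_below` is hypothetical. It decomposes the text with the standard `unicodedata` module and looks for the combining dot below (U+0323).

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW


def has_dot_below(text):
    """Return True if any letter in the text carries a dot below."""
    # NFD splits precomposed letters such as ụ and ọ into base letter + mark.
    return DOT_BELOW in unicodedata.normalize("NFD", text)
```

Both the Igbo word ibụọla and the Vietnamese words chọn lọc return `True`, so this feature cannot tell the two languages apart.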

If I paste the text of the tweet, “Nwoke ọma, ibụọla chi?”, into translate.bing.com, it is auto-identified as Italian, probably because it includes the word chi, and a word that is written identically happens to be very common in Italian. Of course, Bing fails to translate everything else in the tweet, but this does show a curious thing: Even though the same translation engine is used on both sites, the language of the same text is identified differently.

How could this be resolved?

Neither Belarusian nor Igbo is supported by Bing. If Bing is the only machine translation engine that Twitter can use, it would be better to skip it completely and not offer any translation than to offer this strange and meaningless thing. Of course, Bing could start supporting Belarusian; it has a smaller online presence than Russian and Ukrainian, but their grammar is so similar that it shouldn’t be that hard. But what to do until that happens?

In Wikipedia’s Content Translation, we don’t give exclusivity to any machine translation backend, and we provide whatever we can, legally and technically. At the moment we have Apertium, Yandex, and YouDao, in languages that support them, and we may connect to more machine translation services in the future. In theory, Twitter could do the same and use another machine translation service that does support the Belarusian language, such as Yandex, Google, or Apertium, which started supporting Belarusian recently. This may be more a matter of legal and business decisions than a matter of engineering.
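As a rough sketch of what “not giving exclusivity to one backend” could look like, here is a few lines of Python. This is not how Content Translation or Twitter is implemented; the function name `choose_backend` and the support table are hypothetical, and the language sets are illustrative placeholders filled in only from what this post mentions, not real coverage lists.

```python
# Illustrative placeholder data: (backend name, source languages it supports).
# Not real coverage lists; only languages mentioned in this post are shown.
BACKENDS = [
    ("bing", {"fr", "ru", "uk"}),
    ("yandex", {"be", "ru", "uk"}),
    ("apertium", {"be"}),
]


def choose_backend(source_lang, backends=BACKENDS):
    """Return the first backend that supports the language, or None."""
    for name, supported in backends:
        if source_lang in supported:
            return name
    # No backend supports this language: offer no translation at all.
    return None
```

With this table, `choose_backend("be")` picks a backend that claims Belarusian support, and `choose_backend("ig")` returns `None`, so no “Translate” link would be shown for Igbo.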

Another thing for Twitter to try is to let users specify explicitly which languages they write in. Currently, Twitter’s preferences only allow selecting one language, and that is the language in which Twitter’s own user interface will appear. Knowing the languages a user writes in would make language identification easier for machine translation engines. It would also make some business sense, because it would be useful for researchers and marketers. Of course, it must not be mandatory, because people may want to avoid providing too much identifying information.
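Here is a minimal Python sketch of how declared languages could help, assuming a language identifier that returns a confidence score per language. The function `pick_language`, its parameters, and the scores in the usage note are all made up for illustration.

```python
def pick_language(scores, declared=frozenset()):
    """Pick the best-scoring language, preferring ones the user declared.

    scores: dict mapping language code to a detector's confidence.
    declared: optional set of language codes the user says they write in.
    """
    candidates = {lang: s for lang, s in scores.items() if lang in declared}
    if not candidates:
        # Nothing declared, or no overlap: fall back to the raw scores.
        candidates = scores
    return max(candidates, key=candidates.get)
```

For example, if a detector scored a tweet as `{"uk": 0.5, "ru": 0.4, "be": 0.1}` and the user had declared Belarusian and English, the result would be `"be"`; without any declaration it would be `"uk"`.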

If Twitter or Bing Translation were free software projects with a public bug tracking system, I’d post this as a bug report. Given that they aren’t, I can only hope that somebody from Twitter or Microsoft will read it and fix these issues some day. Machine translation can be useful, and in fact Bing often surprises me with the quality of its translation, but it has silly bugs, too.

