Wednesday, 30 August 2017 11:43

Here's Why Google's Algorithms Can't Outsmart 'Toxic' People

By: Meta S. Brown

Still, growing frustration with rude, and even phony, online posting begs for some mechanism to filter out rubbish. So, rather than employ costly humans to monitor online discussion, we try to do it with software.

Software does some things fabulously well, but interpreting language isn’t usually one of them.

I’ve never noticed any dramatic difference in attitudes or civility between the people of Vermont and New Hampshire, yet the latest tech news claims that Vermont is America’s top source of “toxic” online comments, while its next-door neighbor New Hampshire is dead last.

Reports also claim that the humble Chicago suburb of Park Forest is a paradise for trolls.

After decades living in the Chicago Metropolitan area, I say without hesitation that the people of Park Forest don’t stand out from the crowd, for trolling or anything else. I don’t know whether they wish to stand out or not, but it’s my observation that folks from Park Forest just blend in. People may joke about Cicero and Berwyn, but not Park Forest.

So what’s going on? Software.

Perspective, a tool intended to identify “toxic” online comments, is one of the Jigsaw projects, Google experiments aimed at promoting greater safety online. Users feed it a comment, and Perspective returns a score from 0 to 100: the percentage of respondents likely to find the comment “toxic,” that is, likely to make them leave the conversation.
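
Readers who want to run their own tests can query the model directly. Below is a minimal sketch in Python; the endpoint, request format, and response fields follow Google's published Comment Analyzer documentation as I understand it and may have changed, and the API key is a placeholder you would need to request from Google.

```python
# Rough sketch of asking Perspective's Comment Analyzer API for a toxicity
# score. Endpoint, request shape, and field names are based on the public
# v1alpha1 documentation and may differ in current versions.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; obtain a real key from Google
URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def toxicity_score(text):
    """Return Perspective's toxicity estimate, scaled to 0-100."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, params={"key": API_KEY}, json=payload)
    response.raise_for_status()
    body = response.json()
    # The API reports a probability between 0 and 1; the scores quoted in
    # this article are the same numbers expressed as percentages.
    value = body["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return round(value * 100)

if __name__ == "__main__":
    print(toxicity_score("Trump sucks"))
```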

It was released months ago but has drawn a blast of new publicity in the past few days, after Wired used it to develop “Trolls Across America,” an article featuring an online map of supposed trolling hotspots across the country.

Interpreting language is one of the most complex and subtle things that people do. The meaning of human communication is based in much more than the dictionary meaning of words. Tone of voice, situation, personal history and many other layers of context have roles to play.

The same remark may hold different significance for each person who hears it. Even one person may view a statement differently at different moments. Human language just does not lend itself to the kinds of strict rules of interpretation that are used by computers.

As soon as Perspective (which is clearly labeled as a research project) was announced, prospective users were warned about its limitations. Automated moderation was not recommended, for example. One suggested use was helping human moderators decide what to review.

David Auerbach, writing for MIT’s Technology Review, soon pointed out that “It’s Easy to Slip Toxic Language Past Alphabet’s Toxic-Comment Detector. Machine-learning algorithms are no match for the creativity of human insults.” He tested an assortment of phrases, getting results like these:

  • “‘Trump sucks’ scored a colossal 96 percent, yet neo-Nazi codeword ‘14/88’ only scored 5 percent.” [I also tested “14/88” and got no results at all. In fact, I tested all of the phrases mentioned by Auerbach and got somewhat different results, though the patterns were all similar.]
  • “Jews are human,” 72. “Jews are not human,” 64.
  • “The Holocaust never happened,” 21.

Twitter’s all atwitter with additional test results from machine learning researchers and other curious people. Here is a sample of the phrases that were mentioned, in increasing order of their toxicity scores from Perspective:

  1. I love the Führer, 8
  2. I am a man, 20
  3. I am a woman, 41
  4. You are a man, 52
  5. Algorithms are likely to reproduce human gender and racial biases, 56
  6. I am a Jew, 74
  7. You are a woman, 79

Linguistically speaking, most of these statements are just facts. If I’m a woman, I’m a woman. If you’re a man, you’re a man. If we interpret such statements as something more than neutral facts, we may be reading too much into them. “I love the Führer” is something else entirely. Looking at these scores, though, you’d get a very different impression.

The problem is, the scoring mechanism can’t be any better than the rules behind it.

Nobody at Google set out to make a rule that assigned a low toxicity score to “I love the Führer” or a high score to “I am a Jew.” The rules were created in large part through automation, presenting a crowd of people with sample comments and collecting opinions on those comments, then assigning scores to new comments based on similarity to the example comments and corresponding ratings.
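
To make that pattern concrete, here is a toy sketch of the crowd-label-then-generalize approach using scikit-learn. It is not Perspective’s actual model, and the example comments and labels are invented; it only illustrates the general idea of fitting a model to human ratings and then scoring new text by its similarity to the rated examples.

```python
# Toy illustration of the crowd-label-then-generalize pattern: collect
# human ratings, fit a model, then score new comments. This is NOT
# Perspective's actual model; the data below is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-ins for crowd-rated comments (1 = raters called it toxic, 0 = not).
comments = [
    "you are an idiot",
    "thanks for sharing this",
    "nobody wants you here",
    "interesting point, though I disagree",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)

# Score unseen comments: the model's estimate that raters would call them toxic.
for text in ["you people are idiots", "have a great day"]:
    prob = model.predict_proba([text])[0][1]
    print(f"{text!r}: {round(prob * 100)}")
```

Words that never appear in the rated examples contribute nothing to the score, which is exactly the gap problem described next.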

This approach has limitations. The people in the crowd are not without biases, and those biases will be reflected in the scores. And terminology not included in the sample data will leave gaps in the results.

A couple of years ago, I heard a police trainer tell a group of officers that removing just one word from their vocabulary could prevent 80% of police misconduct complaints filed by the public. The officers had no difficulty guessing the word. It’s deeply embedded in police jargon, and has been for so long that it got its own chapter in the 1978 law enforcement book Policing: A View from the Street.

Yet the same word blamed for abundant complaints of police misconduct has appeared in at least three articles here on Forbes in the past month, and not drawn so much as a comment.

Often, it’s not the words that offend, but the venom behind them. And that’s hard, if not impossible, to capture in an algorithm.

This isn’t to say that technology can’t do some worthwhile things with human language.

Text analytics algorithms, the rules software uses to convert open-ended text into more conventional types of data such as categories or numeric scores, can be useful. They lie at the heart of online search technology, for example, helping us find documents related to topics of interest. Some other applications include the following (a small sketch of the idea appears after this list):

  • e-discovery, which increases productivity for legal teams reviewing large quantities of documents for litigation
  • Warranty claim investigation, where text analysis helps manufacturers to identify product flaws early and enable corrective action
  • Targeted advertising, which uses text from content that users read or create to present relevant ads
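
As one small illustration, here is a sketch of converting open-ended warranty-claim text into categories with simple keyword rules. The categories and keywords are invented, and real systems rely on far richer linguistic features, but the pattern of turning free text into conventional data is the same.

```python
# Minimal sketch of one text-analytics pattern: mapping open-ended
# warranty-claim text to categories with keyword rules. Categories and
# keywords are invented for illustration only.
CATEGORY_KEYWORDS = {
    "battery": ["battery", "charge", "won't turn on"],
    "display": ["screen", "display", "flicker"],
    "liquid damage": ["wet", "water", "spill"],
}

def categorize(claim_text):
    """Return the categories whose keywords appear in the claim text."""
    text = claim_text.lower()
    matched = [category
               for category, keywords in CATEGORY_KEYWORDS.items()
               if any(keyword in text for keyword in keywords)]
    return matched or ["uncategorized"]

claims = [
    "Screen flickers and then goes black",
    "Unit stopped holding a charge after a month",
    "Coffee spill, now the keyboard is dead",
]
for claim in claims:
    print(claim, "->", categorize(claim))
```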

It takes more than a dictionary to understand the meaning of language. Context, on the page and off, is all-important.

People recognize the connections between the things that people write or say, and the unspoken parts of the story. Software doesn’t do that so well.

Meta S. Brown is author of Data Mining for Dummies and creator of the Storytelling for Data Analysts and Storytelling for Tech workshops. http://www.metabrown.com.

Source: This article was originally published on forbes.com
