Source: This article was contributed by member Corey Parker

Ben-Gurion University of the Negev and University of Washington researchers have developed a new generic method to detect fake accounts on most types of social networks, including Facebook and Twitter.

According to their new study in Social Network Analysis and Mining, the new method is based on the assumption that fake accounts tend to establish improbable links to other users in the networks.

“With recent disturbing news about failures to safeguard user privacy, and targeted use of social media by Russia to influence elections, rooting out fake users has never been of greater importance,” explains Dima Kagan, lead researcher and a researcher in the BGU Department of Software and Information Systems Engineering.

“We tested our algorithm on simulated and real-world datasets on 10 different social networks and it performed well on both.”

The algorithm consists of two main iterations based on machine-learning algorithms. The first constructs a link prediction classifier that can estimate, with high accuracy, the probability of a link existing between two users.

The second iteration generates a new set of meta-features based on the features created by the link prediction classifier. Lastly, the researchers used these meta-features and constructed a generic classifier that can detect fake profiles in a variety of online social networks.
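The two-stage design described above can be sketched as follows. This is an illustrative reconstruction with synthetic data and scikit-learn models, not the authors’ code; the per-edge features, meta-features, and labels are assumptions made for the sketch.

```python
# Illustrative sketch of the two-stage approach (not the authors' code):
# stage 1 learns a link-prediction classifier from topological features of
# node pairs; stage 2 turns its per-edge probabilities into per-user
# meta-features and trains a second, generic classifier on them.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stage 1: synthetic per-edge features (stand-ins for e.g. common
# neighbours or Jaccard coefficient) and whether the link really exists.
edge_features = rng.random((500, 3))
edge_exists = (edge_features.sum(axis=1) > 1.5).astype(int)
link_clf = RandomForestClassifier(random_state=0).fit(edge_features, edge_exists)

# Stage 2: summarise each user's predicted link probabilities into
# meta-features; improbable links drag the mean and minimum down.
def meta_features(user_edges):
    probs = link_clf.predict_proba(user_edges)[:, 1]
    return [probs.mean(), probs.min(), probs.std()]

users = [rng.random((10, 3)) for _ in range(100)]
X_meta = np.array([meta_features(u) for u in users])
y_fake = (X_meta[:, 0] < np.median(X_meta[:, 0])).astype(int)  # toy labels
fake_clf = RandomForestClassifier(random_state=0).fit(X_meta, y_fake)
```

In the real method, the labels would come from known fake accounts and the link-prediction features from the network’s actual topology.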


“Overall, the results demonstrated that in a real-life friendship scenario we can detect people who have the strongest friendship ties as well as malicious users, even on Twitter,” the researchers say. “Our method outperforms other anomaly detection methods and we believe that it has considerable potential for a wide range of applications particularly in the cyber-security arena.”

Other researchers who contributed are Dr. Michael Fire of the University of Washington (former Ben-Gurion U. doctoral student) and Prof. Yuval Elovici, director of [email protected] and a member of the BGU Department of Software and Information Systems Engineering.

The Ben-Gurion University researchers previously developed the Social Privacy Protector (SPP) Facebook app, which helps users evaluate their friends list in seconds and identify which connections have few or no mutual links and might be “fake” profiles.


A new book shows how Google’s search algorithms quietly reinforce racist stereotypes.

Are search engines making us more racist?

According to Safiya Umoja Noble, a professor of communication at the University of Southern California, the answer is almost certainly yes.

Noble’s new book, Algorithms of Oppression: How Search Engines Reinforce Racism, challenges the idea that search engines like Google provide a level playing field for all ideas, values, and identities. She says they’re inherently discriminatory and favor the groups that designed them, as well as the companies that fund them.

This isn’t a trivial topic, especially in a world where people get more information from search engines than they do from teachers or libraries. For Noble, Google is not just telling people what they want to know but also determining what’s worth knowing in the first place.

I reached out to Noble last week to find out what she had learned about the unseen factors driving these algorithms, and what the consequences of ignoring them might be.

A lightly edited transcript of our conversation follows.

Sean Illing

What are you arguing in this book?

Safiya Umoja Noble

I’m arguing that large, multinational advertising platforms online are not trusted, credible public information portals. Most people think of Google and search engines in particular as a public library, or as a trusted place where they can get accurate information about the world. I provide a lot of examples to show the incredible influence of advertising dollars on the kinds of things we find, and I show how certain people and communities get misrepresented in service of making money on these platforms.

Sean Illing

Who gets misrepresented and how?

Safiya Umoja Noble

I started the book several years ago by doing collective searches on keywords around different community identities. I did searches on “black girls,” “Asian girls,” and “Latina girls” online and found that pornography was the primary way they were represented on the first page of search results. That doesn’t seem to be a very fair or credible representation of women of color in the United States. It reduces them to sexualized objects.

So that raises the question: What’s going on in these search engines? What are the well-funded, well-capitalized industries behind them that are purchasing keywords and using their influence to represent people and ideas in this way? The book was my attempt to answer these questions.

Sean Illing

Okay, so at the time you did this research, if someone went to Google and searched for “black women,” they would get a bunch of pornography. What happens if they type in “white girls” or “white women”? Or if they search for what should be a universal category, like “beautiful people”?

Safiya Umoja Noble

Now, fortunately, Google has responded to this. They suppressed a lot of porn, in part because we’ve been speaking out about this for six or seven years. But if you go to Google today and search for “Asian girls” or “Latina girls,” you’ll still find the hypersexualized content.

For a long time, if you did an image search on the word “beautiful,” you would get scantily clad images of almost exclusively white women in bikinis or lingerie. The representations were overwhelmingly white women.

People often ask what happens when you search “white girls.” White women don’t typically identify as white; they just think of themselves as girls or women or individuals. I think what you see there is the gaze of people of color looking at white women and girls and naming whiteness as an identity, which is something that you don’t typically see white women doing themselves.

Sean Illing

These search algorithms aren’t merely selecting what information we’re exposed to; they’re cementing assumptions about what information is worth knowing in the first place. That might be the most insidious part of this.

Safiya Umoja Noble

There is a dominant male, Western-centric point of view that gets encoded into the organization of information. You have to remember that an algorithm is just an automated decision tree. If these keywords are present, then a variety of assumptions have to be made about what to point to in all the trillions of pages that exist on the web.

And those decisions always correlate to the relationship of advertisers to the platform. Google has a huge empire called AdWords, and people bid in a real-time auction to optimize their content.

That model — of information going to the highest bidder — will always privilege people who have the most resources. And that means that people who don’t have a lot of resources, like children, will never be able to fully control the ways in which they’re represented, given the logic and mechanisms of how search engines work.

Sean Illing

In the book, you talk about how racist websites gamed search engines to control the narrative around Martin Luther King Jr. so that if you searched for MLK, you’d find links to white supremacist propaganda. You also talk about the stakes involved here and point to Dylann Roof as an example.

Safiya Umoja Noble

In his manifesto, Dylann Roof has a diatribe against people of color, and he says that the first event that truly awakened him was the Trayvon Martin story. He says he went to Google and did a search on “black-on-white crime.” Now, most of us know that black-on-white crime is not an American epidemic — that, in fact, most crime happens within a community. But that’s a separate discussion.

So Roof goes to Google and puts in a white nationalist red herring (“black-on-white crime”). And of course, it immediately takes him to white supremacist websites, which in turn take him down a racist rabbit hole of conspiracy and misinformation. Often, these racist websites are designed to appear credible and benign, in part because that helps them game the algorithms, but also because it convinces a lot of people that the information is truthful.

This is how Roof gets radicalized. He says he learns about the “true history of America,” and about the “race problem” and the “Jewish problem.” He learns that everything he’s ever been taught in school is a lie. And then he says, in his own words, that this makes him research more and more, which we can only imagine is online, and this leads to his “racial awareness.”

And now we know that shortly thereafter, he steps into the “Mother” Emanuel AME Church in Charleston, South Carolina, and murders nine African-American worshippers in cold blood, in order to start a race war.

So the ideas that people are encountering online really matter. It matters that Dylann Roof didn’t see the FBI statistics that tell the truth about how crime works in America. It matters that he didn’t get any counterpoints. It matters that people like him are pushed in these directions without resistance or context.

Sean Illing

My guess is that these algorithms weren’t designed to produce this effect, but I honestly don’t know. What is driving the decision-making process? Is this purely about commercial interests?

Safiya Umoja Noble

It’s difficult to know exactly what Google’s priorities are, because Google’s search algorithm is proprietary, so no one can really make sense of the algorithm except by looking at the output. All of us who study this do it by looking at the end results, and then we try to reverse-engineer it as best we can.

But yes, it’s pretty clear that what’s ultimately driving tech companies like Google is profit. I don’t imagine that a bunch of racists is sitting around a table at Google thinking of ways to create a racist product, but what happens is that engineers simply don’t think about the social consequences of their work. They’re designing technologies for society, and they know nothing about society.

In its own marketing materials, Google says there are over 200 different factors that go into deciding what type of content they surface. I’m sure they have their own measures of relevance for what they think people want. Of course, they’re also using predictive technologies, like autosuggestion, where they fill in the blank. They’re doing that based on what other people have looked at or clicked on in the past.

Sean Illing

But the autosuggestion tool guarantees that majority perspectives will be consistently privileged over others, right?

Safiya Umoja Noble

Right. People who are a numerical minority in society will never be able to use this kind of “majority rules” logic to their benefit. The majority will always be able to control the notions of what’s important, or what’s important to click on, and that’s not how the information landscape ought to work.

Sean Illing

I’m sure some people will counter and say that these are essentially neutral platforms, and if they’re biased, they’re biased because of the human users that make them up. In other words, the problem isn’t the platform; it’s the people.

Safiya Umoja Noble

The platform exists because it’s made by people. It didn’t come down from an alien spacecraft. It’s made by human beings, and the people who make it are biased, and they code their biases into search. How can these things not inform their judgment?

So it’s disingenuous to suggest that the platform just exists unto itself, and that the only people who can manipulate or influence it are the people who use it, when actually the makers of the platform are the primary source of responsibility. I would say that there are makers, as well as users, of a platform. They have to take responsibility for their creations.

Source: This article was published by Sean Illing


Google has confirmed rumors that a search algorithm update took place on Monday. Some sites may have seen their rankings improve, while others may have seen their rankings drop or hold steady.

Google has posted on Twitter that it released a “broad core algorithm update” this past Monday. Google said it “routinely” does updates “throughout the year” and referenced the communication from the previous core update.

Google explained that core search updates happen “several times per year” and that while “some sites may note drops or gains,” there is nothing specific a site can do to tweak its rankings around these updates. In general, Google says to continue to improve your overall site quality, and the next time Google runs these updates, hopefully, your website will be rewarded.

Google explained that “pages that were previously under-rewarded” would see a benefit from these core updates.

Here is the statement Google previously made about this type of update:

Each day, Google usually releases one or more changes designed to improve our results. Some are focused around specific improvements. Some are broad changes. Last week, we released a broad core algorithm update. We do these routinely several times per year.

As with any update, some sites may note drops or gains. There’s nothing wrong with pages that may now perform less well. Instead, it’s that changes to our systems are benefiting pages that were previously under-rewarded.

There’s no “fix” for pages that may perform less well, other than to remain focused on building great content. Over time, it may be that your content may rise relative to other pages.



Source: This article was published by Barry Schwartz


Google is the dominating force in the world of search engines, and there’s an entire industry dedicated to maximizing visibility within its search engine results: search engine optimization (SEO).

People like me have built their careers on finding ways to benefit from the central ranking algorithm at Google’s core. But here’s the interesting thing: Google doesn’t explicitly publish how its search algorithm works and often uses vague language when describing its updates.

So how much do we really know about Google’s ranking algorithm? And why is Google so secretive about it?

Why Google Keeps Things Secret

Google has come under fire lately, most recently from German Chancellor Angela Merkel, for keeping its algorithm secret. Her main argument is that transparency is vitally important to maintaining a balanced society; after all, our daily searches shape our behavior in subtle and blatant ways, and not knowing the mechanisms that influence that behavior can leave us in the dark.

But Google isn’t withholding its algorithm so that it can manipulate people with reckless abandon. There are two good reasons the company would want to keep the information a closely guarded secret.

First, Google’s algorithm is proprietary, and Google has become the dominant search competitor because of its sheer sophistication. If competitors had free and open access to the inner workings of that algorithm, they could easily introduce a competing platform with comparable power, and Google’s search share could plummet.

Second, there are already millions of people who make a living by improving their positions within Google, and many of them are willing to use ethically questionable tactics or spam people in an effort to get more search visibility. If Google fully published its search algorithm, they could easily find bigger loopholes and ruin the relatively fair search engine results pages (SERPs) we’ve come to expect from the giant.

How We Learn

So if Google withholds all the information on its algorithm, how can search optimizers know how to improve the search rankings of web pages?

  • Google revelations. Google doesn’t leave webmasters totally in the dark. While it refuses to disclose specifics about how the algorithm functions, it’s pretty open about the general intentions of the algorithm, and what webmasters can take away from it. For example, Google has published and regularly updates a guidelines manual on search quality ratings; 160 pages long, and last updated July of last year, it’s a fairly comprehensive guidebook that explains general concepts of how Google judges the quality of a given page. Google has also been known to explain its updates as they roll out—especially the larger ones—with a short summary and a list of action items for webmasters. These are all incredibly helpful sources of information.
  • Direct research. Google doesn’t give us everything, however. If you scroll through Moz’s fairly comprehensive guide on the history of Google’s algorithm changes, you’ll notice dozens of small updates that Google didn’t formally announce, and in many cases, refuses to acknowledge. How does the search community know that these algorithm changes unfolded? We have volatility indicators like MozCast, which measure how much the SERPs are changing within a given period of time; a period of high volatility is usually the signature of some kind of algorithm change. We can also conduct experiments, such as using two different tactics on two different pages and seeing which one ranks higher at the end of the experiment period. And because the SEO community is pretty open about sharing this information, one experiment is all it takes to give the whole community more experience and knowledge.
  • Experience and intuition. Finally, after several years of making changes and tracking growth patterns, you can rely a bit on your own experience and intuition. When search traffic plummets, you can usually identify a handful of potential red flags and come up with ideas for tweaks to take you back to your baseline.
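The volatility-indicator idea mentioned above can be sketched with a toy metric. This is not MozCast’s actual formula, just an illustration of measuring SERP churn between two daily top-results snapshots for the same keyword:

```python
# Toy volatility metric in the spirit of tools like MozCast (not its actual
# formula): compare two daily ranked snapshots for the same keyword and
# score how much the list moved. High scores over many keywords would
# suggest an algorithm change.
def serp_volatility(yesterday, today):
    """Average absolute rank shift; entries new to the list count as a
    full-length shift."""
    max_shift = len(yesterday)
    shifts = []
    for rank, url in enumerate(today):
        if url in yesterday:
            shifts.append(abs(rank - yesterday.index(url)))
        else:
            shifts.append(max_shift)  # newly entered the results
    return sum(shifts) / len(shifts)

calm = serp_volatility(["a", "b", "c"], ["a", "b", "c"])    # no movement
stormy = serp_volatility(["a", "b", "c"], ["x", "c", "a"])  # heavy churn
```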

What Do We Know?

So what do we really know about Google’s search algorithm?

  • The basics. We know the basic concept behind the search platform: to give users the best possible results for their queries. Google does this by presenting results that offer a combination of relevance (how appropriate the topic is) and authority (how trustworthy the source is).
  • Core ranking factors. We also know the core ranking factors that will influence your rank. Some of these come directly from Google’s webmaster guidelines, and some of them come from the results of major experiments. In any case, we have a good idea what changes are necessary to earn a high rank, and what factors could stand in your way. I covered 101 of them here.
  • Algorithm extensions and updates. We also know when there’s a new Google update, thanks to volatility indicators, and we can almost always form a reasonable conclusion about the update’s purpose—even when Google doesn’t tell us directly.
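The relevance-plus-authority framing above can be illustrated with a toy scoring function. The linear blend and its weights are assumptions made for illustration; Google’s actual weighting of its 200+ factors is unpublished.

```python
# Toy "relevance x authority" ranker, illustrating the basic concept only.
def score(relevance, authority, w_relevance=0.6, w_authority=0.4):
    """Blend topical relevance and source authority, both on a 0-1 scale."""
    return w_relevance * relevance + w_authority * authority

# Two equally relevant pages; the more authoritative source ranks higher.
pages = {"deep-guide": score(0.9, 0.8), "thin-page": score(0.9, 0.1)}
ranked = sorted(pages, key=pages.get, reverse=True)
```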

While we still don’t know the specifics of how Google’s algorithm works—and unless the EU’s transparency campaign kicks into high gear soon, we probably won’t for the foreseeable future—we do know enough about it to make meaningful changes to our sites, and maximize our ranking potential.

Moreover, the general philosophy behind the algorithm and the basic strategies needed to take advantage of it aren’t hard to learn. If you’re willing to read Google’s documentation and learn from the experiments of others, you can get up to speed in a matter of weeks.

Source: This article was published by Jayson DeMers



  • Google Search assesses the quality of newsy content algorithmically
  • Search results to omit fake news through improved ranking signals
  • India marks 2x growth in daily active search users on Google

Google Search already receives some artificial intelligence (AI) tweaks to enhance user experience. But with the swift growth of inferior-quality content, Google is now in the process of improving the quality of its search results. Speaking on the sidelines of Google for India 2017 on Tuesday, VP of Engineering Shashidhar Thakur stated that Google is making continuous efforts to cut down on the amount of fake news content listed on its search engine.

"Whether it's in India or internationally, we make sure that we uphold a high bar when it comes to the quality of newsy content. Generally, in search, we find this type of content algorithmically," Thakur told Gadgets 360. The algorithms deployed behind Google Search look for the authoritativeness of the content and its quality to rank them appropriately. Thakur said that this continuous improvement will uplift the quality of the search results over time.

"We improve ranking signals on our search engine from time to time to overcome the issue of fake news. Signals help the system understand a query or the language of the query or the text or matching different keywords to provide relevant results," explained Thakur.

Similar to other search engines that use code-based bots to crawl webpages, Google Search continuously indexes hundreds of billions of webpages. Once crawled, Google Search adds each webpage to index entries for the words that appear on it. This data then feeds the Knowledge Graph, which not only matches particular keywords but also factors in user interests to return relevant results.
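The indexing step described above is, at its core, an inverted index. A toy version, vastly simplified relative to Google’s, with made-up page ids:

```python
# A toy inverted index: the core data structure behind the crawl-and-index
# step, mapping each word to the set of pages that contain it.
from collections import defaultdict

def build_index(pages):
    """Map each lowercased word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

pages = {
    "p1": "fake news detection on the web",
    "p2": "search engines rank the web",
}
index = build_index(pages)
index["web"]  # both pages contain "web"
```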


"Inferior-quality content on the Web isn't a new and special problem," Thakur said. "But certainly, it is a problem that we need to solve by continuous tuning and making the underlying search algorithms better. This is indeed a very crucial area of focus for us."

Google isn't the only Web company taking the menace of fake news seriously. Facebook and Microsoft's Bing are also testing new developments to curb fake news. A recent report by Gartner predicted that fake news will grow multifold by 2022 and that people in mature economies will consume more false information than true information.

That said, Google dominates the Web space, and its search engine is the most prominent target for counterfeit content. On the Google for India stage, Thakur revealed that the number of daily active search users in India has doubled in the past year. The Mountain View, California-headquartered company also released Google Go, a lightweight version of the flagship Google app, for Android devices.


Source: This article was published by Jagmeet Singh



Editor’s note: This post is part of an ongoing series looking back at the history of Google algorithm updates. Enjoy!

Google’s Freshness, or “fresher results”, update – as the name suggests – was a significant ranking algorithm change, building on the Caffeine update, which rolled out in June 2010.

When Google announced an algorithm change on November 3, 2011, one that impacted roughly 35 percent of total searches (6-10 percent of search results to a noticeable degree) and focused on providing the user with ‘fresher, more recent search results’, the SEO industry and content marketers alike stood up and took notice.

Where Does the Name Come From?

The freshness or ‘fresher results’ name for this algorithm update is directly taken from the official Google Inside Search blog announcement.

Google Freshness Update Nov 2011

Why Was the Freshness Update Launched?

It is predicted that more data will be created in 2017 than in the previous 5,000 years of humanity, a trend that has been ongoing for a few years now, and one driving Google to cater to this availability of, and demand for, fresh, up-to-date content.

When you combine this data and content growth with the volume of new and unique queries Google handles, you begin to establish the justification for identifying, handling, prioritizing, and ranking fresh content within the Google search index.

According to a 2012 ReadWrite article, 16 to 20 percent of queries that get asked every day have never been asked before.

A key intention of this update is to provide greater emphasis on the importance of recentness of content specifically tied to areas like latest news, events, politics, celebrities, trends and more, specifically where the user is expected to want to know the most current information.

Someone searching for “Taylor Swift boyfriend” will likely want to know the person she is currently dating. Content time/date-stamped yesterday, with lots of social shares, engagement, and backlinks over the past few hours, will therefore likely displace previously ranking content that has not been updated or is not providing the same freshness signals.

Here are the results for this query as of the time of writing this article.

Taylor Swift SERPs Oct 2017
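The freshness signals described above can be sketched as a scoring function: a base relevance score decayed by document age and boosted by recent engagement. The formula, half-life, and damping constants are illustrative assumptions, not Google’s actual model.

```python
# Illustrative freshness boost (not Google's formula): decay a base
# relevance score exponentially with age, then add a damped engagement
# signal from recent shares/backlinks.
import math

def fresh_score(base_relevance, age_hours, recent_shares, half_life_hours=48.0):
    decay = 0.5 ** (age_hours / half_life_hours)   # halves every 48h
    engagement = math.log1p(recent_shares) / 10.0  # damped buzz signal
    return base_relevance * decay + engagement

# A buzzy day-old article vs. a month-old quiet one on the same topic.
new_buzzy = fresh_score(0.7, age_hours=12, recent_shares=500)
old_quiet = fresh_score(0.7, age_hours=720, recent_shares=3)
```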

Who Was Impacted by Freshness Algorithm?

At a noticeable level, between 6 and 10 percent of search queries were impacted by the Freshness algorithm, but some degree of change was applied to roughly a third (35 percent) of all searches.

One of the interesting aspects of the Freshness algorithm update was that many more sites appeared to have gained from it than to have lost rankings or visibility. This is quite uncommon for changes to the Google algorithm.

Looking specifically at the identified “winners” from the update, according to Searchmetrics:

Google prefers sites like news sites, broadcast sites, video portals and a lot of brand sites. These are also the types of sites that regularly have fresh content and a big brand with higher CTRs.

Industry Reaction to the Freshness Update

Because the update was an overarching positive change, one rewarding content creators, providers of fresh, relevant, and timely news, and many bigger brands investing in content, the initial reaction centered on analysis of the change and its logical nature.

That analysis contrasted the expected “big” impact, given Google’s announcement that 35 percent of search results would be affected, with the disproportionately small amount of negative impact actually reported.

The Solution/Recovery Process

The Freshness update is one of my favorite Google algorithm updates because it makes perfect sense and changed SERPs for the better in a logical, easy-to-understand, and practical way.

If you’re covering a topic area and the information you have is out of date, time-sensitive, hasn’t been refreshed or updated in some time, or is simply being surpassed by more engaging, fresh and new competing content, it is likely that you need to give that content/topic some more attention, both on page and off page.

An important part of the freshness update is that it is not just about refreshing content, but also tied to the frequency of content related to the topic.

For example, during a political campaign spanning weeks, the content expected to rank prominently would reflect the latest campaign developments, rather than static (even day-old) content whose relevancy, accuracy, and associated user engagement and social-sharing signals have since been surpassed.

This update was building on Google’s established “query deserves freshness” (QDF) methodology:

The QDF solution revolves around determining whether a topic is “hot.” If news sites or blog posts are actively writing about a topic, the model figures that it is one for which users are more likely to want current information. The model also examines Google’s own stream of billions of search queries, which Mr. Singhal believes is an even better monitor of global enthusiasm about a particular subject.

It also was made possible by Google’s Caffeine web search index update:

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.
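The QDF “hot topic” test quoted above can be sketched as a ratio of recent mentions to a historical baseline. The threshold and counts below are illustrative assumptions, not Google’s actual model.

```python
# Toy QDF-style check: a topic counts as "hot" when mentions in a recent
# window far exceed its historical baseline rate.
def is_hot(topic, recent_counts, baseline_counts, ratio=3.0):
    recent = recent_counts.get(topic, 0)
    baseline = baseline_counts.get(topic, 1)  # avoid divide-by-zero
    return recent / baseline >= ratio

baseline = {"election": 10, "weather": 40}  # typical weekly mentions
recent = {"election": 90, "weather": 35}    # this week's mentions

is_hot("election", recent, baseline)  # spiking well above baseline
is_hot("weather", recent, baseline)   # business as usual
```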

Practical Tactics for Recovering from the Freshness Algorithm

Five of the best ways to recover from any lost ranking (or to take advantage of the new untapped opportunity) as a result of the Freshness Algorithm change include:

1. Revisit Existing Content

Look through year-on-year, or even previous-period, content performance. Identify pages/topics that previously drove volumes of impressions, traffic, and rankings to the website, and prioritize refreshing them.

You may find that time- and date-stamped content in blogs, news, and media sections has seen significant traffic drops. If this is the case, consider the value of updating the historical content by citing new sources, updating statistics, including more current quotes, and adding terms reflecting the latest search queries.
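This step can be sketched as a simple year-on-year comparison; the page paths, traffic numbers, and threshold below are made up for illustration.

```python
# Flag pages whose year-on-year traffic fell by more than a threshold,
# so they can be queued for a content refresh.
def pages_to_refresh(last_year, this_year, drop_threshold=0.3):
    stale = []
    for page, old_visits in last_year.items():
        new_visits = this_year.get(page, 0)
        if old_visits and (old_visits - new_visits) / old_visits > drop_threshold:
            stale.append(page)
    return stale

last_year = {"/blog/seo-basics": 12000, "/news/update": 8000, "/guide": 5000}
this_year = {"/blog/seo-basics": 11500, "/news/update": 2000, "/guide": 4900}
pages_to_refresh(last_year, this_year)  # only the news page dropped sharply
```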

2. Socially Share & Amplify Content

Social signals, fresh link signals, and associated external interest/buzz surrounding your content can fuel ranking gains tied to QDF and previous algorithm updates like the Freshness update.

Don’t underestimate the value of successful social sharing and PR activities driving new content discovery, engagement, and interaction.

3. Reconsider Content Frequency

If your website covers industry change, key events, and any degree of breaking news/insight, you may need to think about the frequency that you are informing your audience, and adding content to your website.

People are digesting more content than ever before, and users demand the latest news as it happens – minor frequency changes can make a positive difference between being first to market and being late to the party.

4. Take a Tiered Approach to Content Creation 

With voice, video, images, virtual reality, and a host of other content types, plus common website inclusion approaches (blogs, news, media, content hubs, microsites, and more), adding layers of content to your digital offering will enable broader visibility for the brand in key ranking areas, plus extra leverage of the various search verticals at your disposal.

Whether these updates result in new landing pages or in adding depth and content value to existing URLs will differ with intent, but either way, this will support many of the freshness points relating to recovery or gains tied to this update.

5. Add Evergreen Content Into Your Content Mix 

Evergreen content is deeper content that stands the test of time, able to perform month in and month out, contributing to search rankings and traffic over many months, even years. Typically, evergreen content reflects:

  • Thorough topical research.
  • Unique insight.
  • Targeted application of expertise on a given topic.
  • Refined content that gets updated every few months when changes require modification.
  • Longer-form content (often several thousand words).
  • A mix of content types.

You may see this as your hero content pieces, those warranting budget, promotion, and reinvestment of time and resource.

How Successful was the Freshness Algorithm Update?

Although the Freshness Algorithm change isn’t frequently mentioned in many industry topical conversations and often gets overshadowed by the likes of Penguin, Panda, Hummingbird, Mobile First, RankBrain, and others, to me, this reinforces the level of success it had.

When you look at time-intent queries like [football results], you will notice that dominant sites provide:

  • Live scores
  • In-game updates
  • Latest results
  • Interactive scoreboards
  • Current fixtures
  • Much more

These useful and changing (often changing by the hour) results reflect the practical benefits that this update has had to our search experience, and the opportunity this brings to value-based companies, able to act on the latest data.

Freshness Myths & Misconceptions

The biggest misconception related to this algorithm update was the anticipated negative impact, given the scale of results (~35 percent) affected by Google Freshness.

As one of the more positive and practical algorithm changes, the Freshness update has been overlooked by many, playing the role of unsung auditor: flagging tired, unloved content that needs improvement, while rewarding actively maintained content able to satisfy searcher needs and rank for more time-sensitive user intent.

Source: This article was published by Lee Wilson

Categorized in Search Engine

Still, growing frustration with rude, and even phony, online posting begs for some mechanism to filter out rubbish. So, rather than employ costly humans to monitor online discussion, we try to do it with software.

Software does some things fabulously well, but interpreting language isn’t usually one of them.

I’ve never noticed any dramatic difference in attitudes or civility between the people of Vermont and New Hampshire, yet the latest tech news claims that Vermont is America’s top source of “toxic” online comments, while its next-door neighbor New Hampshire is dead last.

Reports also claim that the humble Chicago suburb of Park Forest is trolls’ paradise.

After decades living in the Chicago Metropolitan area, I say without hesitation that the people of Park Forest don’t stand out from the crowd, for trolling or anything else. I don’t know whether they wish to stand out or not, but it’s my observation that folks from Park Forest just blend in. People may joke about Cicero and Berwyn, but not Park Forest.

So what’s going on? Software.

Perspective, a tool intended to identify “toxic” online comments, is one of the Jigsaw projects, Google experiments aimed at promoting greater safety online. Users feed it comments, and Perspective returns a 0-100 score for the percent of respondents likely to find the comment “toxic,” that is, likely to make them leave the conversation.
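For readers who want to experiment, Perspective exposes this scoring through its `comments:analyze` endpoint. A minimal sketch of building such a request, assuming the publicly documented request shape (the API key and the commented-out network call are placeholders, not run here):

```python
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_analyze_request(text):
    """Build the JSON body for a Perspective comments:analyze call.

    The response carries attributeScores.TOXICITY.summaryScore.value
    in [0, 1], which the article reports as a 0-100 percentage.
    """
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

# Actually sending the request needs an API key (placeholder below):
# import requests
# resp = requests.post(PERSPECTIVE_URL, params={"key": "YOUR_API_KEY"},
#                      json=build_analyze_request("some comment"))
# score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```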

It was released months ago, but has drawn a blast of new publicity in the past few days since Wired used it for development of “Trolls Across America,” an article featuring an online map highlighting supposed trolling hotspots across the country.

Interpreting language is one of the most complex and subtle things that people do. The meaning of human communication is based in much more than the dictionary meaning of words. Tone of voice, situation, personal history and many other layers of context have roles to play.

The same remark may hold different significance for each person who hears it. Even one person may view a statement differently at different moments. Human language just does not lend itself to the kinds of strict rules of interpretation that are used by computers.

As soon as Perspective (which is clearly labeled as a research project) was announced, prospective users were warned about its limitations. Automated moderation was not recommended, for example. One suggested use was helping human moderators decide what to review.

David Auerbach, writing for MIT’s Technology Review, soon pointed out that “It’s Easy to Slip Toxic Language Past Alphabet’s Toxic-Comment Detector. Machine-learning algorithms are no match for the creativity of human insults.” He tested an assortment of phrases, getting results like these:

  • “‘Trump sucks’ scored a colossal 96 percent, yet neo-Nazi codeword ‘14/88’ only scored 5 percent.” [I also tested “14/88” and got no results at all. In fact, I tested all of the phrases mentioned by Auerbach and got somewhat different results, though the patterns were all similar.]
  • “Jews are human,” 72. “Jews are not human,” 64.
  • “The Holocaust never happened,” 21.

Twitter’s all atwitter with additional test results from machine-learning researchers and other curious people. Here is a sample of the phrases that were mentioned, in increasing order of toxicity scores from Perspective:

  1. I love the Führer, 8
  2. I am a man, 20
  3. I am a woman, 41
  4. You are a man, 52
  5. Algorithms are likely to reproduce human gender and racial biases, 56
  6. I am a Jew, 74
  7. You are a woman, 79

Linguistically speaking, most of these statements are just facts. If I’m a woman, I’m a woman. If you’re a man, you’re a man. If we interpret such statements as something more than neutral facts, we may be reading too much into them. “I love the Führer” is something else entirely. To look at these scores, though, you’d get a very different impression.

The problem is, the scoring mechanism can’t be any better than the rules behind it.

Nobody at Google set out to make a rule that assigned a low toxicity score to “I love the Führer” or a high score to “I am a Jew.” The rules were created in large part through automation, presenting a crowd of people with sample comments and collecting opinions on those comments, then assigning scores to new comments based on similarity to the example comments and corresponding ratings.

This approach has limitations. The crowd of people are not without biases, and those will be reflected in the scores. And terminology not included in the sample data will create gaps in results.
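That similarity-based scoring can be sketched in miniature. The two "rated" comments below stand in for the crowd-labeled corpus, and the scorer is a deliberate simplification of whatever Perspective actually does:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical crowd-rated examples: (comment, toxicity score).
rated = [
    ("you are an idiot", 0.9),
    ("have a nice day", 0.1),
]

def score(comment):
    """Score a new comment by its most similar rated example."""
    best = max(rated, key=lambda r: cosine(bow(comment), bow(r[0])))
    return best[1]
```

A comment built from vocabulary absent from the sample ties every example at zero similarity and gets an arbitrary score, which is exactly the gap problem described above.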

A couple of years ago, I heard a police trainer tell a group of officers that removing just one word from their vocabulary could prevent 80% of police misconduct complaints filed by the public. The officers had no difficulty guessing the word. It’s deeply embedded in police jargon, and has been for so long that it got its own chapter in the 1978 law enforcement book Policing: A View from the Street.

Yet the same word credited for abundant complaints of police misconduct has appeared in at least three articles here on Forbes in the past month, and not drawn so much as a comment.

Often, it’s not the words that offend, but the venom behind them. And that’s hard, if not impossible, to capture in an algorithm.

This isn’t to say that technology can’t do some worthwhile things with human language.

Text analytics algorithms, rules used by software to convert open-ended text into more conventional types of data, such as categories or numeric scores, can be useful. They lie at the heart of online search technology, for example, helping us find documents on topics of interest. Some other applications include:

  • e-discovery, which increases productivity for legal teams reviewing large quantities of documents for litigation
  • Warranty claim investigation, where text analysis helps manufacturers to identify product flaws early and enable corrective action
  • Targeted advertising, which uses text from content that users read or create to present relevant ads
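As a toy illustration of rule-based text analytics, the warranty-claim case above might be sketched like this (the category names and keywords are hypothetical, and real systems use far richer linguistic rules):

```python
# Hypothetical rules mapping failure-related keywords to claim categories.
RULES = {
    "electrical": {"battery", "wiring", "short", "spark"},
    "mechanical": {"bearing", "gear", "vibration", "seal"},
}

def categorize(claim_text):
    """Return every category whose keywords appear in the claim text."""
    words = set(claim_text.lower().split())
    return sorted(cat for cat, keywords in RULES.items() if words & keywords)

# categorize("customer reports a spark near the battery")
# would flag the claim as "electrical" for early review.
```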

It takes more than a dictionary to understand the meaning of language. Context, on the page and off, is all-important.

People recognize the connections between the things that people write or say, and the unspoken parts of the story. Software doesn’t do that so well.

Meta S. Brown is author of Data Mining for Dummies and creator of the Storytelling for Data Analysts and Storytelling for Tech workshops.



Big data has changed, and will continue to change, how advertisers work and how businesses market.

There are plenty of words online about how big data will change every facet of our lives, and a substantial chunk of those words are devoted towards how big data will affect advertising. But instead of haphazardly leaping on the change bandwagon, advertisers need to sit down and understand what big data has changed and yet what still remains the same.

At its core, advertising is about communication as it seeks to inform consumers about a business’s product and services. But different consumers want to hear different messages, which becomes all the more important as new customers join the internet thanks to the growing popularity of mobile.

Big data can refine those messages, predict what customers want to hear with predictive analytics, and yield new insights in what customers want to hear. All of this is certainly revolutionary and will change how consumers and marketers approach advertising. But it will still be up to advertisers to create messages in the name of their clients.

Algorithms and targeting

Some things which many people do not think about as advertising are in fact a conflation of big data and marketing. Netflix is a terrific example of this. Netflix obviously does not have advertisements, but it heavily relies on algorithms to recommend shows to its viewers. These algorithms save Netflix $1 billion per year by reducing the churn rate and marketing the right shows to the right customers.

Netflix’s efforts to target consumers with the right shows are hardly unusual; websites and online stores like YouTube, Amazon, and Steam do this all the time these days. But the key here is the reliance on algorithms to make targeting more accurate.
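The overlap-based targeting described here can be sketched as a deliberately tiny nearest-neighbour recommender (all viewing data below is made up, and Netflix's actual algorithms are far more sophisticated):

```python
# Toy viewing histories: user -> set of shows watched (hypothetical data).
history = {
    "ana":  {"Love", "Grace and Frankie", "The Crown"},
    "ben":  {"Love", "Grace and Frankie"},
    "cara": {"The Crown", "Narcos"},
}

def jaccard(a, b):
    """Overlap between two sets of shows, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)

def recommend(user):
    """Recommend the shows watched by the most similar other user."""
    _, best = max((jaccard(history[user], history[other]), other)
                  for other in history if other != user)
    return sorted(history[best] - history[user])

# recommend("ben") suggests what ana watched that ben hasn't yet.
```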

These algorithms require a constant stream of data to stay up to date. But now that data is everywhere. Internet users leave a constant stream of data not just on social media websites, but anywhere they go in the form of digital footprints.

This represents new opportunities and challenges for advertisers. On one hand, the digital footprints everyone creates offer advertisers new insights into what we truly want, which can be more accurate than what we say on social media. But at the same time, advertisers do have to worry about protecting consumer privacy and security. This is not just a moral concern; advertisers or websites that are flagrantly cavalier with user data will spark a backlash that hurts business.

Advertising targeting has already been in place for some time now. But as advertisers collect more data, targeting will become more personalized and thus effective. Advertisers will fight not just to collect as much data as possible, but to collect data which accurately represents individual customers to market to their individual tastes.

Changing forms of advertising

Big data can uncover new information about each individual customer, but the advertiser must craft a message to appeal to said customer. But with these new insights, advertisers can entirely change how they approach marketing as they craft entirely new strategies.

This is not completely new. Content marketing is often cited as a major beneficiary of big data, but content marketing as a concept is older than the internet. Nevertheless, big data has fueled the rise of content marketing, along with other strategies like native advertising and the endless dance around search engine optimization.

These rising advertising strategies are fascinating because just as advertisers rely on data to craft new strategies, they give data right back to the consumer. Content marketing is all about giving consumers details about a business, such as how it makes its food or what it is like to work there. By sharing this data, the company makes customers feel like part of a group that shares common information. And in turn the customer ends up giving up his or her own data to the company, which lets it construct new advertising strategies.

This symbiosis between consumer and company shows that data is not just about cold analytics; it is about creating a bond between the two groups, as all advertising sets out to do. Similarly, businesses must take the complexity of big data, analyze trends, and then create simple guidelines that their customer-facing staff can use. All the advertising in the world will not make as big an impression on a customer as one surly or confused customer representative.

Big data has changed, and will continue to change, how advertisers work and how businesses market to consumers, through more personalized and targeted advertising as well as entirely new forms of advertising. But big data is less important than smart data and strategy. Business leaders who can break big data down into small chunks, come up with a smart strategy, and formulate an effective message will still thrive just as much as they would have in the past. In this way, big data is not quite the revolutionary change that many think.

This article is published as part of the IDG Contributor Network.


When Netflix recommends you watch “Grace and Frankie” after you’ve finished “Love,” an algorithm decided that would be the next logical thing for you to watch. And when Google shows you one search result ahead of another, an algorithm made a decision that one page was more important than the other. Oh, and when a photo app decides you’d look better with lighter skin, a seriously biased algorithm that a real person developed made that call.

Algorithms are sets of rules that computers follow in order to solve problems and make decisions about a particular course of action. Whether it’s the type of information we receive, the information people see about us, the jobs we get hired to do, the credit cards we get approved for, or, down the road, the driverless cars that either see us or don’t see us, algorithms are increasingly becoming a big part of our lives.

But there is an inherent problem with algorithms that begins at the most basic level and persists throughout their adoption: human bias that is baked into these machine-based decision-makers.

You may remember that time when Uber’s self-driving car ran a red light in San Francisco, or when Google’s photo app labeled images of black people as gorillas. The Massachusetts Registry of Motor Vehicles’ facial-recognition algorithm mistakenly tagged someone as a criminal and revoked their driver’s license. And Microsoft’s bot Tay went rogue and decided to become a white supremacist. Those were algorithms at their worst. They have also recently been thrust into the spotlight with the troubles around fake news stories surfacing in Google search results and on Facebook.

But algorithms going rogue have much greater implications; they can result in life-altering consequences for unsuspecting people. Think about how scary it could be with algorithmically biased self-driving cars, drones and other sorts of automated vehicles. Consider robots that are algorithmically biased against black people or don’t properly recognize people who are not cisgender white people, and then make a decision on the basis that the person is not human.

Another important element to consider is the role algorithms play in determining what we see in the world, as well as how people see us. Think driverless cars “driven” by algorithms mowing down black people because they don’t recognize black people as human. Or algorithmic software that predicts future criminals, which just so happens to be biased against black people.

A variety of issues can arise as a result of bad or erroneous data, good but biased data because there’s not enough of it, or an inflexible model that can’t account for different scenarios.

The dilemma is figuring out what to do about these problematic algorithmic outcomes. Many researchers and academics are actively exploring how to increase algorithmic accountability. What would it mean if tech companies provided their code in order to make these algorithmic decisions more transparent? Furthermore, what would happen if some type of government board would be in charge of reviewing them?

Whatever approach is taken to ensure bias is removed from the development of algorithms, it can’t dramatically impede progress, DJ Patil, former chief data scientist of the U.S., tells me. Solutions can only be implemented, and therefore effective, if tech companies fully acknowledge their roles in maintaining and perpetuating bias, discrimination and falsehoods, he adds.

If you think about the issues we faced a few years ago versus the issues we face now, they are compounding, he adds. “So how do we address that challenge?” In developing new technologies, there needs to be more diversity on the team behind these algorithms. There’s no denying that, Patil says, but the issue is scalability across the whole realm of diversity.

“There’s diversity of race, there’s diversity of religion, there’s diversity with respect to disability. The number of times I’ve seen somebody design something where if they made a slight decision choice you could design for a much broader swath of society — just so easy, just to make a slight design change, but they just didn’t know. And I think one of the challenges is that we don’t have scalable templates to do that.”

Google, for example, determines what many people see on the internet. As Frank Pasquale writes in his book, “The Black Box Society: The Secret Algorithms That Control Money and Information,” Google, as well as other tech companies, set the standards by which all of us are judged, but there’s no one really judging them (141, Pasquale).

When you conduct a Google Image search for “person,” you don’t see very many people of color, which perpetuates the normalization of whiteness and reinstills biases around race. Instead, you’ll see many pictures of white men, says Sorelle Friedler, affiliate at the Data & Society Research Center and ex-Googler who worked on X and search infrastructure.

“That is perhaps representative of the way that ‘person’ is broadly used in our society, unfortunately,” Friedler says. “So then the question is, is it appropriate for that sort of linguistic representation to make its way to image search? And then Google would need to decide, am I okay with black people only being represented only if you search specifically for black people? And I think that that’s a philosophical decision about what we want our society to look like, and I think it’s one that’s worth reckoning with.”

Perhaps Google doesn’t see itself as having a big responsibility to intervene in a situation like this. Maybe the argument of “it’s a result of the things our users are inputting” is acceptable in this scenario. But search queries related to the Holocaust or suicide have prompted Google to intervene.

Algorithms determine Google’s search results and suggestions. Some of Google’s algorithmic fails are more egregious than others, and sometimes Google steps in, but it often doesn’t.

“If you search for ways to kill yourself, you’re directed toward a suicide hotline,” Robyn Caplan, a research analyst at Data & Society, tells TechCrunch. “There are things Google has deemed relevant to the public interest that they’re willing to kind of intervene and guard against, but there really isn’t a great understanding of how they’re assessing that.”

Earlier this year, if you searched for something like “is the Holocaust real?,” “did the Holocaust happen?” or “are black people smart?,” one of the first search results for these queries was pretty problematic. It wasn’t until people expressed outrage that Google decided to do something.

“When non-authoritative information ranks too high in our search results, we develop scalable, automated approaches to fix the problems, rather than manually removing these one-by-one,” a Google spokesperson tells TechCrunch via email. “We are working on improvements to our algorithm that will help surface more high-quality, credible content on the web, and we’ll continue to improve our algorithms over time in order to tackle these challenges.”

In addition to its search results and suggestions, Google’s photo algorithms and the ads it serves have also been problematic. At one point, Google’s photo algorithm mistakenly labeled black people as gorillas.

Google launched Photos in May 2015 to relatively good reception. But after developer Jacky Alciné pointed out the flaw, Bradley Horowitz, who led Google Photos at the time, said his inbox was on fire.

“That day was one of the worst days of my professional life, maybe my life,” Horowitz said in December.

“People were typing in gorilla and African-American people were being returned in the search results,” Horowitz said. How that happened, he said, was that there was garbage going in and garbage going out — a saying he said is common in computer science. “To the degree that the data is sexist or racist, you’re going to have the algorithm imitating those behaviors.”

Horowitz added that Google’s employee base isn’t representative of the users it serves. He admitted that if Google had a more diverse team, the company would have noticed the problems earlier in the development process.

Another time, Google featured mugshots at the top of search results for people with “black-sounding” names. Latanya Sweeney, a black professor in government and technology at Harvard University and founder of the Data Privacy Lab, brought this to the public’s attention in 2013 when she published her study of Google AdWords. She found that when people search Google for names that traditionally belong to black people, the ads shown are of arrest records and mugshots.

What’s driving mistakes like this is the idea that the natural world and natural processes are just like the social world and social processes of people, says Pasquale.

“And it’s this assumption that if we can develop an algorithm that picks out all of the rocks as rocks correctly, we can have one that classifies people correctly or in a useful way or something like that,” Pasquale says. “I think that’s the fundamental problem. They are taking a lot of natural science methods and throwing them into social situations and they’re not trying to tailor the intervention to reflect human values.”

When an algorithm produces less-than-ideal results, it could be that the data set was bad to begin with, the algorithm wasn’t flexible enough, the team behind the product didn’t fully think through the use cases, humans interacted with the algorithm enough to manipulate it, or even all of the above. But gone are the days when tech companies can just say, “Oh, well, it’s just an app” or “Oh, we didn’t have the data,” Patil says. “We have a different level of responsibility when you’re designing a product that really impacts people’s lives in the way that it can.”

While algorithms also have vast potential to change our world, Google’s aforementioned fails are indicative of a larger issue: the algorithm’s role in either sustaining or perpetuating historic models of discrimination and bias or spreading false information.

“There is both the harm data can do and the incredible opportunity it has to help,” Patil says. “We often focus on the harm side and people talk about the way math is — we should be scared of it and why we should be so afraid of it.

“We have to remember these algorithms and these techniques are going to be the way we’re going to solve cancer. This is how we’re going to cure the next form of diseases. This is how we’re going to battle crises like Ebola and Zika. Big data is the solution.”

A barrier to tackling algorithmic issues that pertain to content on the internet is Section 230 of the Communications Decency Act, which states, “No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.”

It made it possible for tech companies to scale because it relieved platforms of any responsibilities for dealing with illegal or objectionable conduct of its users. The Electronic Frontier Foundation calls it “one of the most valuable tools for protecting freedom of expression and innovation on the internet.”

If this law didn’t exist, we could essentially deputize Google and other tech companies to operate as censors for what we consider to be objectionable speech. Something like that is happening in Europe with The Right to be Forgotten.

“American scholars and policy people are somewhat terrified because basically what has happened there is practically speaking, Google has become the arbiter of these claims,” says Solon Barocas, a postdoc researcher at the NYC Lab of Microsoft Research and member of the Society, Ethics, and AI group. “It’s not like a government agency is administering the decisions of what should be taken down. Instead, it said ‘Google you have a responsibility to do this’ and then Google does it themselves. That has frightened a lot of Americans.”

But given the existence of Section 230 of the CDA and the fact that it provides many protections for platforms, it may be difficult to use legislative means in the U.S. to affect what content is trending over Facebook or what search results appear on Google.

Outside the U.S., however, legislation could affect the way these tech companies operate inside America, Caplan says. In Germany, for example, the government has drafted a law that would fine social networks up to 50 million euro for failing to remove fake news or hate speech.

Meanwhile, the European Union’s digital chief, Andrus Ansip, warned Facebook earlier this year that while he believes in self-regulatory measures, he’s ready to take legislative action if it comes to that.

“What we’ve seen in the past is that these types of policies that take place outside of the U.S. do have a pretty big role in shaping how information is structured here,” Caplan says. “So if you look at Google’s autocomplete algorithm, you see a similar thing — that different auto-completions aren’t allowed because of libel cases that happened abroad, even though Google is protected here. Those kinds of policies proposed by countries with a clear understanding of what they’re willing to regulate media-wise may have an interesting impact here.”

Even if Section 230 stays in place, and it most likely will, there are ways to reevaluate and reprogram algorithms to make better decisions and circumvent potential biases or discriminatory outcomes before they happen.

While there needs to be more diversity on the teams developing software in order to truly take into account the different number of scenarios an algorithm may have to deal with, there’s no straightforward, cut-and-dried solution to every company’s algorithmic issues. But researchers have proposed several potential methods to address algorithmic accountability.

Two areas developing rapidly are related to the front- and backend process, respectively, Barocas tells me. The front-end method involves ensuring certain values are encoded and implemented in the algorithmic models that tech companies build. For example, tech companies could ensure that concerns of discrimination and fairness are part of the algorithmic process.

“Making sure there are certain ideas of fairness that constrain how the model behaves and that can be done upfront — meaning in the process of developing that procedure, you can make sure those things are satisfied.”

On the backend, you could imagine that developers build the systems and deploy them without being totally sure how they will behave, and unable to anticipate the potential adverse outcomes they might generate. What you would do, Barocas says, is build the system, feed it a bunch of examples, and see how it behaves.

Let’s say the system is a self-driving car and you feed it examples of pedestrians (such as a white person versus a black person versus a disabled person). By analyzing how the system operates based on a variety of inputs/examples, one could see if the process is discriminatory. If the car only stops for white people but decides to hit black and disabled people, there’s clearly a problem with the algorithm.

“If you do this enough, you can kind of tease out if there’s any type of systematic bias or systematic disparity in the outcome, and that’s also an area where people are doing a lot of work,” Barocas says. “That’s known as algorithmic auditing.”
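The auditing loop Barocas describes can be sketched minimally: feed a black-box decision function matched inputs that differ only by group, then compare outcome rates. The model and data below are deliberately simple, hypothetical stand-ins:

```python
def audit(decide, examples):
    """Measure the positive-outcome rate per group for a black-box
    decision function -- the core measurement of an algorithmic audit."""
    rates = {}
    for group, inputs in examples.items():
        outcomes = [decide(x) for x in inputs]
        rates[group] = sum(outcomes) / len(outcomes)
    return rates

# A deliberately biased toy model standing in for the audited system.
def biased_model(applicant):
    return applicant["score"] > 600 and applicant["group"] == "a"

# Matched inputs: identical scores, differing only by group membership.
examples = {
    "a": [{"group": "a", "score": s} for s in (550, 650, 700)],
    "b": [{"group": "b", "score": s} for s in (550, 650, 700)],
}

# audit(biased_model, examples) exposes the disparity: group "b" never
# receives a positive outcome despite identical scores.
```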

When people talk about algorithmic accountability, they are generally talking about algorithmic auditing, of which there are three different levels, Pasquale says.

“In terms of algorithmic accountability, a first step is transparency with respect to data and algorithms,” Pasquale says. “With respect to data, we can do far more to ensure transparency, in terms of saying what’s going into the information that’s guiding my Facebook feed or Google search results.”

So, for example, enabling people to better understand what’s feeding their Facebook news feeds, their Google search results and suggestions, as well as their Twitter feeds.

“A very first step would be allowing them to understand exactly the full range of data they have about them,” Pasquale says.

The next step is something Pasquale calls qualified transparency, where people from the outside inspect and see if there’s something untoward going on. The last part, and perhaps most difficult part, is getting tech companies to “accept some kind of ethical and social responsibility for the discriminatory impacts of what they’re doing,” Pasquale says.

The fundamental barrier to algorithmic accountability, Pasquale says, is that until we “get the companies to invest serious money in assuring some sort of both legal compliance and broader ethical compliance with personnel that have the power to do this, we’re not really going to get anywhere.”

Pasquale says he is a proponent of government regulation and oversight and envisions something like a federal search commission to oversee search engines and analyze how they rank and rate people and companies.

Friedler, however, sees a situation in which an outside organization would develop metrics that measure what they consider to be the problem. Then that organization could publicize those metrics and its methodology.

“As with many of these sorts of societal benefits, it’s up to the rest of society to determine what we want to be seeing them do and then to hold them accountable,” Friedler tells me. “I also would like to believe that many of these tech companies want to do the right thing. But to be fair, determining what the right thing is is very tricky. And measuring it is even trickier.”

Algorithms aren’t going to go away, and I think we can all agree that they’re only going to become more prevalent and powerful. But unless academics, technologists and other stakeholders determine a concrete process to hold algorithms and the tech companies behind them accountable, we’re all at risk.

This article was published by Megan Rose Dickey


Google is in the process of revamping its existing search algorithm to curb the promotion of extreme views, conspiracy theories and, most importantly, fake news.

The internet giant, which maintains an internal, undisclosed ranking for websites and their URLs, said it will demote "low-quality" websites, especially those circulating misleading or fake content. A group of 10,000-plus staff, known as "raters," will assess search results and flag web pages that host hoaxes, conspiracy theories, and sub-par content.

"In a world where tens of thousands of pages are coming online every minute of every day, there are new ways that people try to game the system," Google's Ben Gomes said in a blog post. "In order to have long-term and impactful changes, more structural changes [to Google's search engine] are needed."

Check out some of the major changes Google has made public regarding its algorithm change:

  • Users can now report offensive suggestions from the Autocomplete feature and false statements in Google's Direct Answer box, which will be manually checked by a moderator.
  • Users can even flag content that appears on Featured Snippets in search
  • Google says real people, rather than the bots traditionally used by search companies, will assess the quality of its search results
  • Low-quality web pages with content of conspiracy theories, extremism and unreliable sources will be demoted in ranking
  • More authoritative pages with strong sources and facts will be rated higher
  • Linking to offending websites, or hiding text on a page so that it is invisible to humans but visible to the search algorithms, can also demote a webpage
  • Landing pages carrying suspicious files and unrecognised formats, which the company warns are malware in many cases, will also be demoted

For a detailed explanation of how the company determines its search rankings, check out its updated search quality evaluation guidelines. The company, which has been secretive about its search strategy in the past, has now promised more transparency to let people know how the business works, after coming under fire for failing to combat fake and extremist content.

Source: by Agamoni Ghosh

