Here are some of the whackiest and most bizarre searches people make on Google

Nowadays we rely on Google more than books for information. As time goes by, Google is getting wiser and wiser, dishing out exactly the results we want. But there are people on this Earth who have the whackiest questions for Google. These people are taking Google search to a whole new level. From wanting to know what happens if they drink blood to whether passing wind burns calories, these folks are more hilarious than curious.

A marketing agency, Digitaloft, did some research into these curious creatures and came up with some rib-tickling data about Google searches. As strange as it gets, here are a few of the strangest questions people have asked the search engine.

The whackiest search made on Google is “am I pregnant?”. It seems there are people on Earth who trust Google more than doctors or pregnancy tests.

 

The most popular question on the pictorial chart, created by marketing agency Digitaloft, is: 'Am I pregnant?'

Some existential users are concerned with the big questions, with 8,100 monthly searches on Google for 'why are we here?' and 49,500 for 'when will I die?' shown above

According to the chart, 49,500 people a month ask whether farting burns calories, but unfortunately the myth that this bodily function burns 67 calories is false. Another section of people have a genuinely intuitive question: why do men have nipples when they apparently serve no purpose? Some 22,200 are curious as to why men have nipples, while a more troubled 4,400 people a month Google ‘why does my bellybutton smell?’

Others are in search of answers to life's mysteries, with 8,100 people asking if the tooth fairy is real every month. The infographic provides a cute and child-friendly answer

Worryingly, 3,600 people a month ask whether men have periods (infographic shown above), with another 2,900 querying whether men can become pregnant, displaying a rather poor grasp of biology


 

Some 18,100 people ask Google whether penguins have knees every month, 8,100 want to know if pigs sweat and 2,900 are curious whether worms have eyes – they don’t.

An insecure 2,900 people every month ask the search engine ‘does my dog love me?’

Some 800 people a month ask Google 'Can I marry my cousin?' according to the infographic (pictured), meaning 10,560 people a year might be considering popping the question to a relative

If you have made a similarly bizarre search on Google, kindly mention it in the comments and let others know how whacky you are!

http://www.techworm.net/2016/07/20-bizarre-whackiest-google-searches-people-make.html

Categorized in Search Engine

Early Wednesday morning, Donald Trump was nowhere to be found in Google search results. Instead, the results showed Democrats Bernie Sanders and Hillary Clinton, as well as Jill Stein of the Green Party. The results remained unchanged for about eight hours before Google fixed them.


When the term “presidential candidates” is entered into the search engine, an “Active Campaigns” information box is highlighted at the top of the page. Google left no room for Republican Party nominee Donald Trump, Libertarian candidate Gary Johnson or cult-favourite contender Vermin Supreme. Donald Trump had appeared under “presidential candidates” in Google search before accepting the Republican Party's presidential nomination.

President Obama Strikes At Trump And Appeals For Hillary Clinton In DNC Speech

The search engine giant was accused of biased search results in favour of Hillary Clinton in June 2016. The company released a statement attributing it all to a ‘technical bug’. “Google Autocomplete does not favour any candidate or cause. Claims to the contrary simply misunderstand how Autocomplete works. We found a technical bug in Search where only the presidential candidates participating in an active primary election were appearing in a Knowledge Graph result. Because the Republican and Libertarian primaries have ended, those candidates did not appear. This bug was resolved early this morning,” a Google spokesperson said.

‘Active Presidential Campaigns’

Google doesn’t reveal where this information comes from. In the absence of reference points the problem persists, and it is hard to pin down where things went wrong. Without clear sources, the search engine presents itself as a fount of all knowledge.

Political War Between Hillary Clinton and Donald Trump

Finally, the error was fixed, and the “Active Campaigns” info box now shows Donald Trump, Hillary Clinton, Gary Johnson and Jill Stein.

 http://techfactslive.com/donald-trump-left-out-of-google-search-results-for-active-presidential-campaigns/4717/

Categorized in Search Engine

Donald Trump was omitted from a Google search of presidential candidates earlier this week because of a "technical bug" in the search engine's information mapping system used for filtering top results.

"We found a technical bug in 'Search' where only the presidential candidates participating in an active primary election were appearing in a Knowledge Graph result," a Google spokesman told Snopes.com on Wednesday. "Because the Republican and Libertarian primaries have ended, those candidates did not appear.


"This bug was resolved early this morning."

Internet users searching the term "presidential candidates" on Thursday found that the results produced text and pictures of Democrats Hillary Clinton and Bernie Sanders and Green Party candidate Jill Stein, Snopes reported.

No information on Trump or Libertarian Party candidate Gary Johnson came up in the search.

However, the Snopes report includes a screenshot of similar search results that was taken by Stein on July 17, after she petitioned Google to be included in the results.


That screenshot included both Trump and Johnson, according to the report.

Trump accepted the Republican presidential nomination last Thursday. The last primaries were held on June 7.

Twitter erupted with angry Google users slamming the search engine for excluding Trump.

In recent months, Google and Facebook have been among the Silicon Valley companies accused of bias against conservatives and conservative news organizations.

Google has long insisted that it does not favor any political ideology.


The company's CEO, Eric Schmidt, has been on the Democratic National Committee's Democratic Victory Task Force since 2014, The Daily Caller reports.

© 2016 Newsmax. All rights reserved.

http://www.newsmax.com/Politics/google-technical-bug-omit-trump/2016/07/28/id/741077/

Categorized in Search Engine

More than 484,000 Google keyword searches a month from around the world, including at least 54,000 searches in the UK, return results dominated by Islamist extremist material, a report into the online presence of jihadism has revealed.

The study found that of the extremist content accessible through these specific keyword searches, 44% was explicitly violent, 36% was non-violent and 20% was political Islamist in content, the last being non-violent but disseminated by known Islamist groups with political ambitions.

The study is one of the first to expose the role of the search engine rather than social media in drawing people to extremist jihadi material on the web. It argues the role of the search engine – a field dominated by Google – has been a blind spot that has been missed by those seeking to measure and counter extremist messages on the internet.

Although the UK government’s Prevent strategy claims the internet must not be ungoverned space for Islamist extremism and British diplomats have taken the lead in the global communications fight against Islamic State on the net, the study suggests government agencies are only at the beginning of a “labyrinthine challenge”. So-called counter-narrative initiatives led by governments and civil society groups are “under-resourced and not achieving sufficient natural interest”, suggesting the battle of ideas is not even being engaged, let alone won.

The study, undertaken jointly by Digitalis and the Centre on Religion and Geopolitics, will be challenged by those who claim it advocates censorship, has blurred the lines between political Islam and violent extremism and cannot validly quantify the presence of extremism.

But the findings come in a week in which there has been a spate of terrorist attacks in Germany and France, some undertaken by young people either radicalised on the internet, or using it to feed their obsession with violence. Many of the jihadist foreign fighters in Syria were radicalised online as “the search engine gradually overtakes the library and the classroom as a source of information”.

The study, entitled A War of Keywords: how extremists are exploiting the internet and what to do about it, argues “many of the legitimate mainstream Islamic scholarly websites host extremist material, including jihadi material, often without any warning or safeguards in place”.

It also argues non-violent Islamist organisations, such as Hizb ut-Tahrir, have a very strong online presence and dominate the results for some keyword searches. Some of the most popular search words used were crusader, martyr, kafir (non-believer), khilafa (a pan-Islamic state) or apostate.

In a condemnation of government efforts it finds very little of this content is challenged online. Analysing 47 relevant keywords, the search-engine analysis found counter-narrative content outperformed extremist content in only 11% of the results generated. For the search term khilafah, which has 10,000 global monthly searches, the ratio of extremist content to counter-narrative is nine to one.

This is partly because counter-narrative sites lack search engine optimisation, so they do not rank highly enough in searches. By contrast, Khilafa.com, the English website of Hizb ut-Tahrir, had more than 100,000 links into it.

The study also warns some of the most-used Muslim websites such as Kalmullah.com and WorldofIslam.info “host traditional Islamic content alongside extremist material” so are knowingly or unknowingly abusing the trust of their readers.

The study also claims a user can come across extremist content relatively easily while browsing for Islamic literature. Few effective restrictions apply to accessing Islamic State English-language magazine Dabiq or Inspire magazine, which is linked to al-Qaeda in the Arabian peninsula. Both are readily available to browse and download through clearing sites.

The study produced its headline numbers by looking at the average monthly number of global searches conducted in Google for 287 extremist-related keywords – 143 in English and 144 in Arabic. It then looked at two samples totalling 47 keywords, the first sample focused on the most-used words and the second sample on the keywords deemed to be most extremist. The research then analysed the first two pages thrown up by the search for these keywords.

The authors acknowledge the difficulties technology companies face in policing the results of their search engines. Google is responsible for 40,000 searches a second, 2.5 billion a day and 1.2 trillion a year worldwide. Facebook boasts more than one and a half billion users who create 5 billion likes a day.

Dave King, chief executive of Digitalis, argues: “While the company’s advertising model is based on automatically mining the content its users create, their ability to distinguish a single credible kill threat from the plethora who have threatened to kill in jest is highly limited.”

The study recommends governments, the United Nations, technology companies, civil society groups and religious organisations together establish a charter setting out a common definition of extremism and pledge to make the internet a safer place.

Technology companies, the report says, could work with governments to shift the balance of the online space, as well as share analytical data and trending information to bolster counter-efforts. It suggests search engine companies have been reluctant to or unable to alter the search algorithms that are responsible for search page rankings.

The authors also call for a debate on “the murky dividing line between violent and non-violent extremist material online”, arguing such legal definitions have already been achieved for “copyrighted material, child pornography and hate speech, all of which have been subject to removal requests.”

Existing content-control software that prevents access to graphic or age-restricted material could be used, and warning signals put on sites.

A Google spokesperson said: “We take this issue very seriously and have processes in place for removing illegal content from all our platforms, including search. We are committed to showing leadership in this area – and have been hosting counterspeech events across the globe for several years. We are also working with organisations around the world on how best to promote their work on counter-radicalisation online.”

https://www.theguardian.com/technology/2016/jul/28/search-engines-role-in-radicalisation-must-be-challenged-finds-study

Categorized in Search Engine

The Digital Payments 2020 report by Google and BCG analyses the transformation in Digital Payments and its impact on the payment landscape in India.

Why are digital payments on the rise?

  • 66% of users like the convenience
  • 48% of users are lured by offers
  • 75% of merchants feel opting for digital payments will increase sales

What are the hurdles on the way?

  • 50% of users find it difficult to understand
  • 50% of users stopped using it because it is not accepted everywhere

By 2020

  • 60% of digital payments value will be driven by physical points of sale
  • 50% of person-to-merchant transactions will be worth less than Rs 100

http://retail.economictimes.indiatimes.com/news/e-commerce/e-tailing/more-than-50-of-indias-internet-users-will-use-digital-payments-by-2020-google-and-bcg-report/53483942

Categorized in Search Engine

What are business attributes, and why should local businesses care? Columnist Adam Dorfman explores.

When checking into places on Google Maps, you may have noticed that Google prompts you to volunteer information about the place you’re visiting. For instance, if you check into a restaurant, you might be asked whether the establishment has a wheelchair-accessible entrance or whether the location offers takeout. There’s a reason Google wants to know: attributes.

Attributes consist of descriptive content such as the services a business provides, payment methods accepted or the availability of free parking — details that may not apply to all businesses. Attributes are important because they can influence someone’s decision to visit you.

Google wants to set itself up as a go-to destination of rich, descriptive content about locations, which is why it crowdsources business attributes. But it’s not the only publisher doing so. For instance, if you publish a review on TripAdvisor or Yelp, you’ll be asked a similar battery of questions but with more details, such as whether the restaurant is appropriate for kids, allows dogs, has televisions or accepts bitcoins.

Many of these publishers are incentivizing this via programs like Google’s Local Guides, TripAdvisor’s Badge Collections, and Yelp’s Elite Squad because having complete, accurate information about locations makes each publisher more useful. And being more useful means attracting more visitors, which makes each publisher more valuable.


It’s important that businesses manage their attributes as precious location data assets, if for no other reason than that publishers are doing so. I call publishers (and aggregators who share information with them) data amplifiers because they amplify a business’s data across all the places where people conduct local searches. If you want people to find your business and turn their searches into actual in-store visits, you need to share your data, including detailed attributes, with the major data amplifiers.

Many businesses believe their principal location data challenge is ensuring that their foundational data, such as their names, addresses and phone numbers, are accurate. I call the foundational data “identities,” and indeed, you need accurate foundational data to even be considered when people search for businesses. But as important as they are — and challenging to manage — identities solve for only one-half of the search challenge. Identities ensure visibility, but you need attributes to turn searches into business for your brand.

Attributes are not new, but they’ve become more important because of the way mobile is rapidly accelerating the purchase decision. According to seminal research published by Google, mobile has given rise to “micro-moments,” or times when consumers use mobile devices to make quick decisions about what to do, where to go or what to buy.

Google noted that the number of “near me” searches (searches conducted for goods and services nearby) has increased 146 percent year over year, and 88 percent of these “near me” searches are conducted on mobile devices. As Google’s Matt Lawson wrote:

With a world of information at their fingertips, consumers have heightened expectations for immediacy and relevance. They want what they want when they want it. They’re confident they can make well-informed choices whenever needs arise. It’s essential that brands be there in these moments that matter — when people are actively looking to learn, discover, and/or buy.

Attributes encourage “next moments,” or the action that occurs after someone has found you during a micro-moment. Google understands that businesses failing to manage their attributes correctly will drop off the consideration set when consumers experience micro-moments. For this reason, Google prompts users to complete attributes about businesses when they check into a location on Google Maps.

At the 2016 Worldwide Developers Conference, Apple underscored the importance of attributes when the company rolled out a smarter, more connected Siri that makes it possible for users to create “next moments” faster by issuing voice commands such as “Siri, find some new Italian restaurants in Chicago, book me dinner, and get me an Uber to the restaurant.” In effect, Siri is a more efficient tool for enabling next moments, but only for businesses that manage the attributes effectively.

And with its recently released Google My Business API update to version 3.0, Google also gave businesses that manage offline locations a powerful competitive weapon: the ability to manage attributes directly. By making it possible to share attributes on your Google My Business page, Google has not only amplified its own role as a crucial publisher of attributes but has also given businesses an important tool to take control of their own destiny. It’s your move now.

http://searchengineland.com/google-mining-local-business-attributes-252283

Categorized in Business Research

Google has made another small acquisition to help it continue building out its latest efforts in social apps. The search and Android giant has hired the team behind Kifi, a startup that was building extensions to collect and search links shared in social apps, as well as provide recommendations for further links — such as this tool, Kifi for Twitter. Terms of the deal are not being disclosed, but, according to Google engineering director Eddie Kessler, the app’s team will be joining the company to work on Spaces, Google’s group chat app.

Google tells me it is not commenting on the exact number of people joining.

It looks like Spaces could use the help. The app launched earlier this year and has had a very lukewarm run in the market so far, currently lingering around position 577 in the U.S. iOS App Store and 284 in the U.S. Android store, according to stats from App Annie.

This is essentially an acqui-hire. In a Medium post earlier today, Kifi noted that the app is not coming to Google. It will remain alive for only a few more weeks, after which it will stick around a few weeks longer for data exports only.

While the app is not living on, it sounds like the kind of tech that Kifi’s team — co-founded by Dan Blumenfeld and Eishay Smith (although Blumenfeld left the company some time ago) — will continue to work on. Considering Spaces’ current focus on group chat, it sounds like they could tweak Kifi’s link-sharing and link-recommendation technology to use it in that context, and to collate those links with links from other applications and platforms.

This seems to be what Kessler says will be the intention, too, in his own short Google+ post: “Delighted the Kifi team, with their great expertise in organizing shared content and conversations, is joining the Spaces team to build features that improve group sharing.”

Google has disclosed nearly 200 acquisitions to date. Among them, other recent M&A moves that point to Google building up its talent in areas like social and apps include Pie (a Slack-like app) in Singapore and Moodstocks in Paris (to improve image recognition in apps).

Kifi had raised just over $11 million in funding from Don Katz, Oren Zeev, SGVC and Wicklow Capital.

https://techcrunch.com/2016/07/12/google-acquires-deep-search-engine-kifi-to-enhance-its-spaces-group-chat-app/

Categorized in Search Engine

In late 2015, JR Oakes and his colleagues undertook an experiment to attempt to predict Google ranking for a given webpage using machine learning. What follows are their findings, which they wanted to share with the SEO community.

Machine learning is quickly becoming an indispensable tool for many large companies. Everyone has surely heard about Google’s AI algorithm beating the world champion at Go, as well as technologies like RankBrain, but machine learning does not have to be a mystical subject relegated to the domain of math researchers. There are many approachable libraries and technologies that show promise of being very useful to any industry that has data to play with.

Machine learning also has the ability to turn traditional website marketing and SEO on its head. Late last year, my colleagues and I (rather naively) began an experiment in which we threw several popular machine learning algorithms at the task of predicting ranking in Google. We ended up with an ensemble that achieved 41 percent true positives and 41 percent true negatives on our data set.

In the following paragraphs, I will take you through our experiment, and I will also discuss a few important libraries and technologies that are important for SEOs to begin understanding.

Our experiment

Toward the end of 2015, we started hearing more and more about machine learning and its promise to make use of large amounts of data. The more we dug in, the more technical it became, and it quickly became clear that it would be helpful to have someone help us navigate this world.

About that time, we came across a brilliant data scientist from Brazil named Alejandro Simkievich. The interesting thing to us about Simkievich was that he was working in the area of search relevance and conversion rate optimization (CRO) and placing very well for important Kaggle competitions. (For those of you not familiar, Kaggle is a website that hosts machine learning competitions for groups of data scientists and machine learning enthusiasts.)

Simkievich is the owner of Statec, a data science/machine learning consulting company, with clients in the consumer goods, automotive, marketing and internet sectors. Lots of Statec’s work had been focused on assessing the relevance of e-commerce search engines. Working together seemed a natural fit, since we are obsessed with using data to help with decision-making for SEO.

We like to set big hairy goals, so we decided to see if we could use the data available from scraping, rank trackers, link tools and a few more tools, to see if we could create features that would allow us to predict the rank of a webpage. While we knew going in that the likelihood of pulling it off was very low, we still pushed ahead for the opportunity for an amazing win, as well as the chance to learn some really interesting technology.

The data

Fundamentally, machine learning is using computer programs to take data and transform it in a way that provides something valuable in return. “Transform” is a very loosely applied word, in that it doesn’t quite do justice to all that is involved, but it was selected for the ease of understanding. The point here is that all machine learning begins with some type of input data.

(Note: There are many tutorials and courses freely available that do a very good job of covering the basics of machine learning, so we will not do that here. If you are interested in learning more, Andrew Ng has an excellent free class on Coursera here.)

The bottom line is that we had to find data that we could use to train a machine learning model. At this point, we didn’t know exactly what would be useful, so we used a kitchen-sink approach and grabbed as many features as we could think of. GetStat and Majestic were invaluable in supplying much of the base data, and we built a crawler to capture everything else.

Image of data used for analysis

Our goal was to end up with enough data to successfully train a model (more on this later), and this meant a lot of data. For the first model, we had about 200,000 observations (rows) and 54 attributes (columns).

A little background

As I said before, I am not going to go into a lot of detail about machine learning, but it is important to grasp a few points to understand the next section. In total, much of the machine learning work today deals with regression, classification and clustering algorithms. I will define the first two here, as they were relevant to our project.

Image showing the difference between classification and regression algorithms

  • Regression algorithms are normally useful for predicting a single number. If you needed to create an algorithm that predicted a stock price based on features of stocks, you would select this type of model. These are called continuous variables.
  • Classification algorithms are used to predict a member of a class of possible answers. This could be a simple “yes or no” classification, or “red, green or blue.” If you needed to predict whether an unknown person was male or female from features, you would select this type of model. These are called discrete variables.
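To make the distinction concrete, here is a toy sketch (all names and numbers are hypothetical, not from the experiment): a regression model outputs a continuous number, while a classifier outputs a discrete class.

```python
# Toy illustration of the two model families described above.

def predict_price(features, weights):
    """Regression: output a continuous number (e.g. a stock price)."""
    return sum(f * w for f, w in zip(features, weights))

def predict_gender(height_cm, threshold=170.0):
    """Classification: output a discrete class from a feature."""
    return "male" if height_cm >= threshold else "female"

print(predict_price([2.0, 3.0], [1.5, 0.5]))  # 4.5, a continuous value
print(predict_gender(180.0))                  # 'male', a discrete class
```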

Machine learning is a very technical space right now, and much of the cutting-edge work requires familiarity with linear algebra, calculus, mathematical notation and programming languages like Python. One of the items that helped me understand the overall flow at an approachable level, though, was to think of machine learning models as applying weights to the features in the data you give it. The more important the feature, the stronger the weight.

When you read about “training models,” it is helpful to visualize a string connected through the model to each weight, and as the model makes a guess, a cost function is used to tell you how wrong the guess was and to gently, or sternly, pull the string in the direction of the right answer, correcting all the weights.
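That "pulling the string" intuition is just gradient descent. A minimal illustrative sketch (not the experiment's code, which used library implementations):

```python
# Each training step nudges every weight in proportion to how wrong
# the current guess was, as judged by the cost function.

def train_step(weights, features, target, lr=0.05):
    guess = sum(w * x for w, x in zip(weights, features))
    error = guess - target              # how hard to pull, and which way
    return [w - lr * error * x for w, x in zip(weights, features)]

weights = [0.0, 0.0]
for _ in range(1000):
    weights = train_step(weights, [1.0, 2.0], target=5.0)

# After training, the model's guess is pulled close to the target of 5.0
print(round(sum(w * x for w, x in zip(weights, [1.0, 2.0])), 3))
```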

The part below gets a bit technical with terminology, so if it is too much for you, feel free to skip to the results and takeaways in the final section.

Tackling Google rankings

Now that we had the data, we tried several approaches to the problem of predicting the Google ranking of each webpage.

Initially, we used a regression algorithm. That is, we sought to predict the exact ranking of a site for a given search term (e.g., a site will rank X for search term Y), but after a few weeks, we realized that the task was too difficult. First, a ranking is by definition a characteristic of a site relative to other sites, not an intrinsic characteristic of the site (as, for example, word count). Since it was impossible for us to feed our algorithm with all sites ranked for a given search term, we reformulated the problem.

We realized that, in terms of Google ranking, what matters most is whether a given site ends up on the first page for a given search term. Thus, we re-framed the problem: What if we try to predict whether a site will end up in the top 10 sites ranked by Google for a certain search term? We chose top 10 because, as they say, you can hide a dead body on page two!

From that standpoint, the problem turns into a binary (yes or no) classification problem, where we have only two classes: a) the site is a top 10 site, or b) the site is not a top 10 site. Furthermore, instead of making a binary prediction, we decided to predict the probability that a given site belongs to each class.

Later, to force ourselves to make a clear-cut decision, we decided on a threshold above which we predict that a site will be top 10. For example, if we predict that the threshold is 0.85, then if we predict that the probability of a site being in the top 10 is higher than 0.85, we go ahead and predict that the site will be in the top 10.
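The thresholding step itself is tiny; a sketch using the 0.85 example above:

```python
# Turn a predicted probability into a yes/no "top 10" call.

def predict_top10(probability, threshold=0.85):
    return probability >= threshold

print(predict_top10(0.90))  # True
print(predict_top10(0.60))  # False
```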

To measure the performance of the algorithm, we decided to use a confusion matrix.
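A confusion matrix simply counts the four possible outcomes of a binary prediction. A minimal sketch with made-up labels:

```python
# Count true/false positives and negatives over a labelled set.

def confusion_matrix(actual, predicted):
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum(not a and not p for a, p in zip(actual, predicted))
    fp = sum(not a and p for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

actual    = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(confusion_matrix(actual, predicted))
# {'tp': 2, 'tn': 1, 'fp': 1, 'fn': 1}
```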

The following chart provides an overview of the entire process.

Image visually showing our machine learning process

Cleaning the data

We used a data set of 200,000 records, including roughly 2,000 different keywords/search terms.

In general, we can group the attributes we used into three categories:

  • Numerical features
  • Categorical variables
  • Text features

Numerical features are those that can take on any number within an infinite or finite interval. Some of the numerical features we used are ease of reading, grade level, text length, average number of words per sentence, URL length, website load time, number of domains referring to the website, number of .edu domains referring to the website, number of .gov domains referring to the website, Trust Flow for a number of topics, Citation Flow, Facebook shares, LinkedIn shares and Google shares. We applied a standard scaler to these features to center them around the mean, but other than that, they require no further preprocessing.
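The standard scaling step can be sketched in a few lines of plain Python (library implementations such as scikit-learn's StandardScaler do the same thing column-wise):

```python
# Center a numeric column on its mean and divide by its standard deviation.

def standard_scale(column):
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = var ** 0.5 or 1.0          # guard against a constant column
    return [(x - mean) / std for x in column]

scaled = standard_scale([10.0, 20.0, 30.0])
print(scaled)  # roughly [-1.225, 0.0, 1.225]: centered on the mean
```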

A categorical variable is one that can take on a limited number of values, with each value representing a different group or category. The categorical variables we used include the most frequent keywords, as well as locations and organizations throughout the site, in addition to topics for which the website is trusted. Preprocessing for these features included turning them into numerical labels and subsequent one-hot encoding.
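Label encoding followed by one-hot encoding can be sketched as follows (a toy stand-in for the library encoders actually used):

```python
# Map each category to an integer label, then expand each label into
# a vector with a single 1 in that label's position.

def one_hot(column):
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}   # label encoding
    return [[1 if index[value] == i else 0
             for i in range(len(categories))]
            for value in column]

print(one_hot(["news", "blog", "news"]))
# [[0, 1], [1, 0], [0, 1]]  (categories sorted: blog=0, news=1)
```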

Text features are obviously composed of text. They include search term, website content, title, meta-description, anchor text, headers (H3, H2, H1) and others.

It is important to highlight that there is not a clear-cut difference between some categorical attributes (e.g., organizations mentioned on the site) and text, and some attributes indeed switched from one category to the other in different models.

Feature engineering

We engineered additional features, which have correlation with rank.

Most of these features are Boolean (true or false), but some are numerical. An example of a Boolean feature is whether the exact search term is included in the website text, whereas a numerical feature would be how many of the tokens in the search term are included in the website text.

Below are some of the features we engineered.

Image showing boolean and quantitative features that were engineered
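As an illustration, here are hypothetical versions of two such features: a Boolean exact-match flag and a numeric token-overlap count (the function names and example text are invented, not taken from the experiment):

```python
# Two engineered features derived from the search term and page text.

def exact_match(search_term, page_text):
    """Boolean: is the exact search term present in the text?"""
    return search_term.lower() in page_text.lower()

def tokens_included(search_term, page_text):
    """Numeric: how many search-term tokens appear in the text?"""
    words = set(page_text.lower().split())
    return sum(t in words for t in search_term.lower().split())

text = "best running shoes for flat feet"
print(exact_match("running shoes", text))            # True
print(tokens_included("cheap running shoes", text))  # 2
```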

Run TF-IDF

To preprocess the text features, we used the TF-IDF (term frequency-inverse document frequency) algorithm. This algorithm views every instance as a document and the entire set of instances as a corpus. It then assigns a score to each term: the more frequent the term is in the document and the less frequent it is in the corpus, the higher the score.

We tried two TF-IDF approaches, with slightly different results depending on the model. The first approach consisted of concatenating all the text features first and then applying the TF-IDF algorithm (i.e., the concatenation of all text columns of a single instance becomes the document, and the set of all such instances becomes the corpus). The second approach consisted of applying the TF-IDF algorithm separately to each feature (i.e., every individual column is a corpus), and then concatenating the resulting arrays.
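A toy version of the scoring helps make this concrete. The corpus below is made up, and real implementations (such as scikit-learn's TfidfVectorizer) add smoothing and normalization on top of this basic formula:

```python
# Toy TF-IDF: score = (term frequency in document) * log(N / document frequency).
# A term that appears in every document gets an IDF of zero, so it contributes nothing.
import math

def tf_idf(corpus):
    """Return one {term: score} dict per document, for a list of token lists."""
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    scores = []
    for doc in corpus:
        doc_scores = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores

corpus = [["cheap", "flights", "cheap"], ["cheap", "hotels"], ["flights", "deals"]]
scores = tf_idf(corpus)
```

In the first approach described above, each "document" is the concatenation of all of an instance's text columns; in the second, the function is run once per column and the resulting score arrays are concatenated.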

The resulting array after TF-IDF is very sparse (most columns for a given instance are zero), so we applied dimensionality reduction (singular value decomposition) to reduce the number of attributes/columns.

The final step was to concatenate all resulting columns from all feature categories into an array. This we did after applying all the steps above (cleaning the features, turning the categorical features into labels and performing one-hot encoding on the labels, applying TF-IDF to the text features and scaling all the features to center them around the mean).

Models and ensembles

Having obtained and concatenated all the features, we ran a number of different algorithms on them. The algorithms that showed the most promise are gradient boosting classifier, ridge classifier and a two-layer neural network.

Finally, we ensembled the model results using simple averages, which yielded some additional gains, as different models tend to have different biases.
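A sketch of this simple-average ensembling, with made-up probability estimates standing in for the three models' outputs:

```python
# Average per-instance probability estimates from several models.
# Each list holds one model's estimated probability, per page, that the
# page ranks in Google's top 10.
def average_ensemble(*model_probs):
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]

gbm_probs   = [0.82, 0.10, 0.55]  # gradient boosting classifier (hypothetical)
ridge_probs = [0.74, 0.22, 0.48]  # ridge classifier (hypothetical)
nn_probs    = [0.90, 0.05, 0.61]  # two-layer neural network (hypothetical)
ensemble = average_ensemble(gbm_probs, ridge_probs, nn_probs)
```

Averaging works here precisely because the models err in different ways: where one model's bias pushes an estimate too high, another's often pushes it low, and the mean lands closer to the truth.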

Optimizing the threshold

The last step was to decide on a threshold to turn probability estimates into binary predictions ("yes, we predict this site will be in the top 10 in Google" or "no, we predict this site will not be in the top 10 in Google"). For that, we optimized the threshold on a cross-validation set and then applied the obtained threshold to a test set.
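A minimal sketch of that threshold search, maximizing accuracy over a grid of candidate thresholds on hypothetical validation data (the original work may have optimized a different metric, such as F1):

```python
# Sweep candidate thresholds on a validation set and keep the one that
# maximizes accuracy of the resulting binary predictions.
def best_threshold(probs, labels, candidates):
    def accuracy(t):
        preds = [p >= t for p in probs]
        return sum(pr == bool(l) for pr, l in zip(preds, labels)) / len(labels)
    return max(candidates, key=accuracy)

val_probs  = [0.92, 0.35, 0.68, 0.15, 0.51]  # model probability estimates
val_labels = [1, 0, 1, 0, 0]                 # true "in top 10" labels
threshold = best_threshold(val_probs, val_labels, [i / 100 for i in range(1, 100)])
```

Tuning the threshold on a held-out cross-validation set (rather than the test set) is what keeps the final test-set evaluation honest.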

Results

The metric we thought would be the most representative to measure the efficacy of the model is a confusion matrix. A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.

I am sure you have heard the saying that "even a broken clock is right twice a day." With 100 results for every keyword, a random guess would correctly predict "not in top 10" 90 percent of the time. The confusion matrix lets us check the accuracy of both positive and negative answers. We obtained roughly 41 percent true positives and 41 percent true negatives in our best model.
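Computing the four cells of a binary confusion matrix is straightforward; the labels and predictions below are toy data:

```python
# Count true positives, true negatives, false positives and false negatives
# for binary labels (1 = "in top 10", 0 = "not in top 10").
def confusion_matrix(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

y_true = [1, 0, 1, 1, 0, 0]  # hypothetical true outcomes
y_pred = [1, 0, 0, 1, 0, 1]  # hypothetical model predictions
cm = confusion_matrix(y_true, y_pred)
```

Reading all four cells, rather than overall accuracy alone, is exactly what guards against the broken-clock problem: a model that always predicts "not in top 10" would score 90 percent accuracy while having zero true positives.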

Image showing confusion matrix of our best model

Another way of visualizing the effectiveness of the model is by using an ROC curve. An ROC curve is "a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings." The non-linear models used in the ensemble were XGBoost and a neural network; the linear model was logistic regression. The ensemble plot reflects a combination of the linear and non-linear models.
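The points of such a curve can be sketched as follows, computing (FPR, TPR) pairs at a handful of thresholds over toy scores:

```python
# Compute (false positive rate, true positive rate) pairs at several
# thresholds; plotting these points traces out the ROC curve.
def roc_points(scores, labels, thresholds):
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and not l)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]  # hypothetical model scores
labels = [1, 1, 0, 1, 0, 0]              # hypothetical true labels
curve = roc_points(scores, labels, [0.95, 0.7, 0.5, 0.2, 0.0])
```

A curve hugging the top-left corner means the model finds most true positives while admitting few false positives; the diagonal from (0, 0) to (1, 1) is what random guessing produces.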

Image of ROC curve generated by our model

XGBoost is short for “Extreme Gradient Boosting,” with gradient boosting being “a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.”

The chart below shows the relative contribution of the feature categories to the accuracy of the final prediction of this model. Unlike neural networks, XGBoost, along with certain other models, allows you to easily peek into the model to see the relative predictive weight that particular features hold.
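As an illustration of that kind of peeking, per-feature importances (of the sort a tree model such as XGBoost exposes) can be rolled up into category totals; the importance values and the category mapping below are entirely hypothetical:

```python
# Roll individual feature importances up into feature-category totals,
# the way a by-category importance chart is built. All numbers are made up.
feature_importance = {
    "trust_flow": 0.18, "citation_flow": 0.12,   # link metrics
    "tfidf_content": 0.25, "tfidf_title": 0.15,  # text features
    "grade_level": 0.10, "text_length": 0.20,    # readability/length
}
category_of = {
    "trust_flow": "links", "citation_flow": "links",
    "tfidf_content": "text", "tfidf_title": "text",
    "grade_level": "readability", "text_length": "readability",
}
by_category = {}
for feat, imp in feature_importance.items():
    cat = category_of[feat]
    by_category[cat] = by_category.get(cat, 0.0) + imp
```

Aggregating to the category level makes the chart readable when hundreds of TF-IDF columns would otherwise each carry a tiny individual weight.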

Graph of predictive importance by feature category

We were quite impressed that we were able to build a model with predictive power from the features we had given it. We were very nervous that our limited feature set would render the project fruitless. Ideally, we would have a way to crawl an entire site to gauge its overall relevance. Perhaps we could gather data on the number of Google reviews a business had. We also understood that Google has much better data on links and citations than we could ever hope to gather.

What we learned

Machine learning is a very powerful tool that can be used even if you do not fully understand the complexity of how it works. I have read many articles about RankBrain and the inability of its engineers to understand how it works. This is part of the magic and beauty of machine learning. Similar to the process of evolution, in which life gains different features and some organisms live while others die, machine learning finds its way to the answer instead of being given it.

While we were happy with the results of our first models, it is important to understand that they were trained on a relatively small sample compared to the immense size of the internet. One of the key goals in building any kind of machine learning tool is generalization: operating effectively on data that has never been seen before. We are currently testing our model on new queries and will continue to refine it.

The largest takeaway for me in this project was just starting to get a grasp on the immense value that machine learning has for our industry. A few of the ways I see it impacting SEO are:

  • Text generation, summarization and categorization. Think about smart excerpts for content and websites that potentially self-organize based on classification.
  • Never having to write another ALT parameter (See below).
  • New ways of looking at user behavior and classification/scoring of visitors.
  • Integration of new ways of navigating websites using speech and smart Q&A style content/product/recommendation systems.
  • Entirely new ways of mining analytics and crawled data to give insights into visitors, sessions, trends and potentially visibility.
  • Much smarter tools in distribution of ad channels to relevant users.

This project was more about learning for us rather than accomplishing a holy grail (of sorts). Much like the advice I give to new developers (“the best learning happens while doing”), it is important to get your hands dirty and start training. You will learn to gather, clean and organize data, and you’ll familiarize yourself with the ins and outs of various machine learning tools.

Much of this is familiar to more technical SEOs, but the industry also is developing tools to help those who are not as technically inclined. I have compiled a few resources below that are of interest in understanding this space.

Recent technologies of interest

It is important to understand that the vast majority of machine learning is not about building a human-level AI, but rather about using data to solve real problems. Below are a few examples of recent ways this is happening.

NeuralTalk2

NeuralTalk2 is a Torch model by Andrej Karpathy for generating natural language descriptions of given images. Imagine never having to write another ALT parameter again and having a machine do it for you. Facebook is already incorporating this technology.

Microsoft Bots and Alexa

Researchers are mastering speech processing and are starting to be able to understand the meaning behind words (given their context). This has deep implications for traditional websites in how information is accessed. Instead of navigation and search, the website could have a conversation with your visitors. In the case of Alexa, there is no website at all, just the conversation.

Natural language processing

There is a tremendous amount of work going on right now in the realm of translation and content semantics. It goes far beyond traditional Markov chains and n-gram representations of text. Machines are showing the initial hints of abilities to summarize and generate text across domains. “The Unreasonable Effectiveness of Recurrent Neural Networks” is a great post from last year that gives a glimpse of what is possible here.

Home Depot search relevance competition

Home Depot recently sponsored an open competition on Kaggle to predict the relevance of their search results to the visitor's query. You can see some of the process behind the winning entries on this thread.

How to get started with machine learning

Because we, as search marketers, live in a world of data, it is important for us to understand new technologies that allow us to make better decisions in our work. There are many places where machine learning can help our understanding, from better knowing the intent of our users to which site behaviors drive which actions.

For those of you who are interested in machine learning but are overwhelmed with the complexity, I would recommend Data Science Dojo. There are simple tutorials using Microsoft’s Machine Learning Studio that are very approachable to newbies. This also means that you do not have to learn to code prior to building your first models.

If you are interested in more powerful customized models and are not afraid of a bit of code, I would probably start with listening to this lecture by Justin Johnson at Stanford, as it goes through the four most common libraries. A good understanding of Python (and perhaps R) is necessary to do any work of merit. Christopher Olah has a pretty great blog that covers a lot of interesting topics involving data science.

Finally, GitHub is your friend. I find myself looking through recently added repos to see the incredibly interesting projects people are working on. In many cases, data is readily available, and there are pretrained models that perform certain tasks very well. Looking around and becoming familiar with the possibilities will give you some perspective on this amazing field.

http://searchengineland.com/experiment-trying-predict-google-rankings-253621


Nearly a year ago, Google expanded its search engine to begin instantly answering questions, such as when a celebrity died or the solution to a math problem. The move was a response to the true nature of search: nobody types something into Google without actively seeking an answer.

That answer may be a particular piece of content, a few different pieces, or simply a particular website, but you are always asking the question: "Where is this thing I want?" The instant responsiveness of Google and its ability to query an index of the entire Internet have made other sites take notice.

That's why sites like Periscope, Medium, Vevo and Hacker News have adopted Algolia's hosted cloud search platform, an API that brings Google's instant to near-instant search capabilities to their sites. The result is that their content is immediately searchable and relevant, so that if a user makes a complex query and/or a typo, they will still receive results that make sense for what they're looking for. "By leveraging the trove of internal data that websites or mobile apps have, we are helping them to deliver an experience that is even deeper and more personalized than what Google does for web searches," said Nicolas Dessaigne, CEO of Algolia. "Our goal is to make search seamless, nearly invisible.

"Today we can deliver relevant results at the first keystroke. In the future, all results will be personalized and delivered before the question is even completely formulated." This is an important approach for businesses large and small, and it edges toward AI: Algolia's technology works not just to index and search your data, but also to make sure it produces the right answer to a query.

This is an interesting comparison to the ever-growing world of the Internet of Things, led by Amazon's Echo. Users, despite their accents, stuttering or other things that make a question "imperfect," are still able to get an answer. Algolia, its competitor Elastic and Google all recognize this, with Algolia in particular even advertising directly on its website that you should try a test search with a typo, to show how the platform can answer the question regardless. Google will even go as far as to suggest what you may be trying to type, if not bring you the exact answer despite your mistake.

As Quartz's Leo Mirani said, there are over 10 trillion web pages to index, not to mention the masses of social media services pouring terabytes, if not petabytes, of information into the stream. This is the same problem that many startups and companies will begin to face, both from the angle of big-data overload and from the expectations of the user.

The instantaneous nature of search may make users unlikely to even browse the same way, as we move away from the original web's exploratory nature to people visiting each website with a purpose. In the same Quartz article, Mirani speaks to author Stefan Weitz, who wrote the book Search: How The Data Explosion Makes Us Faster, where Weitz argues that search must mature to mirror human nature, and be ready to answer a query at speed.

"We must think of search as the omniscient watcher in the sky, aware of everything that is happening on the ground below," said Weitz. "For this to happen, search itself needs to be deconstructed into its component tasks: indexing and understanding the world and everything in it; reading senses, so search systems can see and hear (and eventually smell and touch!) and interact with us in more natural ways; and communicating with us humans in contextually appropriate ways, whether that's in text, in speech, or simply by talking to other machines on our behalf to make things happen in the real world."

To Algolia's Dessaigne, this approach is a natural course. "Personalization of results is also going to be an important trend for websites and apps, particularly among big retailers and media websites. Along this progression, voice interfaces are going to gain traction. We are still far from truly conversational interfaces, but we'll eventually get there."

While we all dream of a day when we can have an answer as we speak, or even think of the question, we are far away from it. Nevertheless, startups are clearly ready to make the jump for us. We're in a world that's far from the days when having a search bar was a quirky feature; users have a question and to succeed in business, you'll need to have an answer.

Source:  http://www.inc.com/amy-cuddy/3-body-language-books-that-all-leaders-should-read-this-summer.html


Over the past year, Google engineers have experimented and developed a set of building blocks for the Internet of Things - an ecosystem of connected devices, services and “things” that promises direct and efficient support of one’s daily life. While there has been significant progress in this field, there remain significant challenges in terms of (1) interoperability and a standardized modular systems architecture, (2) privacy, security and user safety, as well as (3) how users interact with, manage and control an ensemble of devices in this connected environment.

It is in this context that we are happy to invite university researchers to participate in the Internet of Things (IoT) Technology Research Award Pilot. This pilot provides selected researchers with in-kind gifts of Google IoT-related technologies (listed below), with the goal of fostering collaboration with the academic community on small-scale (~4-8 week) experiments, discovering what they can do with our software and devices.

We invite you to submit proposals in which Google IoT technologies are used to (1) explore interesting use cases and innovative user interfaces, (2) address technical challenges as well as interoperability between devices and applications, or (3) experiment with new approaches to privacy, safety and security. Proposed projects should make use of one or a combination of these Google technologies:

Google beacon platform - consisting of the open beacon format Eddystone and various client and cloud APIs, this platform allows developers to mark up the world to make your apps and devices work smarter by providing timely, contextual information.

Physical Web - based on the Eddystone URL beacon format, the Physical Web is an approach designed to allow any smart device to interact with real world objects - a vending machine, a poster, a toy, a bus stop, a rental car - and not have to download an app first.

Nearby Messages API - a publish-subscribe API that lets you pass small binary payloads between internet-connected Android and iOS devices as well as with beacons registered with Google's proximity beacon service.

Brillo & Weave - Brillo is an Android-based embedded OS that brings the simplicity and speed of mobile software development to IoT hardware to make it cost-effective to build a secure smart device, and to keep it updated over time. Weave is an open communications and interoperability platform for IoT devices that allows for easy connections to networks, smartphones (both Android and iOS), mobile apps, cloud services, and other smart devices.

OnHub router - a communication hub for the Internet of Things supporting Bluetooth® Smart Ready, 802.15.4 and 802.11a/b/g/n/ac. It also allows you to quickly create a guest network and control the devices you want to share (see On.Here).

Google Cloud Platform IoT Solutions - tools to scale connections, gather and make sense of data, and provide the reliable customer experiences that IoT hardware devices require.

Chrome Boxes & Kiosk Apps - custom full-screen apps for a purpose-built Chrome device, such as a guest registration desk, a library catalog station, or a point-of-sale system in a store.

Vanadium - an open-source framework designed to make it easier to develop secure, multi-device user experiences, with or without an Internet connection.

Check out the Ubiquity Dev Summit playlist for more information on these platforms and their best practices.

Please submit your proposal here by February 29th in order to be considered for an award. Proposals will be reviewed by researchers and product teams within Google. In addition to looking for impact and interesting ideas, priority will be given to research that can make immediate use of the available technologies. Selected proposals will be notified by the end of March 2016. If selected, the award will be subject to Google's terms, and your use of Google technologies will be subject to the applicable Google terms of service.

To connect our physical world to the Internet is a broad and long-term challenge, one we hope to address by working with researchers across many disciplines and work practices. We are looking forward to the collaborative opportunity provided by this pilot, and learning about innovative applications you create for these new technologies.

Source:  http://googleresearch.blogspot.com/2016/02/announcing-google-internet-of-things.html

