
More than 484,000 Google keyword searches a month from around the world, including at least 54,000 searches in the UK, return results dominated by Islamist extremist material, a report into the online presence of jihadism has revealed.

The study found that of the extremist content accessible through these specific keyword searches, 44% was explicitly violent, 36% was non-violent and 20% was political Islamist in content, the last being non-violent but disseminated by known Islamist groups with political ambitions.

The study is one of the first to expose the role of the search engine rather than social media in drawing people to extremist jihadi material on the web. It argues the role of the search engine – a field dominated by Google – has been a blind spot for those seeking to measure and counter extremist messages on the internet.

Although the UK government’s Prevent strategy claims the internet must not be ungoverned space for Islamist extremism and British diplomats have taken the lead in the global communications fight against Islamic State on the net, the study suggests government agencies are only at the beginning of a “labyrinthine challenge”. So-called counter-narrative initiatives led by governments and civil society groups are “under-resourced and not achieving sufficient natural interest”, suggesting the battle of ideas is not even being engaged, let alone won.

The study, undertaken jointly by Digitalis and the Centre on Religion and Geopolitics, will be challenged by those who claim it advocates censorship, has blurred the lines between political Islam and violent extremism and cannot validly quantify the presence of extremism.

But the findings come in a week in which there has been a spate of terrorist attacks in Germany and France, some undertaken by young people either radicalised on the internet, or using it to feed their obsession with violence. Many of the jihadist foreign fighters in Syria were radicalised online as “the search engine gradually overtakes the library and the classroom as a source of information”.

The study, entitled A War of Keywords: how extremists are exploiting the internet and what to do about it, argues “many of the legitimate mainstream Islamic scholarly websites host extremist material, including jihadi material, often without any warning or safeguards in place”.

It also argues non-violent Islamist organisations, such as Hizb ut-Tahrir, have a very strong online presence and dominate the results for some keyword searches. Some of the most popular search words used were crusader, martyr, kafir (non-believer), khilafa (a pan-Islamic state) or apostate.

In a condemnation of government efforts it finds very little of this content is challenged online. Analysing 47 relevant keywords, the search-engine analysis found counter-narrative content outperformed extremist content in only 11% of the results generated. For the search term khilafah, which has 10,000 global monthly searches, the ratio of extremist content to counter-narrative is nine to one.

This is partly because counter-narrative sites lack search engine optimisation, so they do not rank high enough in searches. By contrast, Khilafa.com, the English website of Hizb ut-Tahrir, had more than 100,000 links into it.

The study also warns some of the most-used Muslim websites such as Kalmullah.com and WorldofIslam.info “host traditional Islamic content alongside extremist material” so are knowingly or unknowingly abusing the trust of their readers.

The study also claims a user can come across extremist content relatively easily while browsing for Islamic literature. Few effective restrictions apply to accessing Islamic State English-language magazine Dabiq or Inspire magazine, which is linked to al-Qaeda in the Arabian peninsula. Both are readily available to browse and download through clearing sites.

The study produced its headline numbers by looking at the average monthly number of global searches conducted in Google for 287 extremist-related keywords – 143 in English and 144 in Arabic. It then looked at two samples totalling 47 keywords, the first sample focused on the most-used words and the second sample on the keywords deemed to be most extremist. The research then analysed the first two pages thrown up by the search for these keywords.

The authors acknowledge the difficulties technology companies face in policing the results of their search engines. Google is responsible for 40,000 searches a second, 2.5 billion a day and 1.2 trillion a year worldwide. Facebook boasts more than one and a half billion users who create 5 billion likes a day.

Dave King, chief executive of Digitalis, argues: “While the company’s advertising model is based on automatically mining the content its users create, their ability to distinguish a single credible kill threat from the plethora who have threatened to kill in jest is highly limited.”

The study recommends governments, the United Nations, technology companies, civil society groups and religious organisations together establish a charter setting out a common definition of extremism and pledge to make the internet a safer place.

Technology companies, the report says, could work with governments to shift the balance of the online space, as well as share analytical data and trending information to bolster counter-efforts. It suggests search engine companies have been reluctant or unable to alter the search algorithms that are responsible for search page rankings.

The authors also call for a debate on “the murky dividing line between violent and non-violent extremist material online”, arguing such legal definitions have been achieved over “copyrighted material, child pornography and hate speech all of which have been subject to removal requests.”

Existing content control software that prevents access to graphic or age-restricted material could be used and warning signals put on sites.

A Google spokesperson said: “We take this issue very seriously and have processes in place for removing illegal content from all our platforms, including search. We are committed to showing leadership in this area – and have been hosting counterspeech events across the globe for several years. We are also working with organisations around the world on how best to promote their work on counter-radicalisation online.”

https://www.theguardian.com/technology/2016/jul/28/search-engines-role-in-radicalisation-must-be-challenged-finds-study

Categorized in Search Engine

The Digital Payments 2020 report by Google and BCG analyses the transformation in Digital Payments and its impact on the payment landscape in India.

Why are digital payments on the rise?

  • 66% of users like the convenience
  • 48% of users are lured by offers
  • 75% of merchants feel opting for digital payments will increase sales

What are the hurdles on the way?

  • 50% of users find it difficult to understand
  • 50% of users stopped using it because it is not accepted everywhere

By 2020

  • 60% of digital payments value will be driven by physical points of sale
  • 50% of person-to-merchant transactions will be worth less than Rs 100

http://retail.economictimes.indiatimes.com/news/e-commerce/e-tailing/more-than-50-of-indias-internet-users-will-use-digital-payments-by-2020-google-and-bcg-report/53483942

Categorized in Search Engine

What are business attributes, and why should local businesses care? Columnist Adam Dorfman explores.

When checking into places on Google Maps, you may have noticed that Google prompts you to volunteer information about the place you’re visiting. For instance, if you check into a restaurant, you might be asked whether the establishment has a wheelchair-accessible entrance or whether the location offers takeout. There’s a reason Google wants to know: attributes.

Attributes consist of descriptive content such as the services a business provides, payment methods accepted or the availability of free parking — details that may not apply to all businesses. Attributes are important because they can influence someone’s decision to visit you.

Google wants to set itself up as a go-to destination of rich, descriptive content about locations, which is why it crowdsources business attributes. But it’s not the only publisher doing so. For instance, if you publish a review on TripAdvisor or Yelp, you’ll be asked a similar battery of questions but with more details, such as whether the restaurant is appropriate for kids, allows dogs, has televisions or accepts bitcoins.

Many of these publishers are incentivizing this via programs like Google’s Local Guides, TripAdvisor’s Badge Collections, and Yelp’s Elite Squad because having complete, accurate information about locations makes each publisher more useful. And being more useful means attracting more visitors, which makes each publisher more valuable.


It’s important that businesses manage their attributes as precious location data assets, if for no other reason than that publishers are doing so. I call publishers (and aggregators who share information with them) data amplifiers because they amplify a business’s data across all the places where people conduct local searches. If you want people to find your business and turn their searches into actual in-store visits, you need to share your data, including detailed attributes, with the major data amplifiers.

Many businesses believe their principal location data challenge is ensuring that their foundational data, such as their names, addresses and phone numbers, are accurate. I call the foundational data “identities,” and indeed, you need accurate foundational data to even be considered when people search for businesses. But as important as they are — and challenging to manage — identities solve for only one-half of the search challenge. Identities ensure visibility, but you need attributes to turn searches into business for your brand.

Attributes are not new, but they’ve become more important because of the way mobile is rapidly accelerating the purchase decision. According to seminal research published by Google, mobile has given rise to “micro-moments,” or times when consumers use mobile devices to make quick decisions about what to do, where to go or what to buy.

Google noted that the number of “near me” searches (searches conducted for goods and services nearby) has increased 146 percent year over year, and 88 percent of these “near me” searches are conducted on mobile devices. As Google’s Matt Lawson wrote:

With a world of information at their fingertips, consumers have heightened expectations for immediacy and relevance. They want what they want when they want it. They’re confident they can make well-informed choices whenever needs arise. It’s essential that brands be there in these moments that matter — when people are actively looking to learn, discover, and/or buy.

Attributes encourage “next moments,” or the action that occurs after someone has found you during a micro-moment. Google understands that businesses failing to manage their attributes correctly will drop off the consideration set when consumers experience micro-moments. For this reason, Google prompts users to complete attributes about businesses when they check into a location on Google Maps.

At the 2016 Worldwide Developers Conference, Apple underscored the importance of attributes when the company rolled out a smarter, more connected Siri that makes it possible for users to create “next moments” faster by issuing voice commands such as “Siri, find some new Italian restaurants in Chicago, book me dinner, and get me an Uber to the restaurant.” In effect, Siri is a more efficient tool for enabling next moments, but only for businesses that manage the attributes effectively.

And with its recently released Google My Business API update to version 3.0, Google also gave businesses that manage offline locations a powerful competitive weapon: the ability to manage attributes directly. By making it possible to share attributes on your Google My Business page, Google has not only amplified its own role as a crucial publisher of attributes but has also given businesses an important tool to take control of their own destiny. It’s your move now.

http://searchengineland.com/google-mining-local-business-attributes-252283

Categorized in Business Research

Google has made another small acquisition to help it continue building out its latest efforts in social apps. The search and Android giant has hired the team behind Kifi, a startup that was building extensions to collect and search links shared in social apps, as well as provide recommendations for further links — such as this tool, Kifi for Twitter. Terms of the deal are not being disclosed, but, according to Google engineering director Eddie Kessler, the app’s team will be joining the company to work on Spaces, Google’s group chat app.

Google tells me it is not commenting on the exact number of people joining.

It looks like Spaces could use the help. The app launched earlier this year and has had a very lukewarm run in the market so far, currently lingering around 577 in the U.S. iOS App Store and 284 in the U.S. Android store, according to stats from App Annie.

This is essentially an acqui-hire. In a Medium post earlier today, Kifi noted that the app is not coming to Google. It will only remain alive for another few weeks, after which point it will stick around for a few weeks more for data exports only.

While the app is not living on, it sounds like the kind of technology that Kifi’s team — co-founded by Dan Blumenfeld and Eishay Smith (although Blumenfeld left the company some time ago) — built will continue. Considering Spaces’ current focus on group chat, this suggests they could adapt Kifi’s link sharing and link recommendation technology to that context, and collate those links with links from other applications and platforms.

This seems to be what Kessler says will be the intention, too, in his own short Google+ post: “Delighted the Kifi team, with their great expertise in organizing shared content and conversations, is joining the Spaces team to build features that improve group sharing.”

Google has disclosed nearly 200 acquisitions to date. Among them, other recent M&A moves that point to Google building up its talent in areas like social and apps include Pie (a Slack-like app) in Singapore and Moodstocks in Paris (to improve image recognition in apps).

Kifi had raised just over $11 million in funding from Don Katz, Oren Zeev, SGVC and Wicklow Capital.

https://techcrunch.com/2016/07/12/google-acquires-deep-search-engine-kifi-to-enhance-its-spaces-group-chat-app/

Categorized in Search Engine

In late 2015, JR Oakes and his colleagues undertook an experiment to attempt to predict Google ranking for a given webpage using machine learning. What follows are their findings, which they wanted to share with the SEO community.

Machine learning is quickly becoming an indispensable tool for many large companies. Everyone has, for sure, heard about Google’s AI algorithm beating the World Champion in Go, as well as technologies like RankBrain, but machine learning does not have to be a mystical subject relegated to the domain of math researchers. There are many approachable libraries and technologies that show promise of being very useful to any industry that has data to play with.

Machine learning also has the ability to turn traditional website marketing and SEO on its head. Late last year, my colleagues and I (rather naively) began an experiment in which we threw several popular machine learning algorithms at the task of predicting ranking in Google. We ended up with an ensemble that achieved 41 percent true positive and 41 percent true negative rates on our data set.

In the following paragraphs, I will take you through our experiment, and I will also discuss a few important libraries and technologies that are important for SEOs to begin understanding.

Our experiment

Toward the end of 2015, we started hearing more and more about machine learning and its promise to make use of large amounts of data. The more we dug in, the more technical it became, and it quickly became clear that it would be helpful to have someone help us navigate this world.

About that time, we came across a brilliant data scientist from Brazil named Alejandro Simkievich. The interesting thing to us about Simkievich was that he was working in the area of search relevance and conversion rate optimization (CRO) and placing very well for important Kaggle competitions. (For those of you not familiar, Kaggle is a website that hosts machine learning competitions for groups of data scientists and machine learning enthusiasts.)

Simkievich is the owner of Statec, a data science/machine learning consulting company, with clients in the consumer goods, automotive, marketing and internet sectors. Lots of Statec’s work had been focused on assessing the relevance of e-commerce search engines. Working together seemed a natural fit, since we are obsessed with using data to help with decision-making for SEO.

We like to set big hairy goals, so we decided to see if we could use the data available from scraping, rank trackers, link tools and a few more tools, to see if we could create features that would allow us to predict the rank of a webpage. While we knew going in that the likelihood of pulling it off was very low, we still pushed ahead for the opportunity for an amazing win, as well as the chance to learn some really interesting technology.

The data

Fundamentally, machine learning is using computer programs to take data and transform it in a way that provides something valuable in return. “Transform” is a very loosely applied word, in that it doesn’t quite do justice to all that is involved, but it was selected for the ease of understanding. The point here is that all machine learning begins with some type of input data.

(Note: There are many tutorials and courses freely available that do a very good job of covering the basics of machine learning, so we will not do that here. If you are interested in learning more, Andrew Ng has an excellent free class on Coursera here.)

The bottom line is that we had to find data that we could use to train a machine learning model. At this point, we didn’t know exactly what would be useful, so we used a kitchen-sink approach and grabbed as many features as we could think of. GetStat and Majestic were invaluable in supplying much of the base data, and we built a crawler to capture everything else.

Image of data used for analysis

Our goal was to end up with enough data to successfully train a model (more on this later), and this meant a lot of data. For the first model, we had about 200,000 observations (rows) and 54 attributes (columns).

A little background

As I said before, I am not going to go into a lot of detail about machine learning, but it is important to grasp a few points to understand the next section. In general, much of the machine learning work done today deals with regression, classification and clustering algorithms. I will define the first two here, as they were relevant to our project.

Image showing the difference between classification and regression algorithms

  • Regression algorithms are normally useful for predicting a single number. If you needed to create an algorithm that predicted a stock price based on features of stocks, you would select this type of model. These are called continuous variables.
  • Classification algorithms are used to predict a member of a class of possible answers. This could be a simple “yes or no” classification, or “red, green or blue.” If you needed to predict whether an unknown person was male or female from features, you would select this type of model. These are called discrete variables.

Machine learning is a very technical space right now, and much of the cutting-edge work requires familiarity with linear algebra, calculus, mathematical notation and programming languages like Python. One of the items that helped me understand the overall flow at an approachable level, though, was to think of machine learning models as applying weights to the features in the data you give it. The more important the feature, the stronger the weight.

When you read about “training models,” it is helpful to visualize a string connected through the model to each weight, and as the model makes a guess, a cost function is used to tell you how wrong the guess was and to gently, or sternly, pull the string in the direction of the right answer, correcting all the weights.

The part below gets a bit technical with terminology, so if it is too much for you, feel free to skip to the results and takeaways in the final section.

Tackling Google rankings

Now that we had the data, we tried several approaches to the problem of predicting the Google ranking of each webpage.

Initially, we used a regression algorithm. That is, we sought to predict the exact ranking of a site for a given search term (e.g., a site will rank X for search term Y), but after a few weeks, we realized that the task was too difficult. First, a ranking is by definition a characteristic of a site relative to other sites, not an intrinsic characteristic of the site (as, for example, word count). Since it was impossible for us to feed our algorithm with all sites ranked for a given search term, we reformulated the problem.

We realized that, in terms of Google ranking, what matters most is whether a given site ends up on the first page for a given search term. Thus, we re-framed the problem: What if we try to predict whether a site will end up in the top 10 sites ranked by Google for a certain search term? We chose top 10 because, as they say, you can hide a dead body on page two!

From that standpoint, the problem turns into a binary (yes or no) classification problem, where we have only two classes: a) the site is a top 10 site, or b) the site is not a top 10 site. Furthermore, instead of making a binary prediction, we decided to predict the probability that a given site belongs to each class.

Later, to force ourselves to make a clear-cut decision, we decided on a threshold above which we predict that a site will be top 10. For example, if we set the threshold at 0.85, then whenever we predict that the probability of a site being in the top 10 is higher than 0.85, we go ahead and predict that the site will be in the top 10.
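To make that thresholding step concrete, here is a minimal sketch in Python; the 0.85 cutoff comes from the example above, and the probabilities are invented.

```python
import numpy as np

threshold = 0.85                       # example cutoff from the text
probs = np.array([0.92, 0.40, 0.86])   # predicted P(page ends up in the top 10)

# 1 = predicted top 10, 0 = predicted not top 10
predictions = (probs >= threshold).astype(int)
print(predictions)  # [1 0 1]
```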

To measure the performance of the algorithm, we decided to use a confusion matrix.

The following chart provides an overview of the entire process.

Image visually showing our machine learning process

Cleaning the data

We used a data set of 200,000 records, including roughly 2,000 different keywords/search terms.

In general, we can group the attributes we used into three categories:

  • Numerical features
  • Categorical variables
  • Text features

Numerical features are those that can take on any number within an infinite or finite interval. Some of the numerical features we used are ease of read, grade level, text length, average number of words per sentence, URL length, website load time, number of domains referring to website, number of .edu domains referring to website, number of .gov domains referring to website, Trust Flow for a number of topics, Citation Flow, Facebook shares, LinkedIn shares and Google shares. We applied a standard scaler to these features to center them around the mean, but other than that, they required no further preprocessing.
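As a rough illustration of that scaling step, the sketch below uses scikit-learn’s StandardScaler on a few invented numerical columns; the column names are hypothetical, and the study’s actual pipeline is not described in this much detail.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical numerical features for three pages
df = pd.DataFrame({
    "text_length":       [350, 1200, 800],
    "url_length":        [42, 65, 30],
    "referring_domains": [12, 340, 57],
})

scaler = StandardScaler()             # centers each column on its mean, unit variance
X_numeric = scaler.fit_transform(df)  # array ready to concatenate with the other features
```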

A categorical variable is one that can take on a limited number of values, with each value representing a different group or category. The categorical variables we used include most frequent keywords, as well as locations and organizations mentioned throughout the site, in addition to topics for which the website is trusted. Preprocessing for these features included turning them into numerical labels and subsequent one-hot encoding.
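A minimal sketch of that label-then-one-hot preprocessing, assuming scikit-learn; the category values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical "trusted topic" values for four pages
topics = np.array(["health", "finance", "health", "travel"])

labels = LabelEncoder().fit_transform(topics)       # e.g. [1, 0, 1, 2]
onehot = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()
print(onehot)  # one column per category, a single 1 per row
```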

Text features are obviously composed of text. They include search term, website content, title, meta-description, anchor text, headers (H3, H2, H1) and others.

It is important to highlight that there is not a clear-cut difference between some categorical attributes (e.g., organizations mentioned on the site) and text, and some attributes indeed switched from one category to the other in different models.

Feature engineering

We engineered additional features that correlate with rank.

Most of these features are Boolean (true or false), but some are numerical. An example of a Boolean feature is whether the exact search term is included in the website text, whereas a numerical feature would be how many of the tokens in the search term are included in the website text.
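As a toy illustration of those two engineered features (the query and page text below are invented):

```python
query = "best running shoes"
page_text = "Find the best trail running shoes for every budget."

# Boolean feature: the exact search term appears in the page text
exact_match = query.lower() in page_text.lower()

# Numerical feature: how many query tokens appear in the page text
tokens_in_text = sum(tok in page_text.lower() for tok in query.lower().split())

print(exact_match, tokens_in_text)  # False 3
```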

Below are some of the features we engineered.

Image showing boolean and quantitative features that were engineered

Run TF-IDF

To pre-process the text features, we used the TF-IDF algorithm (term-frequency, inverse document frequency). This algorithm views every instance as a document and the entire set of instances as a corpus. Then, it assigns a score to each term, where the more frequent the term is in the document and the less frequent it is in the corpus, the higher the score.

We tried two TF-IDF approaches, with slightly different results depending on the model. The first approach consisted of concatenating all the text features first and then applying the TF-IDF algorithm (i.e., the concatenation of all text columns of a single instance becomes the document, and the set of all such instances becomes the corpus). The second approach consisted of applying the TF-IDF algorithm separately to each feature (i.e., every individual column is a corpus), and then concatenating the resulting arrays.
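The sketch below illustrates the two approaches with scikit-learn’s TfidfVectorizer; the “title” and “body” columns are hypothetical stand-ins for the study’s text features.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

titles = ["emergency locksmith london", "learn python online"]
bodies = ["24 hour emergency locksmith serving central london",
          "free python course for complete beginners"]

# Approach 1: concatenate the text columns first, then fit a single vectorizer
combined = [t + " " + b for t, b in zip(titles, bodies)]
X_combined = TfidfVectorizer().fit_transform(combined)

# Approach 2: fit a separate vectorizer per column, then stack the resulting arrays
X_titles = TfidfVectorizer().fit_transform(titles)
X_bodies = TfidfVectorizer().fit_transform(bodies)
X_separate = hstack([X_titles, X_bodies])
```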

The resulting array after TF-IDF is very sparse (most columns for a given instance are zero), so we applied dimensionality reduction (singular value decomposition) to reduce the number of attributes/columns.
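A small sketch of that reduction step, assuming scikit-learn’s TruncatedSVD; a random sparse matrix stands in for the TF-IDF output, and the component count is arbitrary.

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

X_tfidf = sparse_random(200, 5000, density=0.01, random_state=0)  # stand-in for sparse TF-IDF output

svd = TruncatedSVD(n_components=100, random_state=0)
X_reduced = svd.fit_transform(X_tfidf)  # dense array, one row per page, 100 columns
print(X_reduced.shape)                  # (200, 100)
```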

The final step was to concatenate all resulting columns from all feature categories into an array. This we did after applying all the steps above (cleaning the features, turning the categorical features into labels and performing one-hot encoding on the labels, applying TF-IDF to the text features and scaling all the features to center them around the mean).

Models and ensembles

Having obtained and concatenated all the features, we ran a number of different algorithms on them. The algorithms that showed the most promise are gradient boosting classifier, ridge classifier and a two-layer neural network.

Finally, we assembled the model results using simple averages, and thus we saw some additional gains as different models tend to have different biases.
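A simplified sketch of such an ensemble, assuming scikit-learn implementations of the three model types and synthetic data; the real features and tuning are not reproduced here. RidgeClassifier has no predict_proba, so its decision scores are squashed through a sigmoid before averaging, which is one reasonable (assumed) way to combine it with the others.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

gb = GradientBoostingClassifier().fit(X, y)
rc = RidgeClassifier().fit(X, y)
nn = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0).fit(X, y)

ridge_prob = 1 / (1 + np.exp(-rc.decision_function(X)))   # map decision scores into (0, 1)
ensemble_prob = (gb.predict_proba(X)[:, 1] + ridge_prob + nn.predict_proba(X)[:, 1]) / 3
```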

Optimizing the threshold

The last step was to decide on a threshold to turn probability estimations into binary predictions (“yes, we predict this site will be top 10 in Google” or “no, we predict this site will not be top 10 in Google”). For that, we optimized the threshold on a cross-validation set and then used the obtained threshold on a test set.
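Below is a sketch of one way to pick such a threshold on a validation set. The article does not say which metric was optimized, so balanced accuracy is an assumption here, and the probabilities and labels are placeholders.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

val_probs  = np.array([0.91, 0.20, 0.75, 0.55, 0.88, 0.10])
val_labels = np.array([1, 0, 1, 0, 1, 0])

candidates = np.linspace(0.1, 0.9, 81)
best_threshold = max(
    candidates,
    key=lambda t: balanced_accuracy_score(val_labels, val_probs >= t),
)
print(best_threshold)  # this cutoff would then be reused on the test set
```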

Results

The metric we thought would be the most representative to measure the efficacy of the model is a confusion matrix. A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.

I am sure you have heard the saying that “a broken clock is right twice a day.” With 100 results for every keyword, a random guess would correctly predict “not in top 10” 90 percent of the time. The confusion matrix shows the accuracy of both positive and negative answers. We obtained roughly a 41-percent true positive and 41-percent true negative rate in our best model.
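For readers who want to compute the same breakdown, here is a minimal scikit-learn sketch; the labels are invented and do not reproduce the study’s figures.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = actually in the top 10
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # 1 = predicted to be in the top 10

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"true negatives={tn}, false positives={fp}, false negatives={fn}, true positives={tp}")
```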

Image showing confusion matrix of our best model

Another way of visualizing the effectiveness of the model is by using an ROC curve. An ROC curve is “a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.” The non-linear models used in the ensemble were XGBoost and a neural network. The linear model was logistic regression. The ensemble plot represents a combination of the linear and non-linear models.
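A minimal sketch of producing such a curve from predicted probabilities with scikit-learn and matplotlib; the scores below are placeholders, not the study’s output.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.3, 0.6, 0.8, 0.2, 0.55, 0.7, 0.1]   # predicted probabilities

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--")   # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```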

Image of ROC curve generated by our model

XGBoost is short for “Extreme Gradient Boosting,” with gradient boosting being “a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.”

The chart below shows the relative contribution of the feature categories to the accuracy of the final prediction of this model. Unlike neural networks, XGBoost, along with certain other models, allows you to easily peek into the model to tell the relative predictive weight that particular features hold.
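A sketch of reading those relative weights out of an XGBoost model, assuming the xgboost package and synthetic data; the feature names are hypothetical.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

feature_names = [f"feature_{i}" for i in range(10)]
for name, score in zip(feature_names, model.feature_importances_):
    print(name, round(float(score), 3))
```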

Graph of predictive importance by feature category

We were quite impressed that we were able to build a model that showed predictive power from the features that we had given it. We were very nervous that our limitation of features would lead to the utter fruitlessness of this project. Ideally, we would have a way to crawl an entire site to gain overall relevance. Perhaps we could gather data on the number of Google reviews a business had. We also understood that Google has much better data on links and citations than we could ever hope to gather.

What we learned

Machine learning is a very powerful tool that can be used even if you do not understand fully the complexity of how it works. I have read many articles about RankBrain and the inability of engineers to understand how it works. This is part of the magic and beauty of machine learning. Similar to the process of evolution, in which life gains different features and some live and some die, the process of machine learning finds the way to the answer instead of being given it.

While we were happy with the results of our first models, it is important to understand that this was trained on a relatively small sample compared to the immense size of the internet. One of the key goals in building any kind of machine learning tool is the idea of generalization and operating effectively on data that has never been seen before. We are currently testing our model on new queries and will continue to refine.

The largest takeaway for me in this project was just starting to get a grasp on the immense value that machine learning has for our industry. A few of the ways I see it impacting SEO are:

  • Text generation, summarization and categorization. Think about smart excerpts for content and websites that potentially self-organize based on classification.
  • Never having to write another ALT parameter (See below).
  • New ways of looking at user behavior and classification/scoring of visitors.
  • Integration of new ways of navigating websites using speech and smart Q&A style content/product/recommendation systems.
  • Entirely new ways of mining analytics and crawled data to give insights into visitors, sessions, trends and potentially visibility.
  • Much smarter tools in distribution of ad channels to relevant users.

This project was more about learning for us rather than accomplishing a holy grail (of sorts). Much like the advice I give to new developers (“the best learning happens while doing”), it is important to get your hands dirty and start training. You will learn to gather, clean and organize data, and you’ll familiarize yourself with the ins and outs of various machine learning tools.

Much of this is familiar to more technical SEOs, but the industry also is developing tools to help those who are not as technically inclined. I have compiled a few resources below that are of interest in understanding this space.

Recent technologies of interest

It is important to understand that the gross majority of machine learning is not about building a human-level AI, but rather about using data to solve real problems. Below are a few examples of recent ways this is happening.

NeuralTalk2

NeuralTalk2 is a Torch model by Andrej Karpathy for generating natural language descriptions of given images. Imagine never having to write another ALT parameter again and having a machine do it for you. Facebook is already incorporating this technology.

Microsoft Bots and Alexa

Researchers are mastering speech processing and are starting to be able to understand the meaning behind words (given their context). This has deep implications for traditional websites in how information is accessed. Instead of navigation and search, the website could have a conversation with your visitors. In the instance of Alexa, there is no website at all, just the conversation.

Natural language processing

There is a tremendous amount of work going on right now in the realm of translation and content semantics. It goes far beyond traditional Markov chains and n-gram representations of text. Machines are showing the initial hints of abilities to summarize and generate text across domains. “The Unreasonable Effectiveness of Recurrent Neural Networks” is a great post from last year that gives a glimpse of what is possible here.

Home Depot search relevance competition

Home Depot recently sponsored an open competition on Kaggle to predict the relevance of their search results to the visitor’s query. You can see some of the process behind the winning entries on this thread.

How to get started with machine learning

Because we, as search marketers, live in a world of data, it is important for us to understand new technologies that allow us to make better decisions in our work. There are many places where machine learning can help our understanding, from better knowing the intent of our users to which site behaviors drive which actions.

For those of you who are interested in machine learning but are overwhelmed with the complexity, I would recommend Data Science Dojo. There are simple tutorials using Microsoft’s Machine Learning Studio that are very approachable to newbies. This also means that you do not have to learn to code prior to building your first models.

If you are interested in more powerful customized models and are not afraid of a bit of code, I would probably start with listening to this lecture by Justin Johnson at Stanford, as it goes through the four most common libraries. A good understanding of Python (and perhaps R) is necessary to do any work of merit. Christopher Olah has a pretty great blog that covers a lot of interesting topics involving data science.

Finally, Github is your friend. I find myself looking through recently added repos to see the incredibly interesting projects people are working on. In many cases, data is readily available, and there are pretrained models that perform certain tasks very well. Looking around and becoming familiar with the possibilities will give you some perspective into this amazing field.

http://searchengineland.com/experiment-trying-predict-google-rankings-253621

Categorized in Search Engine

Nearly a year ago, Google expanded their search engine to begin instantly answering questions, such as the death of a celebrity or a math problem. The result was a reaction to the true nature of search; nobody was writing something into Google without actively seeking an answer.

That answer may be finding a particular piece or a few different pieces of content, or simply a particular website, but you are asking the question; "where is this thing I want?" The instant responsiveness of Google and its ability to query an entire database of the Internet has made other sites take notice.

That's why sites like Periscope, Medium, Vevo and Hacker News have adopted Algolia's hosted cloud search platform, an API that brings Google's instant to near-instant search capabilities to their sites. The result is that their content is immediately searchable and relevant, so that if a user makes a complex query and/or a typo, they will still receive results that make sense for what they're looking for. "By leveraging the trove of internal data that websites or mobile apps have, we are helping them to deliver an experience that is even deeper and more personalized than what Google does for web searches," said Nicolas Dessaigne, CEO of Algolia. "Our goal is to make search seamless, nearly invisible.

"Today we can deliver relevant results at the first keystroke. In the future, all results will be personalized and delivered before the question is even completely formulated." This is an important approach for businesses large and small to take, and closes in on AI; Algolia's technology works to not just index and search your data, but also make sure that it produces the right answer to a query.

This is an interesting comparison to the ever-growing world of the Internet of Things, led by Amazon's Echo. Users, despite their accents, stuttering or other things that make a question "imperfect," are still able to get an answer. Algolia, their competitor Elastic and Google all recognize this, with Algolia in particular even advertising directly on their website that you should try a test search with a typo, to show how the platform can answer the question regardless. Google will even go as far as to suggest what you may be trying to type, if not bringing you the exact answer despite your mistake.

As Quartz's Leo Mirani said, there are over 10 trillion web pages to index, including but not limited to the masses of social media services providing terabytes if not petabytes of information into said stream. This is the same problem that many startups and companies will begin to find, both from the angle of big data overload and the expectations of the user.

The instantaneous nature of search may make users unlikely to even browse the same way, as we move away from the original web's exploratory nature to people visiting each website with a purpose. In the same Quartz article, Mirani speaks to author Stefan Weitz, who wrote the book Search: How The Data Explosion Makes Us Faster, where Weitz argues that search must mature to mirror human nature, and be ready to answer a query at speed.

 "We must think of search as the omniscient watcher in the sky, aware of everything this happening on the ground below," said Weitz. "For this to happen, search itself needs to be deconstructed into its component tasks: indexing and understanding the world and everything in it; reading senses, so search systems can see and hear (and eventually smell and touch!) and interact with us in more natural ways; and communicating with us humans in contextually appropriate ways, whether that's in text, in speech, or simply by talking to other machines on our behalf to make things happen in the real world."

To Algolia's Dessaigne, this approach is a natural course. "Personalization of results is also going to be an important trend for websites and apps, particularly among big retailers and media websites. Along this progression, voice interfaces are going to gain traction. We are still far from truly conversational interfaces, but we'll eventually get there."

While we all dream of a day when we can have an answer as we speak, or even think of the question, we are far away from it. Nevertheless, startups are clearly ready to make the jump for us. We're in a world that's far from the days when having a search bar was a quirky feature; users have a question and to succeed in business, you'll need to have an answer.

Source:  http://www.inc.com/amy-cuddy/3-body-language-books-that-all-leaders-should-read-this-summer.html

Categorized in Search Engine

Over the past year, Google engineers have experimented and developed a set of building blocks for the Internet of Things - an ecosystem of connected devices, services and “things” that promises direct and efficient support of one’s daily life. While there has been significant progress in this field, there remain significant challenges in terms of (1) interoperability and a standardized modular systems architecture, (2) privacy, security and user safety, as well as (3) how users interact with, manage and control an ensemble of devices in this connected environment.

It is in this context that we are happy to invite university researchers to participate in the Internet of Things (IoT) Technology Research Award Pilot. This pilot provides selected researchers in-kind gifts of Google IoT related technologies (listed below), with the goal of fostering collaboration with the academic community on small-scale (~4-8 week) experiments, discovering what they can do with our software and devices.

We invite you to submit proposals in which Google IoT technologies are used to (1) explore interesting use cases and innovative user interfaces, (2) address technical challenges as well as interoperability between devices and applications, or (3) experiment with new approaches to privacy, safety and security. Proposed projects should make use of one or a combination of these Google technologies:

Google beacon platform - consisting of the open beacon format Eddystone and various client and cloud APIs, this platform allows developers to mark up the world to make your apps and devices work smarter by providing timely, contextual information.

Physical Web - based on the Eddystone URL beacon format, the Physical Web is an approach designed to allow any smart device to interact with real world objects - a vending machine, a poster, a toy, a bus stop, a rental car - and not have to download an app first.

Nearby Messages API - a publish-subscribe API that lets you pass small binary payloads between internet-connected Android and iOS devices as well as with beacons registered with Google's proximity beacon service.

Brillo & Weave - Brillo is an Android-based embedded OS that brings the simplicity and speed of mobile software development to IoT hardware to make it cost-effective to build a secure smart device, and to keep it updated over time. Weave is an open communications and interoperability platform for IoT devices that allows for easy connections to networks, smartphones (both Android and iOS), mobile apps, cloud services, and other smart devices.

OnHub router - a communication hub for the Internet of Things supporting Bluetooth® Smart Ready, 802.15.4 and 802.11a/b/g/n/ac. It also allows you to quickly create a guest network and control the devices you want to share (see On.Here).

Google Cloud Platform IoT Solutions - tools to scale connections, gather and make sense of data, and provide the reliable customer experiences that IoT hardware devices require.
Chrome Boxes & Kiosk Apps - provides custom full screen apps for a purpose-built Chrome device, such as a guest registration desk, a library catalog station, or a point-of-sale system in a store.

Vanadium - an open-source framework designed to make it easier to develop secure, multi-device user experiences, with or without an Internet connection.

Check out the Ubiquity Dev Summit playlist for more information on these platforms and their best practices.

Please submit your proposal here by February 29th in order to be considered for an award. Proposals will be reviewed by researchers and product teams within Google. In addition to looking for impact and interesting ideas, priority will be given to research that can make immediate use of the available technologies. Selected proposals will be notified by the end of March 2016. If selected, the award will be subject to Google’s terms, and your use of Google technologies will be subject to the applicable Google terms of service.

To connect our physical world to the Internet is a broad and long-term challenge, one we hope to address by working with researchers across many disciplines and work practices. We are looking forward to the collaborative opportunity provided by this pilot, and learning about innovative applications you create for these new technologies.

Source:  http://googleresearch.blogspot.com/2016/02/announcing-google-internet-of-things.html

Categorized in Internet of Things

The company behind two of the most highly rated smartphones (big and small), the leading smart thermostat, a super high-end laptop, a 2-in-1 tablet, a Wi-Fi camera, streaming audio and video players, a sexy router and a smart smoke detector ... is Google?

 

It's strange but true. Google (GOOGL, Tech30) is synonymous with search and Internet apps, but it has quietly built itself a very respectable gadget business. Google has come a long way since its first Nexus smartphone launched in 2010.

 

The Nest is a top-seller. The Chromecast is a big hit. The Nexus 6P is one of the best-reviewed smartphones ever. And Chromebooks are quickly becoming the standard education laptops for K-12 students.


Google appears unsatisfied, however.


Its portfolio of gizmos is expected to expand at the Google I/O developers conference next week. Google is rumored to be unveiling two brand new gadgets at I/O: A virtual reality headset and an Amazon Echo competitor.


Google's new virtual reality gizmo is expected to be a standalone gadget, running Android (no smartphone required).


VR isn't new to Google, though its current offering is kind of a joke. "Cardboard" is its $15 VR viewer that is literally made out of cardboard, Velcro and plastic lenses. It doesn't do anything on its own: You have to stick your smartphone inside a cardboard flap.


Google's new Echo competitor is expected to be a tall Internet-connected speaker that can play music, read your emails out loud, tell you the weather and do all the tasks that virtual assistants do. Like Amazon's Echo, it will respond to voice commands ("OK Google," not "Alexa"), but it will have Google's giant search engine to pull information from.

 

Why the big gadget push?


Google thrives on data. Its mission, after all, is to "organize the world's information and make it universally accessible and useful."


There's a tremendous amount of information that can be learned from Google's gadgets: how you use energy, connect to the Internet, and what media you stream. By better understanding its customers' behaviors, it can offer ads and services that are tailored to them.


Plus, VR and the Internet of Things are the buzzy, potentially groundbreaking ways we might interact with the Internet in the future. Google wants to ensure it isn't left out. If Amazon, Apple, Microsoft, Facebook or any other competitor beats Google to the punch, Google could lose out on a massive amount of important information about their customers.


And if Google doesn't control the gadgets that you use, there's no guarantee you'll use its services. For example, Android promotes Gmail, YouTube, Google search and Google Maps at launch (something the company is currently being investigated for by the European Union).


By making gizmos and devices that its customers want to use, Google can continue to lock people into its services and searches, collecting their data and serving up more relevant ads.

 

Source:  http://money.cnn.com/2016/05/12/technology/google-gadget/index.html

Categorized in Science & Tech

Has your business listing in Google been suspended? Not sure what happened? Columnist Joy Hawkins discusses the likely causes and how to address them.

I see threads over at the Google My Business forum all the time from panicked business owners or SEOs who have logged into Google My Business to see a big red “Suspended” banner at the top of the page. The Google My Business guidelines have a very long list of things you shouldn’t do, but some offenses are much more serious than others.

Before I get into which rule violations lead to suspensions, it’s important to know the facts around suspensions.

Google won’t tell you why you got suspended

A Google employee will rarely tell you why your account got suspended.

Business owners often want Google to spell out what rule caused their suspension, but Google isn’t about to help rule-breakers get better at doing it and avoid consequences.

There are two different types of suspensions

The first type of suspension is what I refer to as a soft suspension. This is when you log in to Google My Business and see the “suspended” label and no longer have the ability to manage your listing. However, your listing still shows up on Google and Google Maps/Map Maker.

In this case, the listing has really just become unverified. Since you broke Google’s guidelines in some way, they have removed your ability to manage the listing, but the listing’s ranking is rarely impacted. I once worked with a locksmith who ranked first in a major metro area; even after his account got suspended, his ranking didn’t decline.

To fix this type of suspension, all you need to do is create a new Google account, re-verify the listing and stop breaking the rules.

The second type of suspension is what I call a hard suspension. This is very serious and means your entire listing has been removed from Google, including all the reviews and photos. When you pull up the record in Google Map Maker, it will say “removed.”

In this case, your only solution is to get Google to reinstate it; however, the chances of that are slim because this generally only happens when Google has decided the business listing is not eligible to be on Google Maps.

Following are the top nine reasons that Google suspends local listings:

1. Your website field contains a forwarding URL

I dealt with a case last year where I couldn’t figure out why the listing got suspended. Google was able to publicly confirm that it was because the website URL the business was using in Google My Business was actually a vanity URL that forwarded to a different domain.

As per the guidelines, “Do not provide phone numbers or URLs that redirect or ‘refer’ users to landing pages.” This often results in a soft suspension.

2. You are adding extra keywords to your business name field

As per the guidelines:

Adding unnecessary information to your name (e.g., “Google Inc. – Mountain View Corporate Headquarters” instead of “Google”) by including marketing taglines, store codes, special characters, hours or closed/open status, phone numbers, website URLs, service/product information, location/address or directions, or containment information (e.g., “Chase ATM in Duane Reade”) is not permitted.

This often results in a soft suspension, since the business is still eligible to be on Google Maps but just has a different real name.

3. You are a service-area business that didn’t hide your address

According to Google’s guidelines on service-area businesses, you should only show your address if customers show up at your business address. Whenever I’ve seen this, it was a hard suspension, since the listing was not eligible to show up on Google Maps based on the Map Maker guidelines.

It’s extremely vital for a business owner of a service-area business to verify their listing, since Google My Business allows them, but Map Maker does not. This means any non-verified listing that appears on Google Maps for a service-area business can get removed, and the reviews and photos will disappear along with it.

4. You have multiple verified listings for the same business

According to the guidelines: “Do not create more than one page for each location of your business, either in a single account or multiple accounts.”

Google will often suspend both listings (the real one and the duplicate you created) but will un-verify the legit one (soft suspension) and remove the duplicate (hard suspension).

5. Your business type is sensitive or not allowed on Google Plus

This one is new to me, but recently Google suspended (soft suspension) a gun shop and claimed the business type is not allowed on Google Plus. Since every verified listing is automatically on G+, the only option for them is to have an unverified listing on Google Maps.

According to the Google Plus guidelines, regulated goods are allowed if they set a geographic and age restriction, so the jury is still out on whether Google will reinstate it or not.

6. You created a listing at a virtual office or mailbox

Google states:

If your business rents a temporary, “virtual” office at a different address from your primary business, do not create a page for that location unless it is staffed during your normal business hours.

I often see businesses creating multiple listings at virtual offices because they want to rank in multiple towns and not just the city their office is actually located in. If Google catches them or someone reports it, the listings will get removed (hard suspension).

7. You created a listing for an online business without a physical storefront

The first rule for eligible businesses is that they must make in-person contact with customers. Since online businesses don’t do this, Google specifies that they are supposed to create a G+ brand page instead of a local page, which means they won’t rank in the 3-pack or on Google Maps.

I was once helping out, on the Google My Business forum, a basket store in Ottawa that creates custom gift baskets you can order online. When I escalated the listing to Google to fix something, they unexpectedly removed it completely (hard suspension) because she ran an online store.

8. You run a service or class that operates in a building that you don’t own

For example, my church has an AA group that meets there weekly. They would not be eligible for a listing on Google Maps. According to the guidelines, “Ineligible businesses include: an ongoing service, class, or meeting at a location that you don’t own or have the authority to represent.”

9. You didn’t do anything wrong, but the industry you are in is cluttered with spam, so the spam filters are tighter

I see this most often with locksmiths. I have run into several legitimate locksmiths who have had their listings suspended (hard suspensions, usually) because the spam filter accidentally took them down.

In this case, I would always suggest posting on the Google My Business forum so a Top Contributor can escalate the case to Google.

Conclusion

Has your listing been suspended for reasons I didn’t mention? Feel free to reach out to me or post on the forum and share your experience.

Source: http://searchengineland.com/top-9-reasons-google-suspends-local-listings-247394

 

Categorized in Search Engine

Ads now appear in Local Finder results, plus ads display differently in Google Maps.

Google has made changes this week to local search results and Google Maps that will impact retailers and service providers with physical locations.

 

Ads in Local Finder results

Local SEO specialist Brian Barwig was among those who have noticed the ads appearing in the Local Finder results — reached after clicking “More places” from a local three-pack in the main Google search results.


The addition of the ads (more than one ad can display) in the Local Finder results means retailers and service providers that aren’t featured in the local three-pack have a new way of getting to the top of the results if users click through to see more listings. (It also means another haven for organic listings has been infiltrated with advertising.)

The ads in the Local Finder rely on AdWords location extensions just like Google Maps, which started featuring ads that used location extensions when Google updated Maps in 2013. Unlike the results in Maps, however, advertisers featured in Local Finder results do not get a pin on the map results.

A Google spokesperson didn’t offer further details other than to say, “We’re always testing out new formats for local businesses, but don’t have any additional details to share for now.”

Google Maps is no longer considered a Search Partner

Google has also announced changes to how ads display in Google Maps. Soon, Google will only show ads that include location extensions in Maps; regular text ads will not be featured. The other big change is that Google Maps is no longer considered part of Search Partners. Google has alerted advertisers, and Maps has been removed from the list of Google sites included in Search Partners in the AdWords help pages.

This change in Maps’ status means:

1. Advertisers that use location extensions but had opted out of Search Partners will now be able to have their ads shown in Maps and may see an increase in impressions and clicks as their ads start showing there.

2. Advertisers that don’t use location extensions but were opted into Search Partners could see a drop in impressions and clicks with ads no longer showing in Maps.

The move to include Maps as part of Google search inventory will mean more advertisers will be included in Maps ad auctions. The emphasis on location extensions is in line with Google’s increasing reliance on structured data and feeds, as retailers participating in Google Shopping can attest.

 Source: http://searchengineland.com/google-ads-local-finder-results-maps-not-search-partner-247779

Categorized in Search Engine
