
OSINT, or open-source intelligence, refers to information about businesses or people that can be collected from online sources. Gathering it efficiently requires the right tools, so here are the 10 best OSINT tools for 2020.

In a world full of information overload, it is natural that we feel the need to vet out the useful information. To do so, organizations globally employ a range of tools, both paid and unpaid. The latter category falls into the domain of open-source intelligence (OSINT) and can be incredibly helpful, especially when you’re looking to save hefty fees on market intelligence reports. Keeping this in mind, here are the 10 best OSINT tools for 2020:

1. OSINT Framework

Featuring over 30 categories of potential data, including the dark web, social networks, and malicious file analysis, the OSINT Framework tool shows you the various ways you can access each type of data.

For example, let’s say you wanted to know where you could get more information about the dark web. You would simply click the relevant field on the tree, and it would display a variety of sources you could use to further your research.


This saves you a ton of time searching for the right tools and is a real life-saver!

2. Shodan

Known as the search engine of Internet of Things (IoT) devices, Shodan lets you find information on just about any device connected to the internet, whether it is a refrigerator, a database, a webcam, an industrial control system, or a simple smart television.

The advantage of Shodan is that hardly any other service offers such depth, which not only allows you to collect valuable intelligence but also helps you gain a competitive advantage if you’re a business looking to learn more about your competition.
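If you prefer to query Shodan programmatically, it also has an official Python library. The snippet below is a minimal sketch, assuming you have installed the shodan package and obtained an API key; the key string and the "webcam" query are placeholders.

# Minimal sketch: querying Shodan from Python.
# Assumes `pip install shodan` and a valid API key ("YOUR_API_KEY" is a placeholder).
import shodan

api = shodan.Shodan("YOUR_API_KEY")

try:
    results = api.search("webcam")          # example query
    print("Results found:", results["total"])
    for match in results["matches"][:5]:
        # Each match carries the IP, port and organisation Shodan recorded.
        print(match["ip_str"], match["port"], match.get("org", "n/a"))
except shodan.APIError as err:
    print("Shodan error:", err)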


To add to its credibility, Shodan boasts that it is used by 81% of Fortune 100 companies and over 1,000 universities.

3. That’s Them

How many times have you wanted to know more about an individual before moving forward with them on a business opportunity or anything else? That’s Them helps you do just that by letting you run background checks using an individual’s full name plus city and state of residence, a phone number, or a full address. In return, it gives you access to their police records, lawsuits, asset ownership details, addresses, and phone numbers.


These checks are currently available only within the United States, though. Furthermore, you need to subscribe to a plan in order to get more than just basic information about someone.


4. N2YO.com

Allowing you to track satellites from afar, N2YO is a great tool for space enthusiasts. It features a menu of frequently searched satellites in addition to a database where you can run custom queries on parameters such as the Space Command ID, launch date, satellite name, and international designator. You can also set up custom alerts for space station events and watch a live stream of the International Space Station (ISS)!


5. Google & Google Images

While Google’s main search engine needs no introduction, with its vast array of results including videos and curated news, the lesser-known Google Images can also come in very handy.

Apart from the obvious function of letting you search for images, it allows you to reverse-search any image to find its real origin, saving you a lot of time. For example, if I had an image that I needed to trace to its original uploader in order to obtain copyright permission, I would simply upload it to Google Images, which would search the web to find the source.


Another incredibly helpful feature is the ability to filter images by resolution, size, and copyright license, helping you find highly relevant images. Furthermore, because it scours images from across the internet, the results are far more numerous than those of free stock sites like Pixabay.

6. Yandex Images

The Russian counterweight to America’s Google, Yandex is extremely popular in Russia and lets users search across the internet for thousands of images. This is in addition to its reverse-image functionality, which is remarkably similar to Google’s. A useful option is the ability to sort images by category, which can make your searches more specific and accurate.


Tip: In my personal experience, Yandex image search results are far more accurate and in-depth than Google Images’.

7. Censys

In a nutshell, Censys is built to help you secure your digital assets. It works by letting users enter details of their websites, IP addresses, and other digital asset identifiers, which it then analyzes for vulnerabilities. Once done, it presents actionable insights to its users.

But this is not all. It is one thing to secure your company’s networks, but another to ensure that work-from-home employees are not vulnerable through their own setups. Keeping this in mind, you can also “scan your employees’ home networks for exposures and vulnerabilities.”


8. Knowem?

Every brand owner knows the disappointment of finding the social media handle they wanted for their business already taken. Knowem tackles this by letting you check a username’s availability on over 500 social networks, including all the famous ones, with one simple search.

Additionally, it can also check the availability of domain names, but this isn’t unique since pretty much every domain registrar does the same.

On the other hand, if you want a bunch of profiles claimed automatically with a username of your choice, four different paid plans are also offered.


9. The Internet Archive

A bit nostalgic about the 1990s? We have a time machine here, allowing you to access past versions of pretty much any website by date. This means that if you wanted to see how a specific website looked on, say, 24 June 2003, you could do so using the Internet Archive tool.
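If you would rather look up snapshots programmatically, the Internet Archive also exposes a simple Wayback Machine availability API. The sketch below asks for the snapshot closest to a given date; the site and the date are just examples.

# Minimal sketch: find the archived snapshot closest to a date via the
# Wayback Machine availability API (no API key required).
import json
import urllib.request

target = "bbc.co.uk"      # example site
timestamp = "20030624"    # YYYYMMDD, i.e. 24 June 2003

url = f"https://archive.org/wayback/available?url={target}&timestamp={timestamp}"
with urllib.request.urlopen(url) as response:
    data = json.load(response)

closest = data.get("archived_snapshots", {}).get("closest")
if closest:
    print("Snapshot:", closest["url"], "captured at", closest["timestamp"])
else:
    print("No snapshot found for", target)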


One potential use of the tool is analyzing a competitor’s web presence over time and using it as market intelligence.

10. HaveIBeenPwned

With database breaches happening every day, it’s only a matter of time before your data also gets exposed. Therefore, keeping a check is vital so you can change your credentials and other details in time. HIBP lets you do exactly that by entering either your password or your email address.
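For password checks, HIBP also offers the free Pwned Passwords range API, which uses a k-anonymity model: you send only the first five characters of the password’s SHA-1 hash and compare the returned suffixes locally, so the password itself never leaves your machine. The sketch below illustrates this; the password is an example and the User-Agent string is an arbitrary label.

# Minimal sketch: check a password against HIBP's Pwned Passwords range API.
# Only the first 5 hex characters of the SHA-1 hash are sent (k-anonymity).
import hashlib
import urllib.request

password = "password123"  # example only; never hard-code real credentials
sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
prefix, suffix = sha1[:5], sha1[5:]

req = urllib.request.Request(
    f"https://api.pwnedpasswords.com/range/{prefix}",
    headers={"User-Agent": "osint-article-example"},
)
with urllib.request.urlopen(req) as resp:
    body = resp.read().decode()

# Each response line is "HASH_SUFFIX:COUNT"; look for our suffix locally.
count = 0
for line in body.splitlines():
    hash_suffix, _, seen = line.partition(":")
    if hash_suffix == suffix:
        count = int(seen)
        break

print(f"Seen in breaches {count} times" if count else "Not found in known breaches")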


To conclude, although this list is by no means exhaustive, these 10 OSINT tools will save you not only time but also a lot of money. It is important to remember that professionals from various walks of life use these tools every day, so it makes perfect sense to add them to your toolkit as well.

[Source: This article was published in hackread.com By Sudais Asif - Uploaded by the Association Member: Clara Johnson]

Categorized in Investigative Research

The Financial Times says Apple has likely been working for some time on its own search engine in a bid to take market share from Google, which in many countries holds over 90% of the market. An in-house engine would give Apple an alternative as its arrangement with the Mountain View company, which makes Google the default search option on Apple devices, comes under antitrust scrutiny, and it would be backed by the idea of a search option that respects user privacy.

The latest version of the iPhone operating system, iOS 14, displays its own search results, linking directly to web pages when the user types in queries. Add to the equation the hiring of Google executive John Giannandrea two years ago and the recent heavy activity of the company’s search crawler, Applebot, and the rumors, even around a company as secretive as Apple, could well be founded.

Apple has long been committed to privacy as a fundamental human right and as one of its products’ differentiating values. On numerous occasions, CEO Tim Cook has declared his commitment to privacy, attacking what he calls “the industrial data complex” without specifically mentioning companies such as Google, Facebook and other data brokers, placing provocative advertisements at industry conventions such as the Las Vegas CES, and going so far as to challenge the FBI itself by refusing to provide a back door to obtain information from its devices when investigating terrorism and other major crimes.

What chance would an Apple search engine have in an environment monopolized by Google? Creating a search engine is an extremely complex task: in addition to generating a huge database with an updated copy of all the pages to be indexed, something Google has been constantly innovating on for more than 20 years, it is necessary to create an algorithm that develops the concept of relevance. Google has been moving away for some time from its original algorithms, which above all valued social components such as inbound links, toward criteria based on the quality of information and the use of machine learning to try to understand what users are really looking for, but it has also undoubtedly travelled further and accumulated more data than anyone else in the industry.

On the other hand, and in spite of Google’s efforts to offer greater transparency, many people are suspicious of the amount of information the company has about them as a result not only of the use of its search tools, but of others, such as its email, documents, maps, etc.


In previous attempts to compete with Google products, Apple has experienced difficult moments, for example, the disastrous launch of Apple Maps, which led to the departure from the company of one of its vice presidents, Scott Forstall. After that episode, the company’s mapping product was significantly improved with successive redesigns, and it has positioned itself as the third most used mapping application after Google Maps and Waze. However, we should remember that we are talking about a product conditioned by the use of Apple devices, which in many countries is relatively limited, something that would not necessarily be the case with a search engine.

A search engine that respects user privacy could be attractive to a significant part of the market. However, we are talking about deeply rooted habits that depend fundamentally on the quality of the results obtained. It could be argued that Google is capable of providing users with better results precisely because of the information it has about us, which wouldn’t apply to Apple. And although the rise of Google in the late 1990s clearly demonstrated the limited value of loyalty in this area, there is no doubt that it would be difficult to beat the incumbent precisely in the area it considers the most strategic and definitive.

That said, were Apple to launch its own search engine and take on a giant like Google, there would be huge interest in watching the ensuing fight.

[Source: This article was published in forbes.com By Enrique Dans - Uploaded by the Association Member: Anthony Frank]

Categorized in Search Engine

Welcome to TNW Basics, a collection of tips, guides, and advice on how to easily get the most out of your gadgets, apps, and other stuff.

Stock photos have become a mainstay of content creation, but finding the right image can be a hassle, and sometimes a legal liability.

Well, you’ll be delighted to know Google has updated Image Search to make it easier to discover free-to-use images — and how to license the ones you can’t use for free.

Here’s how to take advantage of the new changes:

  • Search for the image you want as you normally would, then head to the Images section.
  • Click on “Tools” to expand the filter menu.
  • Under “Usage Rights,” you’ll find the option to sort images by their license — Creative Commons or commercial use.
  • That’s it.


One nifty addition is that Google now surfaces information on how you can obtain the rights for a licensed image directly in the description.

If you don’t tick off any of the “Usage Rights” options, Google will simply show all images that fit your search criteria. Images that lack licensing data will be marked with a warning, noting “images may be subject to copyright.”

It’s worth noting Google only highlights licensing details for images if a creator or a publisher has already provided this information, so your best bet to avoid unknowingly using a copyrighted pic is to filter out photos lacking this information.

If you can’t find the right image on Google, you can always try trawling through copyright-free stock photo sites. We’ve put together a shortlist of some of our favorite options here. Those won’t match the sheer volume and diversity of choice Google offers, but the quality tends to be consistently higher.


[Source: This article was published in thenextweb.com By MIX - Uploaded by the Association Member: Martin Grossner]

Categorized in Search Engine

Google has made some substantial new changes to its “How Google Search Works” documentation for website owners. And as always when Google makes changes to important documents with an impact on SEO, such as How Search Works and the Quality Rater Guidelines, there are some key insights SEOs can glean from the new changes Google has made.

Of particular note: Google details how it views a “document” as potentially comprising more than one webpage and what it considers primary and secondary crawls, and it updates its reference to “more than 200 ranking factors,” which has been present in this document since 2013.

But here are the changes and what they mean for SEOs.


Crawling

Google has greatly expanded this section.

They made a slight change to wording, with “some pages are known because Google has already crawled them before” changed to “some pages are known because Google has already visited them before.”   This is a fairly minor change, primarily because Google decided to include an expanded section detailing what crawling actually is.

Google removed:

This process of discovery is called crawling.

The removal of the crawling definition was simply because it was redundant.  In Google’s expanded crawling section, they included a much more detailed definition and description of crawling instead.

The added definition:

Once Google discovers a page URL, it visits, or crawls, the page to find out what’s on it. Google renders the page and analyzes both the text and non-text content and overall visual layout to decide where it should appear in Search results. The better that Google can understand your site, the better we can match it to people who are looking for your content.

There is still great debate over how much page layout is taken into account. The page layout algorithm was released many years ago to penalize content pushed well below the fold in order to increase the odds that a visitor would click on an advertisement appearing above the fold instead. But with more traffic moving to mobile, and the addition of mobile-first indexing, above- and below-the-fold page layout seemingly became less important.

When it comes to page layout and mobile first, Google says:

Don’t let ads harm your mobile page ranking. Follow the Better Ads Standard when displaying ads on mobile devices. For example, ads at the top of the page can take up too much room on a mobile device, which is a bad user experience.

But in How Google Search Works, Google is specifically calling attention to the “overall visual layout” with “where it should appear in Search results.”

It also brings attention to “non-text” content. While the most obvious reading refers to image content, the reference is quite open-ended. Could this refer to OCR as well, which we know Google has been dabbling in?

Improving Your Crawling

Google has also significantly expanded the “to improve your site crawling” section.

Google has added this point:

Verify that Google can reach the pages on your site, and that they look correct. Google accesses the web as an anonymous user (a user with no passwords or information). Google should also be able to see all the images and other elements of the page to be able to understand it correctly. You can do a quick check by typing your page URL in the Mobile-Friendly test tool.

This is a good point – so many new site owners end up accidentally blocking Googlebot from crawling, or don’t realize their site is set to be viewable only by logged-in users. This makes it clear that site owners should try viewing their site without being logged into it, to see if there are any unexpected accessibility or other issues that aren’t noticed when logged in as an admin or high-level user.

Also, recommending that site owners check their site via the Mobile-Friendly testing tool is good, since even seasoned SEOs use the tool to quickly see if there are Googlebot-specific issues with how Google is able to see, render and crawl a specific webpage – or a competitor’s page.
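A rough way to approximate that check yourself is to fetch your key pages and resources with no cookies or login session and confirm they return successfully. The sketch below is only an illustration, not a substitute for Google’s own tools, and the URLs are placeholders for your own pages.

# Rough sketch: fetch pages anonymously (no cookies, no login) and report
# their HTTP status, roughly approximating what an anonymous crawler sees.
import urllib.request
import urllib.error

urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/style.css",
    "https://www.example.com/logo.png",
]

for url in urls_to_check:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(url, "->", resp.status)
    except urllib.error.HTTPError as err:
        print(url, "-> HTTP error", err.code)
    except urllib.error.URLError as err:
        print(url, "-> unreachable:", err.reason)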

Google expanded their specific note about submitting a single page to the index.

If you’ve created or updated a single page, you can submit an individual URL to Google. To tell Google about many new or updated pages at once, use a sitemap.

Previously, it just mentioned submitting changes to a single page using the submit URL tool. This adds clarification for those newer to SEO that they do not need to submit every single new or updated page to Google individually, and that using sitemaps is the best way to handle many pages at once. There have definitely been new site owners who add each page to Google using that tool because they don’t realize sitemaps are a thing. Part of this is that WordPress is such a prevalent way to create a new website, yet it does not yet have native support for sitemaps, so site owners need to either install a dedicated sitemaps plugin or use one of the many SEO tool plugins that offer sitemaps as a feature.
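For site owners who don’t want another plugin, a sitemap is just an XML file listing URLs, and it is easy to generate one yourself. The sketch below writes a minimal sitemap for a handful of placeholder URLs; a real site would pull the list and last-modified dates from its CMS or database.

# Minimal sketch: write a basic sitemap.xml for a small list of URLs.
# URLs and dates are placeholders; a real site would generate them from its CMS.
from xml.etree import ElementTree as ET

pages = [
    ("https://www.example.com/", "2020-01-10"),
    ("https://www.example.com/about", "2020-01-08"),
    ("https://www.example.com/contact", "2019-12-20"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
print("Wrote sitemap.xml with", len(pages), "URLs")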

This new change also highlights using the tool for newly created pages as well, instead of just the previous reference to “changes to a single page.”

Google has also made a change to the “if you ask Google to crawl only one page” section. It now references what Google views as a “small site”: according to Google, a smaller site is one with fewer than 1,000 pages.

Google also stresses the importance of a strong navigation structure, even for sites it considers “small.”  It says site owners of small sites can just submit their homepage to Google, “provided that Google can reach all your other pages by following a path of links that start from your homepage.”

With so many sites being on WordPress, it is less likely that there will be random orphaned pages that are not accessible by following links from the homepage. But depending on the specific WordPress theme used, there can sometimes be orphaned pages when pages are added but not manually added to the pages menu. In these cases, if a sitemap is used as well, those pages shouldn’t be missed even if they are not directly linked from the homepage.

In the “get your page linked to by another page” section, Google has added that “links in advertisements, links that you pay for in other sites, links in comments, or other links that don’t follow the Google Webmaster Guidelines won’t be followed by Google.” A small change, but Google is making it clear that this is a Google-specific behavior: these links won’t be followed by Google, but they might be followed by other search engines.

But perhaps the most telling part of this is at the end of the crawling section, Google adds:

Google doesn’t accept payment to crawl a site more frequently, or rank it higher. If anyone tells you otherwise, they’re wrong.

Scammy SEO companies have long guaranteed first positions on Google, promised to increase rankings, or required payment to submit a site to Google. And with the ambiguous Google Partner badge for AdWords, many use the Google Partners badge to imply they are certified by Google for SEO and organic ranking purposes. That said, most of those reading How Search Works are probably already aware of this. But it is nice to see Google put this in writing again, for times when SEOs need to prove to clients that there is no “pay to win” option outside of AdWords, or simply to show someone who might be falling for a scammy SEO company’s claims about Google rankings.

The Long Version

Google then gets into what they call the “long version” of How Google Search Works, with more details on the above sections, covering more nuances that impact SEO.

Crawling

Google has changed how it refers to the “algorithmic process”. Previously, it stated “Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often and how many pages to fetch from each site.” Curiously, they removed the reference to “computer programs”, which had provoked questions about exactly which computer programs Google was using.

The new updated version simply states:

Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site.

Google also updated the wording for the crawl process, changing that it is “augmented with sitemap data” to “augmented by sitemap” data.

Google also made a change where it referenced that Googlebot “detects” links and changed it to “finds” links, as well as changes from Googlebot visiting “each of these websites” to the much more specific “page”.  This second change makes it more accurate since Google visiting a website won’t necessarily mean it crawls all links on all pages.  The change to “page” makes it more accurate and specific for webmasters.

Previously it read:

As Googlebot visits each of these websites it detects links on each page and adds them to its list of pages to crawl.

Now it reads:

When Googlebot visits a page it finds links on the page and adds them to its list of pages to crawl.

Google has added a new section about using Chrome to crawl:

During the crawl, Google renders the page using a recent version of Chrome. As part of the rendering process, it runs any page scripts it finds. If your site uses dynamically-generated content, be sure that you follow the JavaScript SEO basics.

By referencing a recent version of Chrome, this addition is clarifying the change from last year where Googlebot was finally upgraded to the latest version of Chromium for crawling, an update from Google only crawling with Chrome 41 for years.

Google also notes it runs “any page scripts it finds,” and advises site owners to be aware of possible crawl issues as a result of using dynamically-generated content with the use of JavaScript, specifying that site owners should ensure they follow their JavaScript SEO basics.

Google also details the primary and secondary crawls, something that has caused much confusion since Google first revealed them, and the details in this How Google Search Works document describe them differently than how some SEOs previously interpreted them.

Here is the entire new section for primary and secondary crawls:

Primary crawl / secondary crawl

Google uses two different crawlers for crawling websites: a mobile crawler and a desktop crawler. Each crawler type simulates a user visiting your page with a device of that type.

Google uses one crawler type (mobile or desktop) as the primary crawler for your site. All pages on your site that are crawled by Google are crawled using the primary crawler. The primary crawler for all new websites is the mobile crawler.

In addition, Google recrawls a few pages on your site with the other crawler type (mobile or desktop). This is called the secondary crawl, and is done to see how well your site works with the other device type.

In this section, Google refers to primary and secondary crawls as being specific to their two crawlers – the mobile crawler and the desktop crawler.  Many SEOs think of primary and secondary crawling in reference to Googlebot making two passes over a page, where javascript is rendered on the secondary crawl.  So while Google clarifies their use of desktop and mobile Googlebots, the use of language here does cause confusion for those who use this to refer to the primary and secondary crawls for javascript purposes.  So to be clear, Google’s reference to their primary and secondary crawl has nothing to do with javascript rendering, but only to how they use both mobile and desktop Googlebots to crawl and check a page.

What Google is clarifying in this specific reference to primary and secondary crawl is that Google is using two crawlers – both mobile and desktop versions of Googlebot – and will crawl sites using a combination of both.

Google did specifically state that new websites are crawled with the mobile crawler in its “Mobile-First Indexing Best Practices” document, as of July 2019. But this is the first time it has made an appearance in the How Google Search Works document.

Google does go into more detail about how it uses both the desktop and mobile Googlebots, particularly for sites that are currently considered mobile first by Google.  It wasn’t clear just how much Google was checking desktop versions of sites if they were mobile first, and there have been some who have tried to take advantage of this by presenting a spammier version to desktop users, or in some cases completely different content.  But Google is confirming it is still checking the alternate version of the page with their crawlers.

So sites that are mobile first will see some of their pages crawled with the desktop crawler.  However, it still isn’t clear how Google handles cases where they are vastly different, especially when done for spam reasons, as there doesn’t seem to be any penalty for doing so, aside from a possible spam manual action if it is checked or a spam report is submitted.  And this would have been a perfect opportunity to be clearer about how Google will handle pages with vastly different content depending on whether it is viewed on desktop or on mobile.  Even in the mobile friendly documents, Google only warns about ranking differences if content is on the desktop version of the page but is missing on the mobile version of the page.

How does Google find a page?

Google has removed this section entirely from the new version of the document.

Here is what was included in it:

How does Google find a page?

Google uses many techniques to find a page, including:

  • Following links from other sites or pages
  • Reading sitemaps

It isn’t clear why Google removed this specifically. It was slightly redundant, but it was also missing the option of submitting a URL.

Improving Your Crawling

Google makes the use of hreflang a bit clearer by providing a bit more detail, especially for those who might just be learning what hreflang is and how it works.

Formerly it said “Use hreflang to point to alternate language pages.”  Now it states “Use hreflang to point to alternate versions of your page in other languages.”

Not a huge change, but a bit clearer.

Google has also added two new points, providing more detail about ensuring Googlebot is able to access all the content on the page, not just the textual content (the words).

First, Google added:

Be sure that Google can access the key pages, and also the important resources (images, CSS files, scripts) needed to render the page properly.

So Google is stressing that it needs access to all the important content. It is also specifically calling attention to other types of elements on the page that it wants access to in order to properly crawl the page, including images, CSS, and scripts. Webmasters who went through the “mobile-first indexing” launch are fairly familiar with issues around blocked files, especially CSS and scripts, which some CMSs blocked Googlebot from crawling by default.

But newer site owners might not realize this is possible, or that they might be doing it. It would have been nice to see Google add specific information on how those newer to SEO can check for this, particularly those who might not be clear on what exactly “rendering” means.
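One simple check anyone can run is whether robots.txt accidentally blocks Googlebot from CSS, scripts, or images. Python’s standard library includes a robots.txt parser; the sketch below is a rough illustration with placeholder URLs, not a full rendering test.

# Rough sketch: check whether robots.txt allows Googlebot to fetch key
# page resources. Swap the placeholder URLs for your own site and assets.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

resources = [
    "https://www.example.com/",
    "https://www.example.com/assets/site.css",
    "https://www.example.com/assets/app.js",
    "https://www.example.com/images/hero.jpg",
]

for url in resources:
    allowed = parser.can_fetch("Googlebot", url)
    print(("ALLOWED  " if allowed else "BLOCKED  ") + url)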

Google also added:

Confirm that Google can access and render your page properly by running the URL Inspection tool on the live page.

Here Google does add specific information about using the URL Inspection tool to see what site owners are blocking, or which content causes issues when Google tries to render it. I think these last two new points could have been combined and made slightly clearer about how site owners can use the tool to check for all these issues.

Indexing

Google has made significant changes to this section as well. And Google starts off with making major changes to the first paragraph.  Here is the original version:

Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, we process information included in key content tags and attributes, such as <title> tags and alt attributes.

The updated version now reads:

Googlebot processes each page it crawls in order to understand the content of the page. This includes processing the textual content, key content tags and attributes, such as <title> tags and alt attributes, images, videos, and more.

Google no longer states it processes pages to “compile a massive index of all the words it sees and their location on each page.” This was always a curious way for them to call attention to the fact that they simply index all the words they come across and their position on a page, when in reality it is a lot more complex than that. So this definitely clears that up.


They have also added that they process “textual content,” which basically calls attention to the fact that Google indexes the words on the page, something everyone assumed. But it does differentiate the text from the new addition later in the paragraph regarding images, videos, and more.

Previously, Google simply made reference to attributes such as title and alt tags and attributes.  But now it is getting more granular, specifically referring to “images, videos and more.”  However, this does mean Google is considering images, videos and “more” to understand the content on the page, which could affect rankings.

Improving your Indexing

Google changed “read our SEO guide for more tips” to “Read our basic SEO guide and advanced user guide for more tips.”

What is a document?

Google has added a massive section here called “What is a document?”  It talks specifically about how Google determines what is a document, but also includes details about how Google views multiple pages with identical content as a single document, even with different URLs, and how it determines canonicals.

First, here is the first part of this new section:

What is a “document”?

Internally, Google represents the web as an (enormous) set of documents. Each document represents one or more web pages. These pages are either identical or very similar, but are essentially the same content, reachable by different URLs. The different URLs in a document can lead to exactly the same page (for instance, example.com/dresses/summer/1234 and example.com?product=1234 might show the same page), or the same page with small variations intended for users on different devices (for example, example.com/mypage for desktop users and m.example.com/mypage for mobile users).

Google chooses one of the URLs in a document and defines it as the document’s canonical URL. The document’s canonical URL is the one that Google crawls and indexes most often; the other URLs are considered duplicates or alternates, and may occasionally be crawled, or served according to the user request: for instance, if a document’s canonical URL is the mobile URL, Google will still probably serve the desktop (alternate) URL for users searching on desktop.

Most reports in Search Console attribute data to the document’s canonical URL. Some tools (such as the Inspect URL tool) support testing alternate URLs, but inspecting the canonical URL should provide information about the alternate URLs as well.

You can tell Google which URL you prefer to be canonical, but Google may choose a different canonical for various reasons.

So the tl;dr is that Google will view pages with identical or near-identical content as the same document, regardless of how many of them there are. Seasoned SEOs know this as internal duplicate content.

Google also states that once it determines pages are duplicates, they may not be crawled as often. This is important to note for site owners working to de-duplicate content that Google considers duplicate. It becomes more important to submit those URLs to be recrawled, or to link the newly de-duplicated pages from the homepage, in order to ensure Google recrawls and indexes the new content and de-dupes them properly.

It also brings up an important note about desktop versus mobile: Google will still likely serve the desktop version of a page instead of the mobile version for desktop users when a site has two different URLs for the same page, where one is designed for mobile users and the other for desktop. While many websites have switched to serving the same URL and content for both using responsive design, some sites still run two completely different sites and URLs for desktop and mobile users.

Google also mentions that you can tell it which URL you prefer to be the canonical, but it states it can choose a different URL “for various reasons.” While Google doesn’t detail why it might choose a different canonical than the one the site owner specifies, it is usually due to http vs. https, whether a page is included in a sitemap, page quality, the pages appearing to be completely different and therefore not suitable for canonicalization, or significant incoming links to the non-canonical URL.
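To picture how several URLs can collapse into one “document,” here is a toy sketch that normalizes common URL variants (a mobile subdomain, tracking parameters) into a shared key. It is purely illustrative and is not Google’s actual canonicalization logic.

# Toy illustration only: group URL variants that likely point to the same
# content. This is NOT Google's algorithm, just a picture of the idea that
# one "document" can be reachable via several URLs.
# (str.removeprefix requires Python 3.9+.)
from urllib.parse import urlsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize(url):
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("m.").removeprefix("www.")
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    key = host + parts.path.rstrip("/")
    return key + ("?" + urlencode(sorted(query)) if query else "")

urls = [
    "https://www.example.com/dresses/summer/1234",
    "https://m.example.com/dresses/summer/1234",
    "https://www.example.com/dresses/summer/1234?utm_source=newsletter",
]

groups = {}
for u in urls:
    groups.setdefault(normalize(u), []).append(u)

for key, variants in groups.items():
    print(key, "->", variants)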

Google has also included definitions for many of the terms used by SEOs and in Google Search Console.

Document: A collection of similar pages. Has a canonical URL, and possibly alternate URLs, if your site has duplicate pages. URLs in the document can be from the same or different organization (the root domain, for example “google” in www.google.com). Google chooses the best URL to show in Search results according to the platform (mobile/desktop), user language‡ or location, and many other variables. Google discovers related pages on your site by organic crawling, or by site-implemented features such as redirects or tags. Related pages on other organizations can only be marked as alternates if explicitly coded by your site (through redirects or link tags).

Again, Google is talking about the fact that a single document can encompass more than a single URL, as Google considers a single document to potentially include many duplicate or near-duplicate pages as well as pages assigned via canonical tags. Google makes specific mention of “alternates” that appear on other sites, which can only be considered alternates if the site owner specifically codes them. And Google will choose the best URL within the document to show.

But it fails to mention that Google can consider pages on other sites to be duplicates and will not show those duplicates even though they aren’t from the same site, something site owners frequently see happen when someone steals content and the stolen version sometimes ranks over the original.

There was a notation added for the above, dealing with hreflang.

Pages with the same content in different languages are stored in different documents that reference each other using hreflang tags; this is why it’s important to use hreflang tags for translated content.

Google shows that it doesn’t include identical content under the same “document” when it is simply in a different language, which is interesting. But Google is stressing the importance of using hreflang in these cases.

URL: The URL used to reach a given piece of content on a site. The site might resolve different URLs to the same page.

Pretty self-explanatory, although it does reference the fact that different URLs can resolve to the same page, presumably via redirects or aliases.

Page: A given web page, reached by one or more URLs. There can be different versions of a page, depending on the user’s platform (mobile, desktop, tablet, and so on).

Also pretty self-explanatory, bringing up the specifics that users can be served different versions of the same page, such as when they view it on a mobile device versus a desktop computer.

Version: One variation of the page, typically categorized as “mobile,” “desktop,” and “AMP” (although AMP can itself have mobile and desktop versions). Each version can have a different URL (example.com vs m.example.com) or the same URL (if your site uses dynamic serving or responsive web design, the same URL can show different versions of the same page) depending on your site configuration. Language variations are not considered different versions, but different documents.

Simply clarifying in greater detail the different versions of a page, and how Google typically categorizes them as “mobile,” “desktop,” and “AMP”.

Canonical page or URL: The URL that Google considers as most representative of the document. Google always crawls this URL; duplicate URLs in the document are occasionally crawled as well.

Google states here again that non-canonical pages are not crawled as often as the main canonical URL that a site owner assigns to a group of pages. Google does not specifically mention here that it sometimes chooses a different page as the canonical, even when a specific page has been designated as the canonical one.

Alternate/duplicate page or URL: The document URL that Google might occasionally crawl. Google also serves these URLs if they are appropriate to the user and request (for example, an alternate URL for desktop users will be served for desktop requests rather than a canonical mobile URL).

The key takeaway here is that Google “might” occasionally crawl the site’s duplicate or alternative page. And here they stress that Google will serve these alternative URLs “if they are appropriate.” It is unfortunate they don’t go into greater detail on why they might serve these pages instead of the canonical, outside of the mention of desktop versus mobile, as we have seen many cases where Google picks a different page to show other than the canonical for a myriad of reasons.

Google also fails to mention how this impacts duplicate content found on other sites, though we do know Google will crawl those less often as well.

Site: Usually used as a synonym for a website (a conceptually related set of web pages), but sometimes used as a synonym for a Search Console property, although a property can actually be defined as only part of a site. A site can span subdomains (and even domains, for properly linked AMP pages).

Interesting to note here what they consider a website – a conceptually related set of webpages – and how it relates to the usage of a Google Search Console property, as “a property can actually be defined as only part of a site.”

Google does mention that AMP pages, which technically appear on a different domain, are considered part of the main site.

Serving Results

Google has made a pretty interesting specific change here in regards to their ranking factors.  Previously, Google stated:

Relevancy is determined by over 200 factors, and we always work on improving our algorithm.

Google has now replaced this “over 200 factors” reference with a less specific one.

Relevancy is determined by hundreds of factors, and we always work on improving our algorithm.

The 200-factors reference in How Google Search Works dates back to 2013, when the document was launched, although it then also made reference to PageRank (“Relevancy is determined by over 200 factors, one of which is the PageRank for a given page”), which Google removed when it redesigned the document in 2018.

While Google doesn’t go into specifics on the number anymore, it can be assumed that a significant number of ranking factors have been added since 2013 when this was first claimed in this document.  But I am sure some SEOs will be disappointed we don’t get a brand new shiny number like “over 500” ranking factors that SEOs can obsess about.

Final Thoughts

There are some pretty significant changes made to this document that SEOs can get a bit of insight from.

Google’s description of what it considers a document and how it relates to other identical or near-identical pages on a site is interesting, as well as Google’s crawling behavior towards the pages within a document it considers as alternate pages.  While this behavior has often been noted, it is more concrete information on how site owners should handle these duplicate and near-duplicate pages, particularly when they are trying to un-duplicate those pages and see them crawled and indexed as their own document.

They added a lot of useful advice for newer site owners, which is particularly helpful with so many new websites coming online this year due to the global pandemic.  Things such as checking a site without being logged in, how to submit both pages and sites to Google, etc.

The mention of what Google considers a “small site” is interesting because it gives a more concrete reference point for how Google sees large versus small sites. For some, a small site could mean under 30 pages, with the idea of a site with millions of pages being unfathomable. And the reinforcement of strong navigation, even for “small sites,” is useful for showing site owners and clients who might push for navigation that is more aesthetic than practical for both usability and SEO.

The primary and secondary crawl additions will probably cause some confusion for those who think of primary and secondary in terms of how Google processes scripts on a page when it crawls it.  But it is nice to have more concrete information on how and when Google will crawl using the alternate version of Googlebot for sites that are usually crawled with either the mobile Googlebot or the desktop one.

Lastly, the change from the “200 ranking factors” to a less specific, but presumably much higher number of ranking factors will disappoint some SEOs who liked having some kind of specific number of potential ranking factors to work out.

[Source: This article was published in thesempost.com By JENNIFER SLEGG - Uploaded by the Association Member: Barbara larson]

Categorized in Search Techniques

Internet marketing and advertising is becoming increasingly difficult and expensive, especially for small to mid-level businesses. Veteran SEO expert Tony Rockliff urges business owners to utilize the power of YouTube as a promising alternative to the otherwise slow, painful and expensive build of a Google SEO campaign.

CLEARWATER, Fla., Feb. 24, 2020 /PRNewswire-PRWeb/ -- In 2020, according to the World Advertising and Research Center, spending on internet advertising will reach more than 50% of total global ad spend, an all-time record.(1) A subset of internet advertising, search engine optimization (SEO), is now the major battleground in marketing today. SEO expert Tony Rockliff, founder and CEO of Tony Rockliff Productions, states that as SEO "gets bigger, it gets tougher." For an increasing number of companies, especially SMBs, the smart move is to consider YouTube as an additional SEO powerhouse instead of relying conventionally on Google alone. By using YouTube, business owners can combat the stiff competition for consumer attention and the ever-changing Google search algorithms.

Text Versus Video Content

According to a research study from Common Sense Media, more than twice as many young people watch videos every day as did four years ago, while the average time spent watching videos—primarily on YouTube—has roughly doubled, to an hour a day.(3) Video's popularity has exploded, while text takes a back seat. It is increasingly obvious in the industry that text-based content is saturated, and that if a company isn't willing to give it at least one year and invest considerable amounts, it shouldn't spend much time on traditional Google-based SEO.(2)

The video-centricity of today's consumers, coupled with the increasing expense and difficulty of attracting attention via text-based Google listings, is what makes YouTube an increasingly robust platform for video-savvy marketers, Rockliff suggests.

How Businesses Can Adapt to YouTube

To capitalize on this opportunity, Rockliff urges marketers to research YouTube to determine exactly what video content is needed and which content will get the most response from viewers, or potential clients. He organizes YouTube optimization into four major stages:

  • 1. Find out what is being searched for on YouTube in your area or niche that you can compete for (one way to research this is sketched after this list).
  • 2. Create video content that answers what is being searched for, and also provides what YouTube is searching for, i.e. views per video, average time spent watching, engagement per video, and number of subscribers gained per video.
  • 3. Publish your videos properly and in an optimized manner.
  • 4. Promote your videos according to how and when YouTube wants to see them promoted.
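As a rough illustration of the research in stage 1, the sketch below uses the YouTube Data API v3 (via the google-api-python-client package) to list the most-viewed videos for a niche query. The API key and the query are placeholders, and this is just one possible way to do the research rather than Rockliff's own method.

# Rough sketch of stage 1: see what already ranks on YouTube for a niche
# query, using the YouTube Data API v3.
# Assumes `pip install google-api-python-client` and a valid API key
# ("YOUR_API_KEY" and the query are placeholders).
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")

response = youtube.search().list(
    q="home coffee roasting",   # example niche query
    part="snippet",
    type="video",
    order="viewCount",
    maxResults=10,
).execute()

for item in response.get("items", []):
    snippet = item["snippet"]
    print(snippet["title"], "-", snippet["channelTitle"])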

Rockliff has been in search engine optimization since 1998 and online marketing since 1995. His online community membership site had grown to 1.3 million members and was receiving 1.5 billion hits per year before he sold it in 2002. Over the years, Rockliff has seen profound changes in both the opportunity and the approach of YouTube as a marketing strategy, and right now YouTube represents a great prospect for getting noticed and building a brand-loyal following. This is especially useful for organizations that do not have an extensive marketing budget. "The key is to understand what you're selling and optimize all four major stages," Rockliff states.

Tony Rockliff will be speaking at the Podfest 2020 Multimedia Expo, March 6th-8th, at the Orlando World Center Marriott in Orlando, Florida. For more information, please see http://podfestexpo.com/speakers/

About Tony Rockliff Productions:

Tony Rockliff Productions was founded in 1995 by digital pioneer and trailblazer Tony Rockliff. His video marketing company is based out of Clearwater, Florida, and brings over fifty years of audio/video marketing experience to the business. Having remained a top disruptor of the video marketing and media industry throughout his career, Rockliff attributes his world-renowned success to his passion for storytelling through the art of video. Tony Rockliff Productions specializes in video and audio creation, producing music and videos, YouTube optimization, and building out-of-the-ordinary websites. Currently, Tony Rockliff Productions focuses on organic YouTube video marketing, a profitable niche of the industry that is host to 1.9 billion logged-in users per month. You can visit him here: https://tonyrockliff.com/


  • 1. Handley, Lucy, "Global ad spend has slowed but 2020 looks set to be a bumper year," CNBC, October 24, 2019, cnbc.com/2019/10/24/global-ad-spend-has-slowed-but-2020-looks-set-to-be-a-bumper-year.html.
  • 2. Patel, Neil, "Everything I Taught You About SEO Was Wrong," neilpatel.com/blog.
  • 3. Siegel, Rachel, "Tweens, teens, and screens: The average time kids spend watching online videos has doubled in 4 years," Washington Post, October 29, 2019, washingtonpost.com/technology/2019/10/29/survey-average-time-young-people-spend-watching-videos-mostly-youtube-has-doubled-since/.

[Source: This article was published in finance.yahoo.com - Uploaded by the Association Member: Jasper Solander]

Categorized in Search Engine

Lizzi Harvey from Google has created a single page tracking the major updates made to the Google Search developer documentation. So now you can just scan that page to see what updates she and her teammates have made to the Google Search developer documentation online.

Lizzi announced this on Twitter saying "Do you often wish there was 1 page that you could check and see what's new in the search dev docs? Well, here it is, backdated to include things that happened this month."


Will there be a way to subscribe to these updates? RSS probably is not going to happen.


[Source: This article was published in seroundtable.com By Barry Schwartz - Uploaded by the Association Member: David J. Redcliff]

Categorized in Search Engine

Now that the Google January 2020 core update is mostly rolled out, we have asked several data providers to send us what they found with this Google search update. All of the data providers agree that this core update was a big one and impacted a large number of web sites.

The facts. What we know from Google, as we previously reported, is that the January 2020 core update started to roll out around 12:00 PM ET on Monday, January 13th. That rollout was “mostly done” by Thursday morning, on January 16th. We also know that this was a global update, and was not specific to any region, language or category of web sites. It is a classic “broad core update.”

What the tools are seeing. We have gone to third-party data companies asking them what their data shows about this update.

RankRanger. Mordy Oberstein from RankRanger said, “the YMYL (your money, your life) niches got hit very hard.” “This is a huge update,” he added. “There is massive movement at the top of the SERP for the Health and Finance niches and incredible increases for all niches when looking at the top 10 results overall.”

Here is a chart showing the rank volatility broken down by industry and the position of those rankings:


“Excluding the Retail niche, which according to what I am seeing was perhaps a focus of the December 6th update, the January 2020 core update was a far larger update across the board and at every ranking position,” Mordy Oberstein added. “However, when looking at the top 10 results overall during the core update, the Retail niche started to separate itself from the levels of volatility seen in December as well.”

SEMRush. Yulia Ibragimova from SEMRush said “We can see that the latest Google Update was quite big and was noticed almost in every category.” The most volatile categories according to SEMRush, outside of Sports and News, were Online communities, Games, Arts & Entertainments, and Finance. But Yulia Ibragimova added that all categories saw major changes and “we can assume that this update wasn’t aimed to any particular topics,” she told us.

SEMRush offers a lot of data on its web site. But they also sent us additional data about this update.

Here is the volatility by category for mobile vs. desktop search results:


The top ten winners according to SEMRush were Dictionary.com, Hadith of the Day, Discogs, ABSFairings, X-Rates, TechCrunch, ShutterStock, 247Patience, GettyImages and LiveScores.com. The top ten losers were mp3-youtube.download, TotalJerkFace.com, GenVideos.io, Tuffy, TripSavvy, Honolulu.gov, NaughtyFind, Local.com, RuthChris and Local-First.org.


Sistrix. Johannes Beus from Sistrix posted their analysis of this core update. He said “Domains that relate to YMYL (Your Money, Your Life) topics have been re-evaluated by the search algorithm and gain or lose visibility as a whole. Domains that have previously been affected by such updates are more likely to be affected again. The absolute fluctuations appear to be decreasing with each update – Google is now becoming more certain of its assessment and does not deviate as much from the previous assessment.”

Here is the Sistrix chart showing the change:


According to Sistrix, the big winners were goal.com, onhealth.com, CarGurus, verywellhealth.com, Fandango, Times Of Israel, Royal.uk, and WestField. The big losers were CarMagazine.co.uk, Box Office Mojo, SkySports, ArnoldClark.com, CarBuyer.co.uk, History Extra, Evan Shalshaw, and NHS Inform.

SearchMetrics. Marcus Tober, the founder of SearchMetrics, told us “the January Core Update seems to revert some changes for the better or worse depending on who you are. It’s another core update where thin content got penalized and where Google put an emphasis in YMYL. The update doesn’t seem to affect as many pages as with the March or September update in 2019. But has similar characteristics.”

Here are some specific examples SearchMetrics shared. First, Onhealth.com won with the March 2019 core update, lost with the September 2019 update, and won again big time with the January 2020 core update:


Verywellhealth.com, meanwhile, was a loser during multiple core updates:


Draxe.com, which has been up and down during core updates, seems to be a big winner this time at +83%, but in previous core updates it got hit hard:


The big winners according to SearchMetrics were esty.com, cargurus.com, verywellhealth.com, overstock.com, addictinggames.com, onhealth.com, bigfishgames.com and health.com. The big losers were tmz.com, academy.com, kbhgames.com, orbitz.com, silvergames.com, autolist.com, etonline.com, trovit.com and pampers.com.

What to do if you are hit. Google has given advice in the past on what to consider if you are negatively impacted by a core update. There aren’t specific actions to take to recover, and in fact, a negative rankings impact may not signal anything is wrong with your pages. However, Google has offered a list of questions to consider if your site is hit by a core update.

Why we care. It is often hard to isolate what you need to do to reverse any algorithmic hit your site may have seen. When it comes to Google core updates, it is even harder to do so. What this data, previous experience, and Google’s advice have shown us is that these core updates are broad and cover a lot of overall quality issues. The data above reinforces this. So if your site was hit by a core update, it is often recommended to step back from it all, take a wider view of your overall web site and see what you can do to improve the site overall.

[Source: This article was published in searchengineland.com By Barry Schwartz - Uploaded by the Association Member: Edna Thomas]

Categorized in Search Engine

Michael struggles to find the search results he’s looking for, and would like some tips for better Googling

Want to search like a pro? These tips will help you up your Googling game, using the advanced tools to narrow down your results. Photograph: Alastair Pike/AFP via Getty Images
Last week’s column mentioned search skills. I’m sometimes on the third page of results before I get to what I was really looking for. I’m sure a few simple tips would find these results on page 1. All advice welcome. Michael

Google achieved its amazing popularity by de-skilling search. Suddenly, people who were not very good at searching – which is almost everyone – could get good results without entering long, complex searches. Partly this was because Google knew which pages were most important, based on its PageRank algorithm, and it knew which pages were most effective, because users quickly bounced back from websites that didn’t deliver what they wanted.

Later, Google added personalisation based on factors such as your location, your previous searches, your visits to other websites, and other things it knew about you. This created a backlash from people with privacy concerns, because your searches into physical and mental health issues, legal and social problems, relationships and so on can reveal more about you than you want anyone else – or even a machine – to know.

When talking about avoiding “the creepy line”, former Google boss Eric Schmidt said: “We don’t need you to type at all. We know where you are. We know where you’ve been. We can more or less know what you’re thinking about.”

Google hasn’t got to that point, yet, but it does want to save you from typing. Today, Google does this through a combination of auto-complete search suggestions, Answer Boxes, and “People also ask” boxes, which show related questions along with their “featured snippets”. As a result, Google is much less likely to achieve its stated aim of sending you to another website. According to Jumpshot research, about half of browser-based searches no longer result in a click, and about 6% go to Google-owned properties such as YouTube and Maps.

You could get upset about Google scraping websites such as Wikipedia for information and then keeping their traffic, but this is the way the world is going. Typing queries into a browser is becoming redundant as more people use voice recognition on smartphones or ask the virtual assistant on their smart speakers. Voice queries need direct answers, not pages of links.

So, I can give you some search tips, but they may not be as useful as they were when I wrote about them in January 2004 – or perhaps not for as long.

Advanced Search for everyone
Google’s advanced search page is the tool to properly drill down into the results. Photograph: Samuel Gibbs/The Guardian

The easiest way to create advanced search queries in Google is to use the form on the Advanced Search page, though I suspect very few people do. You can type different words, phrases or numbers that you want to include or exclude into the various boxes. When you run the search, it converts your input into a single string using search shortcuts such as quotation marks (to find an exact word or phrase) and minus signs (to exclude words).

You can also use the form to narrow your search to a particular language, region, website or domain, or to a type of file, how recently it was published and so on. Of course, nobody wants to fill in forms. However, using the forms will teach you most of the commands mentioned below, and it’s a fallback if you forget any.
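To make that mapping concrete, here is a minimal sketch (not from the original article) of how the form’s separate boxes collapse into the single query string described above. The helper name advanced_query is hypothetical; the quotation-mark, minus-sign and site: shortcuts are the ones the form itself produces, and q is Google’s ordinary search parameter.

from urllib.parse import urlencode

def advanced_query(all_words="", exact_phrase="", exclude_words="", site=""):
    # Assemble the query the same way the Advanced Search form does.
    parts = []
    if all_words:
        parts.append(all_words)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')                            # quotation marks: exact phrase
    if exclude_words:
        parts.extend(f"-{word}" for word in exclude_words.split())   # minus sign: exclude words
    if site:
        parts.append(f"site:{site}")                                 # limit to one website
    return " ".join(parts)

query = advanced_query(all_words="jaguar", exclude_words="car")
print(query)                                                         # jaguar -car
print("https://www.google.com/search?" + urlencode({"q": query}))

Running it prints jaguar -car plus a ready-to-open search URL, which is essentially what the form would have generated for you.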

Happily, many commands work on other search engines too, so skills are transferable.

Use quotation marks
Quotation marks can be a powerful tool to specify exact search terms. Photograph: IKEA

If you are looking for something specific, quotation marks are invaluable. Putting quotation marks around single words tells the search engine that you definitely want them to appear on every page it finds, rather than using close matches or synonyms. Google will, of course, ignore this, but at least the results page will tell you which word it has ignored. You can click on that word to insist, but you will get fewer or perhaps no results.

Putting a whole phrase in inverted commas has the same effect, and is useful for finding quotations, people’s names, book and film titles, or particular phrases.

You can also use an asterisk as a wildcard to find matching phrases. For example, The Simpsons episode, Deep Space Homer, popularised the phrase: “I for one welcome our new insect overlords”. Searching for “I for one welcome our new * overlords” finds other overlords such as aliens, cephalopods, computers, robots and squirrels.

Nowadays, Google’s RankBrain is pretty good at recognising titles and common phrases without quote marks, even if they include “stop words” such as a, at, that, the and this. You don’t need quotation marks to search for the Force, The Who or The Smiths.

However, it also uses synonyms rather than strictly following your keywords. It can be quicker to use minus signs to exclude words you don’t want than add terms that are already implied. One example is jaguar -car.

Use site commands

Using the ‘site:’ command can be a powerful tool for quickly searching a particular website. Photograph: Samuel Gibbs/The Guardian

Google also has a site: command that lets you limit your search to a particular website or, with a minus sign (-site:), exclude it. This command uses the site’s uniform resource locator or URL.

 

For example, if you wanted to find something on the Guardian’s website, you would type site:theguardian.com (no space after the colon) alongside your search words.

You may not need to search the whole site. For example, site:theguardian.com/technology/askjack will search the Ask Jack posts that are online, though it doesn’t search all the ancient texts (continued on p94).

There are several similar commands. For example, inurl: will search for or exclude words that appear in URLs. This is handy because many sites now pack their URLs with keywords as part of their SEO (search-engine optimisation). You can also search for intitle: to find words in titles.

Web pages can include incidental references to all sorts of things, including plugs for unrelated stories. All of these will duly turn up in text searches. But if your search word is part of the URL or the title, it should be one of the page’s main topics.

You can also use site: and inurl: commands to limit searches to include, or exclude, whole groups of websites. For example, either site:co.uk or inurl:co.uk will search matching UK websites, though many UK sites now have .com addresses. Similarly, site:ac.uk and inurl:ac.uk will find pages from British educational institutions, while inurl:edu and site:edu will find American ones. Using inurl:ac.uk OR inurl:edu (the Boolean command must be in caps) will find pages from both. Using site:gov.uk will find British government websites, and inurl:https will search secure websites. There are lots of options for inventive searchers.

Google Search can also find different types of file, using either filetype: or ext: (for file extension). These include office documents (docx, pptx, xlsx, rtf, odt, odp, ods etc) and PDF files. Results depend heavily on the topic. For example, a search for picasso filetype:pdf is more productive than one for stormzy.
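As a quick illustration, the snippet below (a sketch, not part of the original article) strings the example operators from this section into complete, URL-encoded searches; the last query simply composes several of them together, and only the Python standard library is used.

from urllib.parse import urlencode

# Example query strings built from the operators discussed above.
queries = [
    "site:theguardian.com/technology/askjack",   # search one section of the Guardian site
    "inurl:ac.uk OR inurl:edu",                  # UK or US academic pages
    "picasso filetype:pdf",                      # PDF documents only
    "jaguar -car site:gov.uk",                   # exclusion combined with a UK government filter
]
for q in queries:
    print("https://www.google.com/search?" + urlencode({"q": q}))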

Make it a date

Narrowing your search by date can find older pieces. Photograph: Samuel Gibbs/The Guardian

We often want up-to-date results, particularly in technology where things that used to be true are not true any more. After you have run a search, you can use Google’s time settings to filter the results, or use new search terms. To do this, click Tools, click the down arrow next to “Any time”, and use the dropdown menu to pick a time period between “Past hour” and “Past year”.

Last week, I was complaining that Google’s “freshness algorithm” could serve up lots of blog-spam, burying far more useful hits. Depending on the topic, you can use a custom time range to get less fresh but perhaps more useful results.

Custom time settings are even more useful for finding contemporary coverage of events, which might be a company’s public launch, a sporting event, or something else. Human memories are good at rewriting history, but contemporaneous reports can provide a more accurate picture.

However, custom date ranges have disappeared from mobile, the daterange: command no longer seems to work in search boxes, and “sort by date” has gone except in news searches. Instead, this year, Google introduced before: and after: commands to do the same job. For example, you could search for “Apple iPod” before:2002-05-31 after:2001-10-15 for a bit of nostalgia. The date formats are very forgiving, so one day we may all prefer it.
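For instance, here is a minimal sketch (not from the article) that appends the new before: and after: operators to a query, using the “Apple iPod” dates from the paragraph above; the helper name date_bounded is hypothetical.

from datetime import date

def date_bounded(query, after=None, before=None):
    # Append Google's before:/after: operators in ISO format (YYYY-MM-DD).
    if after:
        query += f" after:{after.isoformat()}"
    if before:
        query += f" before:{before.isoformat()}"
    return query

print(date_bounded("Apple iPod", after=date(2001, 10, 15), before=date(2002, 5, 31)))
# Apple iPod after:2001-10-15 before:2002-05-31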

 [Source: This article was published in theguardian.com - Uploaded by the Association Member: Carol R. Venuti] 


Ever had to search for something on Google, but you’re not exactly sure what it is, so you just use some language that vaguely implies it? Google’s about to make that a whole lot easier.

Google announced today it’s rolling out a new machine learning-based language understanding technique called Bidirectional Encoder Representations from Transformers, or BERT. BERT helps decipher your search queries based on the context of the language used, rather than individual words. According to Google, “when it comes to ranking results, BERT will help Search better understand one in 10 searches in the U.S. in English.”

Most of us know that Google usually responds to words, rather than to phrases — and Google’s aware of it, too. In the announcement, Pandu Nayak, Google’s VP of search, called this kind of searching “keyword-ese,” or “typing strings of words that they think we’ll understand, but aren’t actually how they’d naturally ask a question.” It’s amusing to see these kinds of searches — heck, Wired has made a whole cottage industry out of celebrities reacting to these keyword-ese queries in their “Autocomplete” video series — but Nayak’s correct that this is not how most of us would naturally ask a question.

As you might expect, this subtle change might make some pretty big waves for potential searchers. Nayak said this “[represents] the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search.” Google offered several examples of this in action, such as “Do estheticians stand a lot at work,” which apparently returned far more accurate search results.

I’m not sure if this is something most of us will notice — heck, I probably wouldn’t have noticed if I hadn’t read Google’s announcement, but it’ll sure make our lives a bit easier. The only reason I can see it not having a huge impact at first is that we’re now so used to keyword-ese, which is in some cases more economical to type. For example, I can search “What movie did William Powell and Jean Harlow star in together?” and get the correct result (Libeled Lady; not sure if that’s BERT’s doing or not), but I can also search “William Powell Jean Harlow movie” and get the exact same result.

 

BERT will only be applied to English-based searches in the US, but Google is apparently hoping to roll this out to more countries soon.

[Source: This article was published in thenextweb.com By RACHEL KASER - Uploaded by the Association Member: Dorothy Allen]


The new language model can think in both directions, fingers crossed

Google has updated its search algorithms to tap into an AI language model that is better at understanding netizens' queries than previous systems.

Pandu Nayak, a Google fellow and vice president of search, announced this month that the Chocolate Factory has rolled out BERT, short for Bidirectional Encoder Representations from Transformers, for its most fundamental product: Google Search.

To pull all of this off, researchers at Google AI built a neural network known as a transformer. The architecture is suited to dealing with sequences in data, making it ideal for handling language. To understand a sentence, you must look at all the words in it in a specific order. Unlike previous models that only consider words in one direction – left to right – BERT is able to look both forwards and backwards to consider the overall context of a sentence.

“BERT models can, therefore, consider the full context of a word by looking at the words that come before and after it—particularly useful for understanding the intent behind search queries,” Nayak said.
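You can see this two-sided reading for yourself with the publicly released BERT weights. The sketch below assumes the open-source Hugging Face transformers package and the bert-base-uncased checkpoint; it illustrates masked-word prediction, not Google’s production Search system.

from transformers import pipeline

# Load the public BERT checkpoint as a fill-in-the-blank (masked language model) pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT weighs the words on BOTH sides of [MASK] before predicting it:
# "traveler", "needs" and "enter the united states" all shape the guess.
for prediction in fill_mask("A brazilian traveler needs a [MASK] to enter the united states."):
    print(prediction["token_str"], round(prediction["score"], 3))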

For example, below is what the previous Google Search and the new BERT-powered search look like for the query: “2019 brazil traveler to usa need a visa.”


Left: The result returned for the old Google Search that incorrectly understands the query as a US traveler heading to Brazil. Right: The result returned for the new Google Search using BERT, which correctly identifies the search is for a Brazilian traveler going to the US. Image credit: Google.

 

BERT has a better grasp of the significance behind the word "to" in the new search. The old model returns results that show information for US citizens traveling to Brazil, instead of the other way around. It looks like BERT is a bit patchy, however, as a Google Search today still appears to give results as if it's American travelers looking to go to Brazil:


Current search result for the query: 2019 brazil traveler to USA need a visa. It still thinks the sentence means a US traveler going to Brazil.

The Register asked Google about this, and a spokesperson told us... the screenshots were just a demo. Your mileage may vary.

"In terms of not seeing those exact examples, the side-by-sides we showed were from our evaluation process, and might not 100 percent mirror what you see live in Search," the PR team told us. "These were side-by-side examples from our evaluation process where we identified particular types of language understanding challenges where BERT was able to figure out the query better - they were largely illustrative.

"Search is dynamic, content on the web changes. So it's not necessarily going to have a predictable set of results for any query at any point in time. The web is constantly changing and we make a lot of updates to our algorithms throughout the year as well."

Nayak claimed BERT would improve 10 percent of all its searches. The biggest changes will be for longer queries, apparently, where sentences are peppered with prepositions like “for” or “to.”

“BERT will help Search better understand one in 10 searches in the US in English, and we’ll bring this to more languages and locales over time,” he said.

Google will run BERT on its custom Cloud TPU chips; it declined to disclose how many would be needed to power the model. The most powerful Cloud TPU option currently is the Cloud TPU v3 Pods, which contain 64 ASICs, each carrying performance of 420 teraflops and 128GB of high-bandwidth memory.

At the moment, BERT will work best for queries made in English. Google said it also works in two dozen countries for other languages, too, such as Korean, Hindi, and Portuguese for “featured snippets” of text. ®

[Source: This article was published in theregister.co.uk By Katyanna Quach - Uploaded by the Association Member: Anthony Frank]
