
"In the future, everyone will be anonymous for 15 minutes." So said the artist Banksy, but following the rush to put everything online, from relationship status to holiday destinations, is it really possible to be anonymous - even briefly - in the internet age?

That saying, a twist on Andy Warhol's famous "15 minutes of fame" line, has been interpreted to mean many things by fans and critics alike. But it highlights the real difficulty of keeping anything private in the 21st Century.

"Today, we have more digital devices than ever before and they have more sensors that capture more data about us," says Prof Viktor Mayer-Schoenberger of the Oxford Internet Institute.

And it matters. According to a survey from the recruitment firm Careerbuilder, in the US last year 70% of companies used social media to screen job candidates, and 48% checked the social media activity of current staff.

Also, financial institutions can check social media profiles when deciding whether to hand out loans.

_108600940_banksybarelylegal2006.jpg

Meanwhile, companies create models of buying habits and political views, and even use artificial intelligence to gauge future habits based on social media profiles.

One way to try to take control is to delete social media accounts, which some did after the Cambridge Analytica scandal, when 87 million people had their Facebook data secretly harvested for political advertising purposes.

While deleting social media accounts may be the most obvious way to remove personal data, this will not have any impact on data held by other companies.

Fortunately, in some countries the law offers protection.

In the European Union the General Data Protection Regulation (GDPR) includes the "right to be forgotten" - an individual's right to have their personal data removed.

In the UK, that right is policed by the Information Commissioner's Office (ICO). Last year it received 541 requests to have information removed from search engines, according to data shown to the BBC, up from 425 the year before, and 303 in 2016-17.

The actual figures may be higher, as the ICO says it often only becomes involved after an initial complaint to the company that holds the information has been rejected.

But ICO's Suzanne Gordon says it is not clear-cut: "The GDPR has strengthened the rights of people to ask for an organisation to delete their personal data if they believe it is no longer necessary for it to be processed.

"However, this right is not absolute and in some cases must be balanced against other competing rights and interests, for example, freedom of expression."

The "right to be forgotten" shot to prominence in 2014 and led to a wide-range of requests for information to be removed - early ones came from an ex-politician seeking re-election, and a paedophile - but not all have to be accepted.

Companies and individuals that have the money can hire experts to help them out.

A whole industry is being built around "reputation defence" with firms harnessing technology to remove information - for a price - and bury bad news from search engines, for example.

One such company, Reputation Defender, founded in 2006, says it has a million customers including wealthy individuals, professionals and chief executives. It charges around £5,000 ($5,500) for its basic package.

It uses its own software to alter the results of Google searches about its clients, helping to lower less favourable stories in the results and promote more favourable ones instead.

_108600440_googlegettyimages-828896324-1.jpg

"The technology focuses on what Google sees as important when indexing websites at the top or bottom of the search results," says Tony McChrystal, managing director.

"Generally, the two major areas Google prioritises are the credibility and authority the web asset has, and how users engage with the search results and the path Google sees each unique individual follow.

"We work to show Google that a greater volume of interest and activity is occurring on sites that we want to promote, whether they're new websites we've created, or established sites which already appear in the [Google results pages], while sites we are seeking to suppress show an overall lower percentage of interest."

The firm sets out to achieve its specified objective within 12 months.

"It's remarkably effective," he adds, "since 92% of people never venture past the first page of Google and more than 99% never go beyond page two."

Prof Mayer-Schoenberger points out that, while reputation defence companies may be effective, "it is hard to understand why only the rich that can afford the help of such experts should benefit and not everyone".

_108598284_warhol.jpg

So can we ever completely get rid of every online trace?

"Simply put, no," says Rob Shavell, co-founder and chief executive of DeleteMe, a subscription service which aims to remove personal information from public online databases, data brokers, and search websites.

"You cannot be completely erased from the internet unless somehow all companies and individuals operating internet services were forced to fundamentally change how they operate.

"Putting in place strong sensible regulation and enforcement to allow consumers to have a say in how their personal information can be gathered, shared, and sold would go a long way to addressing the privacy imbalance we have now."

[Source: This article was published in bbc.com by Mark Smith]

Reverse image search is one of the most well-known and easiest digital investigative techniques, with two-click functionality of choosing “Search Google for image” in many web browsers. This method has also seen widespread use in popular culture, perhaps most notably in the MTV show Catfish, which exposes people in online relationships who use stolen photographs on their social media.

However, if you only use Google for reverse image searching, you will be disappointed more often than not. Limiting your search process to uploading a photograph in its original form to just images.google.com may give you useful results for the most obviously stolen or popular images, but for most any sophisticated research project, you need additional sites at your disposal — along with a lot of creativity.

This guide will walk through detailed strategies to use reverse image search in digital investigations, with an eye towards identifying people and locations, along with determining an image's provenance. After detailing the core differences between the search engines, this guide tests Yandex, Bing, and Google against five images showing different objects and from various regions of the world.

Beyond Google

The first and most important piece of advice on this topic cannot be stressed enough: Google reverse image search isn’t very good.

As of this guide’s publication date, the undisputed leader of reverse image search is the Russian site Yandex. After Yandex, the runners-up are Microsoft’s Bing and Google. A fourth service that could also be used in investigations is TinEye, but this site specializes in intellectual property violations and looks for exact duplicates of images.

Yandex

Yandex is by far the best reverse image search engine, with a scary-powerful ability to recognize faces, landscapes, and objects. This Russian site draws heavily upon user-generated content, such as tourist review sites (e.g. FourSquare and TripAdvisor) and social networks (e.g. dating sites), for remarkably accurate results with facial and landscape recognition queries.

Its strengths lie in photographs taken in a European or former-Soviet context. While photographs from North America, Africa, and other places may still return useful results on Yandex, you may find yourself frustrated by scrolling through results mostly from Russia, Ukraine, and eastern Europe rather than the country of your target images.

To use Yandex, go to images.yandex.com, then choose the camera icon on the right.

yandex instructions1

From there, you can either upload a saved image or type in the URL of one hosted online.

yandex instructions2 1536x70

If you get stuck with the Russian user interface, look out for Выберите файл (Choose file), Введите адрес картинки (Enter image address), and Найти (Search). After searching, look out for Похожие картинки (Similar images), and Ещё похожие (More similar).
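If you find yourself running many of these searches, it can be quicker to build the search link programmatically than to click through the upload dialog each time. Below is a minimal Python sketch that assembles a Yandex reverse image search URL for an image that is already hosted online; the rpt=imageview and url query parameters are an assumption based on how the site behaves in a browser rather than a documented API, so verify the link it produces before relying on it.

```python
# Minimal sketch: build a Yandex reverse image search link for an image that is
# already hosted online. The "rpt=imageview" and "url" parameters are assumed
# from observed browser behaviour, not an official API.
from urllib.parse import urlencode

def yandex_reverse_search_url(image_url: str) -> str:
    """Return a Yandex reverse image search URL for a hosted image."""
    params = {"rpt": "imageview", "url": image_url}
    return "https://yandex.com/images/search?" + urlencode(params)

if __name__ == "__main__":
    # Placeholder image URL - swap in the photo you are investigating.
    print(yandex_reverse_search_url("https://example.com/photo.jpg"))
```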

The facial recognition algorithms used by Yandex are shockingly good. Not only will Yandex look for photographs that look similar to the one that has a face in it, but it will also look for other photographs of the same person (determined through matching facial similarities) with completely different lighting, background colors, and positions. While Google and Bing may just look for other photographs showing a person with similar clothes and general facial features, Yandex will search for those matches, and also other photographs of a facial match. Below, you can see how the three services searched the face of Sergey Dubinsky, a Russian suspect in the downing of MH17. Yandex found numerous photographs of Dubinsky from various sources (only two of the top results had unrelated people), with the result differing from the original image but showing the same person. Google had no luck at all, while Bing had a single result (fifth image, second row) that also showed Dubinsky.

Screenshot 4

Screenshot 5

Yandex is, obviously, a Russian service, and there are worries and suspicions of its ties (or potential future ties) to the Kremlin. While we at Bellingcat constantly use Yandex for its search capabilities, you may be a bit more paranoid than us. Use Yandex at your own risk, especially if you are also worried about using VK and other Russian services. If you aren’t particularly paranoid, try searching an un-indexed photograph of yourself or someone you know in Yandex, and see if it can find yourself or your doppelganger online.

Bing

Over the past few years, Bing has caught up to Google in its reverse image search capabilities, but is still limited. Bing’s “Visual Search”, found at images.bing.com, is very easy to use, and offers a few interesting features not found elsewhere.

bing visualsearch

Within an image search, Bing allows you to crop a photograph (button below the source image) to focus on a specific element in said photograph, as seen below. The results with the cropped image will exclude the extraneous elements, focusing on the user-defined box. However, if the selected portion of the image is small, it is worth it to manually crop the photograph yourself and increase the resolution — low-resolution images (below 200×200) bring back poor results.
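If you prefer to prepare the crop yourself, an image library makes the manual crop-and-upscale step quick to script. The sketch below assumes the Pillow library, with placeholder file names and coordinates; it simply cuts out the region of interest and enlarges it past the rough 200×200 floor mentioned above.

```python
# Minimal sketch: manually crop a region of interest and upscale it before
# uploading it to a reverse image search engine. Assumes the Pillow library
# (pip install Pillow); the file name and box coordinates are placeholders.
from PIL import Image

img = Image.open("source_photo.jpg")

# Bounding box of the element you care about: (left, upper, right, lower) in pixels.
crop = img.crop((420, 310, 560, 430))

# Upscale so the crop sits comfortably above ~200x200; LANCZOS keeps it reasonably sharp.
scale = max(1, 400 // min(crop.size))
crop = crop.resize((crop.width * scale, crop.height * scale), Image.LANCZOS)

crop.save("cropped_upscaled.jpg", quality=95)
```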

Below, a Google Street View image of a man walking a couple of pugs was cropped to focus on just the pooches, leading Bing to suggest the breed of dog visible in the photograph (the “Looks like” feature), along with visually similar results. These results mostly included pairs of dogs being walked, matching the source image, but did not always only include pugs, as French bulldogs, English bulldogs, mastiffs, and others are mixed in.

bing results cropped 1536x727

Google

By far the most popular reverse image search engine, at images.google.com, Google is fine for most rudimentary reverse image searches. Some of these relatively simple queries include identifying well-known people in photographs, finding the source of images that have been shared quite a bit online, determining the name and creator of a piece of art, and so on. However, if you want to locate images that are not close to an exact copy of the one you are researching, you may be disappointed.

For example, when searching for the face of a man who tried to attack a BBC journalist at a Trump rally, Google can find the source of the cropped image, but cannot find any additional images of him, or even someone who bears a passing resemblance to him.

trumprally

trump results google

While Google was not very strong in finding other instances of this man’s face or similar-looking people, it still found the original, un-cropped version of the photograph the screenshot was taken from, showing some utility.

Five Test Cases

For testing out different reverse image search techniques and engines, a handful of images representing different types of investigations are used, including both original photographs (not previously uploaded online) and recycled ones. Because these photographs are included in this guide, the test cases will likely not work as intended in the future, as search engines will index these photographs and integrate them into their results. Thus, screenshots of the results as they appeared when this guide was being written are included.

These test photographs include a number of different geographic regions to test the strength of search engines for source material in western Europe, eastern Europe, South America, southeast Asia, and the United States. With each of these photographs, I have also highlighted discrete objects within the image to test out the strengths and weaknesses for each search engine.

Feel free to download these photographs (every image in this guide is hyperlinked directly to a JPEG file) and run them through search engines yourself to test out your skills.

Olisov Palace In Nizhny Novgorod, Russia (Original, not previously uploaded online)

test-a-1536x1134.jpg

Isolated: White SUV in Nizhny Novgorod

test-a-suv.jpg

Isolated: Trailer in Nizhny Novgorod

test-a-trailer.jpg

Cityscape In Cebu, Philippines (Original, not previously uploaded online)

test-b-1536x871.jpg

Isolated: Condominium complex, "The Padgett Place"

b-toweronly.jpg

Isolated: "Waterfront Hotel"

b-tower2only.jpg

Students From Bloomberg 2020 Ad (Screenshot from video)

test-c-1536x1120.jpg

Isolated: Student

c-studentonly.jpg

Av. do Café In São Paulo, Brazil (Screenshot Google Street View)

test-d-1536x691.jpg

Isolated: Toca do Açaí

d-tocadoacai.jpg

Isolated: Estacionamento (Parking)

d-estacionameno-1536x742.jpg

Amsterdam Canal (Original, not previously uploaded online)

test-e-1536x1150.jpg

Isolated: Grey Heron

test-e-bird.jpg

Isolated: Dutch Flag (also rotated 90 degrees clockwise)

test-e-flag.jpg

Results

Each of these photographs was chosen in order to demonstrate the capabilities and limitations of the three search engines. While Yandex in particular may seem like it is working digital black magic at times, it is far from infallible and can struggle with some types of searches. For some ways to possibly overcome these limitations, I’ve detailed some creative search strategies at the end of this guide.

Novgorod’s Olisov Palace

Predictably, Yandex had no trouble identifying this Russian building. Along with photographs from a similar angle to our source photograph, Yandex also found images from other perspectives, including 90 degrees counter-clockwise (see the first two images in the third row) from the vantage point of the source image.

a-results-yandex.jpg

Yandex also had no trouble identifying the white SUV in the foreground of the photograph as a Nissan Juke.

a-results-suv-yandex.jpg

Lastly, in the most challenging isolated search for this image, Yandex was unsuccessful in identifying the nondescript grey trailer in front of the building. A number of the results look like the one from the source image, but none are an actual match.

a-results-trailer-yandex.jpg

Bing had no success in identifying this structure. Nearly all of its results were from the United States and western Europe, showing houses with white/grey masonry or siding and brown roofs.

a-results-bings-1536x725.jpg

Likewise, Bing could not determine that the white SUV was a Nissan Juke, instead focusing on an array of other white SUVs and cars.

a-suvonly-bing-1536x728.jpg

Lastly, Bing failed in identifying the grey trailer, focusing more on RVs and larger, grey campers.

a-trailoronly-bing-1536x730.jpg

Google's results for the full photograph are comically bad, returning the House television show and images with very little visual similarity.

a-results-google-1536x1213.jpg

Google successfully identified the white SUV as a Nissan Juke, even noting it in the text field search. As seen with Yandex, feeding the search engine an image from a similar perspective as popular reference materials — a side view of a car that resembles that of most advertisements — will best allow reverse image algorithms to work their magic.

a-suvonly-google.jpg

Lastly, Google recognized what the grey trailer was (travel trailer / camper), but its “visually similar images” were far from it.

a-trailoronly-google-1536x1226.jpg

Scorecard: Yandex 2/3; Bing 0/3; Google 1/3

Cebu

Yandex was technically able to identify the cityscape as that of Cebu in the Philippines, but perhaps only by accident. The fourth result in the first row and the fourth result in the second row are of Cebu, but only the second photograph shows any of the same buildings as in the source image. Many of the results were also from southeast Asia (especially Thailand, which is a popular destination for Russian tourists), noting similar architectural styles, but none are from the same perspective as the source.

b-results-yandex.jpg

Of the two buildings isolated from the search (the Padgett Place and the Waterfront Hotel), Yandex was able to identify the latter, but not the former. The Padgett Place is a relatively unremarkable high-rise building filled with condos, while the Waterfront Hotel also has a casino inside, leading to an array of tourist photographs showing its more distinct architecture.

b-tower1-yandex.jpg

b-tower2-yandex.jpg

Bing did not have any results that were even in southeast Asia when searching for the Cebu cityscape, showing a severe geographic limitation to its indexed results.

b-results-bing-1536x710.jpg

Like Yandex, Bing was unable to identify the building on the left part of the source image.

b-tower1-bing-1536x707.jpg

Bing was unable to find the Waterfront Hotel, both when using Bing’s cropping function (bringing back only low-resolution photographs) and when manually cropping and increasing the resolution of the building from the source image. It is worth noting that these two versions of the image, which were identical apart from the resolution, brought back dramatically different results.

b-tower2-bing-1536x498.jpg

b-tower2-bing2-1536x803.jpg

As with Yandex, Google brought back a photograph of Cebu in its results, but without a strong resemblance to the source image. While Cebu was not in the thumbnails for the initial results, following through to “Visually similar images” will fetch an image of Cebu’s skyline as the eleventh result (third image in the second row below).

b-results-google-1536x1077.jpg

As with Yandex and Bing, Google was unable to identify the high-rise condo building on the left part of the source image. Google also had no success with the Waterfront Hotel image.

b-tower1-google-1536x1366.jpg

b-tower2-google-1536x1352.jpg

Scorecard: Yandex 4/6; Bing 0/6; Google 2/6

Bloomberg 2020 Student

Yandex found the source image from this Bloomberg campaign advertisement — a Getty Images stock photo. Along with this, Yandex also found versions of the photograph with filters applied (second result, first row) and additional photographs from the same stock photo series. Also, for some reason, porn, as seen in the blurred results below.

c-results-yandex.jpg

When isolating just the face of the stock photo model, Yandex brought back a handful of other shots of the same guy (see last image in first row), plus images of the same stock photo set in the classroom (see the fourth image in the first row).

c-studentonly-results-yandex.jpg

Bing had an interesting search result: it found the exact match of the stock photograph, and then brought back “Similar images” of other men in blue shirts. The “Pages with this” tab of the result provides a handy list of duplicate versions of this same image across the web.

c-results-bing-1536x702.jpg

c-results-bing2.jpg

Focusing on just the face of the stock photo model does not bring back any useful results, or provide the source image that it was taken from.

c-studentonly-results-bing-1536x721.jpg

Google recognizes that the image used by the Bloomberg campaign is a stock photo, bringing back an exact result. Google will also provide other stock photos of people in blue shirts in class.

c-results-google.jpg

In isolating the student, Google will again return the source of the stock photo, but its visually similar images do not show the stock photo model, rather an array of other men with similar facial hair. We’ll count this as a half-win in finding the original image, but not showing any information on the specific model, as Yandex did.

c-studentonly-results-google.jpg

Scorecard: Yandex 6/8; Bing 1/8; Google 3.5/8

Brazilian Street View

Yandex could not figure out that this image was snapped in Brazil, instead focusing on urban landscapes in Russia.

d-results-yandex.jpg

For the parking sign [Estacionamento], Yandex did not even come close.

d-parking-yandex.jpg

Bing did not know that this street view image was taken in Brazil.

d-results-bing-1536x712.jpg

…nor did Bing recognize the parking sign

d-parking-bing-1536x705.jpg

…or the Toca do Açaí logo.

d-toco-bing-1536x498.jpg

Despite the fact that the image was directly taken from Google’s Street View, Google reverse image search did not recognize a photograph uploaded onto its own service.

d-results-google-1536x1188.jpg

Like Bing and Yandex, Google could not recognize the Portuguese parking sign.

d-parking-google.jpg

Lastly, Google did not come close to identifying the Toca do Açaí logo, instead focusing on various types of wooden panels, showing how it focused on the backdrop of the image rather than the logo and words.

d-toca-google-1536x1390.jpg

Scorecard: Yandex 7/11; Bing 1/11; Google 3.5/11

Amsterdam Canal

Yandex knew exactly where this photograph was taken in Amsterdam, finding other photographs taken in central Amsterdam, and even including ones with various types of birds in the frame.

e-results-yandex.jpg

Yandex correctly identified the bird in the foreground of the photograph as a grey heron (серая цапля), also bringing back an array of images of grey herons in a similar position and posture as the source image.

e-bird-yandex.jpg

However, Yandex flunked the test of identifying the Dutch flag hanging in the background of the photograph. When rotating the image 90 degrees clockwise to present the flag in its normal pattern, Yandex was able to figure out that it was a flag, but did not return any Dutch flags in its results.

e-flag-yandex.jpg

test-e-flag2.jpg

e-flag2-yandex.jpg

Bing only recognized that this image shows an urban landscape with water, with no results from Amsterdam.

e-results-bing-1536x723.jpg

Though Bing struggled with identifying an urban landscape, it correctly identified the bird as a grey heron, including a specialized “Looks like” result going to a page describing the bird.

e-bird-bing-1536x1200.jpg

However, like with Yandex, the Dutch flag was too confusing for Bing, both in its original and rotated forms.

e-flag-bing-1536x633.jpg

e-flag2-bing-1536x491.jpg

Google noted that there was a reflection in the canal of the image, but went no further than this, focusing on various paved paths in cities and nothing from Amsterdam.

e-results-google-1536x1365.jpg

Google was close in the bird identification exercise, but just barely missed it — it is a grey, not great blue, heron.

e-bird-google-1536x1378.jpg

Google was also unable to identify the Dutch flag. Though Yandex seemed to recognize that the image is a flag, Google’s algorithm focused on the windowsill framing the image and misidentified the flag as curtains.

e-flag-google-1536x1374.jpg

e-flag2-google-1536x1356.jpg

Final Scorecard: Yandex 9/14; Bing 2/14; Google 3.5/14

Creative Searching

Even with the shortcomings described in this guide, there are a handful of methods to maximize your search process and game the search algorithms.

Specialized Sites

For one, you could use some other, more specialized search engines outside of the three detailed in this guide. The Cornell Lab’s Merlin Bird ID app, for example, is extremely accurate in identifying the type of birds in a photograph, or giving possible options. Additionally, though it isn’t an app and doesn’t let you reverse search a photograph, FlagID.org will let you manually enter information about a flag to figure out where it comes from. For example, with the Dutch flag that even Yandex struggled with, FlagID has no problem. After choosing a horizontal tricolor flag, we put in the colors visible in the image, then receive a series of options that include the Netherlands (along with other, similar-looking flags, such as the flag of Luxembourg).

flagsearch1.jpgflagsearch2.jpg

Language Recognition

If you are looking at a foreign language with an orthography you don’t recognize, try using some OCR or Google Translate to make your life easier. You can use Google Translate’s handwriting tool to detect the language* of a letter that you hand-write, or choose a language (if you know it already) and then write it out yourself for the word. Below, the name of a cafe (“Hedgehog in the Fog“) is written out with Google Translate’s handwriting tool, giving the typed-out version of the word (Ёжик) that can be searched.

*Be warned that Google Translate is not very good at recognizing letters if you do not already know the language, though if you scroll through enough results, you can find your handwritten letter eventually.

yozhikvtumane.jpg

yozhik-1536x726.jpg

yozhik2-1536x628.jpg
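If you would rather not draw the letters by hand, optical character recognition can sometimes do the work for you. The sketch below is my own addition rather than part of the original workflow: it assumes the open-source Tesseract engine (with the Russian language pack installed) plus the pytesseract and Pillow packages, and pulls Cyrillic text out of a cropped sign so it can be pasted straight into a search engine.

```python
# Minimal sketch: OCR a cropped sign so its text can be searched directly.
# Assumes Tesseract with the Russian language data, plus the pytesseract and
# Pillow packages; the file name is a placeholder.
from PIL import Image
import pytesseract

sign = Image.open("cafe_sign_crop.jpg")

# lang="rus" selects Tesseract's Cyrillic model; use "por", "nld", etc. as needed.
text = pytesseract.image_to_string(sign, lang="rus")
print(text.strip())  # with a clean crop, something like "Ёжик в тумане"
```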

Pixelation And Blurring

As detailed in a brief Twitter thread, you can pixelate or blur elements of a photograph in order to trick the search engine into focusing squarely on the background. In this photograph of Rudy Giuliani’s spokeswoman, uploading the exact image will not bring back results showing where it was taken.

2019-12-16_14-55-50-1536x1036.jpg

However, if we blur out/pixelate the woman in the middle of the image, it will allow Yandex (and other search engines) to work their magic in matching up all of the other elements of the image: the chairs, paintings, chandelier, rug and wall patterns, and so on.
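You do not need a full photo editor to do this. The sketch below, again assuming Pillow and with placeholder file names and coordinates, shrinks the selected region and scales it back up, which destroys the face while leaving the rest of the frame untouched.

```python
# Minimal sketch: pixelate one region of a photo so reverse image search
# engines key on the background instead of the person. Assumes Pillow; the
# file name and box coordinates are placeholders.
from PIL import Image

img = Image.open("hotel_lobby.jpg")
box = (300, 120, 520, 480)  # (left, upper, right, lower) around the person

region = img.crop(box)

# Shrink hard, then scale back up with NEAREST resampling to get chunky pixels.
small = region.resize((max(1, region.width // 20), max(1, region.height // 20)), Image.NEAREST)
pixelated = small.resize(region.size, Image.NEAREST)

img.paste(pixelated, box)
img.save("hotel_lobby_pixelated.jpg")
```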

blurtest.jpg

After this pixelation is carried out, Yandex knows exactly where the image was taken: a popular hotel in Vienna.

yandexresult.jpg

2019-12-16_15-02-32.jpg

Conclusion

Reverse image search engines have progressed dramatically over the past decade, with no end in sight. Along with the ever-growing amount of indexed material, a number of search giants have enticed their users to sign up for image hosting services, such as Google Photos, giving these search algorithms an endless amount of material for machine learning. On top of this, facial recognition AI is entering the consumer space with products like FindClone and may already be used in some search algorithms, namely with Yandex. There are no publicly available facial recognition programs that use any Western social network, such as Facebook or Instagram, but perhaps it is only a matter of time until something like this emerges, dealing a major blow to online privacy while also (at that great cost) increasing digital research functionality.

If you skipped most of the article and are just looking for the bottom line, here are some easy-to-digest tips for reverse image searching:

  • Use Yandex first, second, and third, and then try Bing and Google if you still can’t find your desired result.
  • If you are working with source imagery that is not from a Western or former Soviet country, then you may not have much luck. These search engines are hyper-focused on these areas, and struggle for photographs taken in South America, Central America/Caribbean, Africa, and much of Asia.
  • Increase the resolution of your source image, even if it just means doubling or tripling the resolution until it’s a pixelated mess. None of these search engines can do much with an image that is under 200×200.
  • Try cropping out elements of the image, or pixelating them if it trips up your results. Most of these search engines will focus on people and their faces like a heat-seeking missile, so pixelate them to focus on the background elements.
  • If all else fails, get really creative: mirror your image horizontally, add some color filters, or use the clone tool in your image editor to fill in elements of the image that are disrupting searches (a rough sketch of the first two tricks follows below).
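For that last tip, here is a rough Pillow sketch (file names are placeholders) that mirrors an image and washes out its colors, two cheap transformations that are sometimes enough to shake loose a different set of results.

```python
# Minimal sketch of the "get really creative" tip above: mirror the image
# horizontally and apply a simple color filter before re-running the search.
# Assumes Pillow; file names are placeholders.
from PIL import Image, ImageEnhance, ImageOps

img = Image.open("stubborn_source.jpg")

mirrored = ImageOps.mirror(img)                       # flip left-to-right
filtered = ImageEnhance.Color(mirrored).enhance(0.4)  # keep only 40% of the color

filtered.save("stubborn_source_variant.jpg")
```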

[Source: This article was published in bellingcat.com by Aric Toler]

The internet is an iceberg. And, as you might guess, most of us only reckon with the tip. While the pages and media found via simple searches may seem unendingly huge at times, what is submerged and largely unseen – often referred to as the invisible web or deep web – is in fact far, far bigger.

THE SURFACE WEB

What we access every day through popular search engines like Google, Yahoo or Bing is referred to as the Surface Web. These familiar search engines crawl through tens of trillions of pages of available content (Google alone is said to have indexed more than 30 trillion web pages) and bring that content to us on demand. As big as this trove of information is, however, this represents only the tip of the iceberg.

Eric Schmidt, then CEO of Google, was once asked to estimate the size of the World Wide Web. He estimated that of roughly 5 million terabytes of data, Google had indexed roughly 200 terabytes, or only 0.004% of the total internet.

THE INVISIBLE WEB

Beneath the Surface Web is what is referred to as the Deep or Invisible Web. It is comprised of:

  • Private websites, such as VPN (Virtual Private networks) and sites that require passwords and logins
  • Limited access content sites (which limit access in a technical way, such as using Captcha, the Robots Exclusion Standard or no-cache HTTP headers that prevent search engines from browsing or caching them; a short sketch of how these barriers look to a crawler follows this list)
  • Unlinked content, without hyperlinks to other pages, which prevents web crawlers from accessing information
  • Textual content, often encoded in image or video files or in specific file formats not handled by search engines
  • Dynamic content created for a single purpose and not part of a larger collection of items
  • Scripted content, pages only accessible via JavaScript, as well as content loaded using Flash and Ajax
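To make the technical barriers in this list concrete, here is a minimal sketch of how two of them look from a crawler's side: the Robots Exclusion Standard (robots.txt) and response headers that ask search engines not to index or cache a page. It uses Python's standard library plus the requests package; the URLs are placeholders.

```python
# Minimal sketch: check two of the mechanisms listed above that keep content
# out of the surface web - robots.txt rules and restrictive HTTP headers.
# Uses the standard library's robotparser plus the requests package; the URLs
# are placeholders.
import urllib.robotparser
import requests

site = "https://example.com"
page = site + "/members/report.pdf"

# 1. Robots Exclusion Standard: is a well-behaved crawler allowed to fetch this page?
rp = urllib.robotparser.RobotFileParser()
rp.set_url(site + "/robots.txt")
rp.read()
print("Crawlable:", rp.can_fetch("*", page))

# 2. Response headers that tell search engines not to index or cache the page.
resp = requests.head(page, allow_redirects=True, timeout=10)
print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag"))
print("Cache-Control:", resp.headers.get("Cache-Control"))
```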

There are many high-value collections to be found within the invisible web. Some of the material found there that most people would recognize and, potentially, find useful include:

  • Academic studies and papers
  • Blog platforms
  • Pages created but not yet published
  • Scientific research
  • Academic and corporate databases
  • Government publications
  • Electronic books
  • Bulletin boards
  • Mailing lists
  • Online card catalogs
  • Directories
  • Many subscription journals
  • Archived videos
  • Images

But knowing all these materials are out there, buried deep within the web doesn't really help the average user. What tools can we turn to in order to make sense of the invisible web? There really is no easy answer. Sure, the means to search and sort through massive amounts of invisible web information are out there, but many of these tools have an intense learning curve. This can mean sophisticated software that requires no small amount of computer savvy; it can mean energy-sucking search tools that require souped up computers to handle the task of combing through millions of pages of data; or, it can require the searching party to be unusually persistent – something most of us, with our expectations of instantaneous Google search success, won't be accustomed to.

All that being said, we can become acquainted with the invisible web by degrees. The many tools considered below will help you access a sizable slice of the invisible web's offerings. You will find we've identified a number of subject-specific databases and engines; tools with an established filter, making their searches much more narrow.

OPEN ACCESS JOURNAL DATABASES

Open access journal databases (OAJD) are compilations of free scholarly journals maintained in a manner that facilitates access by researchers and others who are seeking specific information or knowledge. Because these databases are comprised of unlinked content, they are located in the invisible web.

The vast majority of these journals are of the highest quality, with peer reviews and extensive vetting of the content before publication. However, there has been a trend of journals that are accepting scholarship without adequate quality controls, and with arrangements designed to make money for the publishers rather than furtherance of scholarship. It is important to be careful and review the standards of the database and journals chosen. "This helpful guide" explains what to look for.

Below is a sample list of well-regarded and reputable databases.

  • "AGRIS" (International Information System for Agricultural Science and Technology) is a global, public domain database maintained in multiple languages by the Food and Agriculture Organization of the United Nations. They provide free access to agricultural research and information.
  • "BioMed Central" is the UK-based publisher of 258 peer-reviewed open access journals. Their published works span science, technology and medicine and include many well-regarded titles.
  • "Copernicus Publications" has been an open-access scientific publisher in Germany since 2001. They are strong supporters of the researchers who create these articles, providing top-level peer review and promotion for their work.
  • "DeGruyter Open" (formerly Versita Open) is one of Germany's leading publishers of open access content. Today DeGruyter Open (DGO) publishes about 400 owned and third-party scholarly journals and books across all major disciplines.
  • "Directory of Open Access Journals is focused on providing access only to those journals that employ the highest quality standards to guarantee content. They are presently a repository of 9,740 journals with more than 1.5 million articles from 133 countries.
  • "EDP Sciences" (Édition Diffusion Presse Sciences) is a France-based scientific publisher with an international mission. They publish more than 50 scientific journals, with some 60,000 published pages annually.
  • "Elsevier of Amsterdam is a world leader in advancing knowledge in the science, technology and health fields. They publish nearly 2,200 journals, including The Lancet and Cell, and over 25,000 book titles, including Gray's Anatomy and Nelson' s Pediatrics.
  • "Hindawi Publishing Corporation", based in Egypt, publishes 434 peer-reviewed, open access journals covering all areas of Science, Technology and Medicine, as well as a variety of Social Sciences.
  • "Journal Seek" (Genamics) touts itself as "the largest completely categorized database of freely available journal information available on the internet," with more than 100,000 titles currently. Categories range from Arts and Literature, through both hard- and soft-sciences, to Sports and Recreation.
  • "The Multidisciplinary Digital Publishing Institute" (MDPI), based in Switzerland, is a publisher of more than 110 peer-reviewed, open access journals covering arts, sciences, technology and medicine.
  • "Open Access Journals Search Engine" (OAJSE), based in India, is a search engine for open access journals from throughout the world, except for India. An extremely simple interface. Note: the site was last updated June 21, 2013.
  • "Open J-Gate" is an India-based e-journal database of millions of journal articles in open access domain. With a worldwide reach, Open J-Gate is updated every day with new academic, research and industry articles.
  • "Open Science Directory" contains about 13,000 scientific journals, with another 7,000 special programs titles.
  • "Springer Open" offers a roster of more than 160 peer-reviewed, open access journals, as well as their more recent addition of free access books, covering all scientific disciplines.
  • "Wiley Open Access", a subsidiary of New Jersey-based global publishers John Wiley & Sons, Inc., publishes peer reviewed open access journals specific to biological, chemical and health sciences.

INVISIBLE WEB SEARCH ENGINES

Your typical search engine's primary job is to locate the surface sites and downloads that make up much of the web as we know it. These searches are able to find an array of HTML documents, video and audio files and, essentially, any content that is heavily linked to or shared online. And often, these engines, Google chief among them, will find and organize this diversity of content every time you search.

The search engines that deliver results from the invisible web are distinctly different. Narrower in scope, these deep web engines tend to access only a single type of data. This is due to the fact that each type of data has the potential to offer up an outrageous number of results. An inexact deep web search would quickly turn into a needle in a haystack. That's why deep web searches tend to be more thoughtful in their initial query requirements.
Below is a list of popular invisible web search engines:

  • "Clusty" is a meta search engine that not only combines data from a variety of different source documents, but also creates "clustered" responses, automatically sorting by category.
  • "CompletePlanet" searches more than 70,000 databases and specialty search engines found only in the invisible web. A search engine as well-suited to casual searchers as it is to researchers.
  • "DigitalLibrarian": A Librarian's Choice of the Best of the Web is maintained by a real librarian. With an eclectic mix of some 45 broad categories, Digital Librarian offers data from categories as diverse as Activism/Non Profits and Railroads and Waterways.
  • "InfoMine" is another librarian-developed internet resource collection, this time from The Regents of the University of California.
  • "InternetArchive" has an eclectic array of categories, starting with the ‘Wayback Machine,' which allows the searcher to locate archived documents, and including an archive of Grateful Dead audience and soundboard recordings. They offer 6 million texts, 1.5 million videos, 1.9 million audio recordings and 126K live music concerts.
  • "The Internet Public Library" (ipl and ipl2) is a non-profit, student-run website at Drexel University. Students volunteer to act as librarians and respond to questions from visitors. Categories of data include those directed to Children and Teens.
  • "SurfWax" is a metasearch engine that offers "practical tools for Dynamic Search Navigation." It offers the option of grabbing results from multiple search engines at the same time, or even designing "SearchSets," which are individualized groups of sources that can be used over and over in searches.
  • "UC Santa Barbara Library" offers access to a diverse group of research databases useful to students, researchers and the casual searcher. It should be noted that many of these resources are password protected. Those that do not display a lock icon are publicly accessible.
  • "USA.gov" offers acess to a huge volume of information, including all types of forms, databases, and information sites representing most government agencies.
  • "Voice of the Shuttle" (VoS) offers access to a diverse assortment of sites, including literature, literary theory, philosophy, history and cultural studies, and includes the daily update of all things "cool."

SUBJECT-SPECIFIC DATABASES

The following lists pool together some mainstream and not so mainstream databases dedicated to particular fields and areas of interest. While only a handful of these tools are able to surface deep web materials, all of the search engines and collections we have highlighted are powerful, extensive bodies of work. Many of the resources these tools surface would likely be overlooked if the same query were made on one of the mainstream engines most users fall back on, like Bing, Yahoo and even Google.

Art & Design

  • "ArtNet" deals with pricing and sourcing work in the art market. They also keep track of the latest news and artists in the industry.
  • "The Metropolitan Museum of Art" site hosts an impressively interactive body of information on their collections, exhibitions, events and research.
  • "Musée du Louvre", the renowned museum, maintains a site filled with navigable sections covering its collections.
  • "The National Gallery of Art" premier museum of arts in our nation's capital, also maintains a site detailing the highlights, exhibitions and education efforts the institution oversees.
  • "Public Art Online" is a resource detailing sources, creators, prices, projects, legal issues, success stories, resources, education and all other aspects of the creation of public art.
  • "Smithsonian Art Inventories Catalog" is a subset of the Smithsonian Institution Research Information System (SIRIS). A browsable database of over 400,000 art inventory items held in public and private collections.
  • "Web Gallery of Art" is a searchable database of European art, containing nearly 34,000 reproductions. Additional database information includes artist biographies, period music and commentaries.

Business

  • "Better Business Bureau" (BBB) Information System Search allows consumers to locate the details of ratings, consumer experience, governmental action and more of both BBB accredited and non-accredited businesses.
  • "BPubs.com" is the business publications search engine. They offer more than 200 free subscriptions to business and trade publications.
  • "BusinessUSA" is an excellent and complete database of everything a new or experienced business owner or employer should know.
  • "EDGAR: U.S. Securities and Exchange Commission" contains a database of Securities and Exchange Commission. Posts copies of corporate filings from US businesses, press releases and public statements.
  • "Global Edge" delivers a comprehensive research tool for academics, students and businesspeople to seek out answers to international business questions.
  • "Hoover's", a subsidiary of Dun & Bradstreet, is one of the best known databases of American and International business. A complete source of company and industry information, especially useful for investors.
  • "The National Bureau of Economic Research is perhaps the leading private, non-partisan research organization dedicated to unbiased analysis of economic policy. This database maintains archives of research data, meetings, activities, working papers and publications.
  • "U.S. Department of Commerce", Bureau of Economic Analysis is the source of many of the economic statistics we hear in the news, including national income and product accounts (NIPAs), gross domestic product, consumer spending, balance of payments and much more.

Legal & Social Services

Science & Technology

  • "Environmental Protection Agency" rganizes the agency's laws and regulations, science and technology, and the many issues affecting the agency and its policies.
  • "National Science Digital Library" (NSDL) is a source for science, technology, engineering and mathematics educational data. It is funded by the National Science Foundation.
  • "Networked Computer Science Technical Reports Library (NCSTRL) was developed as a collaborative effort between NASA Langley, Virginia Tech, Old Dominion University and University of Virginia. It serves as an archive for submitted scientific abstracts and other research products.
  • "Science.gov" is a compendium of more than 60 US government scientific databases and more than 200 websites. Governed by the interagency Science.gov Alliance, this site provides access to a range of government scientific research data.
  • "Science Research" is a free, publicly available deep web search engine that purports to use a sophisticated technology that permits queries to more than 300 science and technology sites simultaneously, with the results collated, ranked and stripped of duplications.
  • "WebCASPAR" provides access to science and engineering data from a variety of US educational institutions. It incorporates a table builder, allowing a combined result from various National Science Foundation and National Center for Education Statistics data sources.
  • "WebCASPAR" World Wide Science is a global scientific gateway, comprised of US and international scientific databases. Because it is multilingual, it allows real-time search and translation of reporting from an extensive group of databases.

Healthcare

  • "Cases Database" is a searchable database of more than 32,000 peer-reviewed medical case reports from 270 journals covering a variety of medical conditions.
  • "Center for Disease Control" (CDC) WONDER's online databases permit access to the substantial public health data resources held by the CDC.
  • "HCUPnet" is an online query system for those seeking access to statistical data from the Agency for Healthcare Research and Quality.
  • "Healthy People" provides rolling 10-year national objectives and programs for improving the health of Americans. They currently operate under the Healthy People 2020 decennial agenda.
  • "National Center for Biotechnology Information" (NCBI) is an offshoot of the National Institutes of Health (NIH). This site provides access to some 65 databases from the various project categories currently being researched.
  • "OMIM" offers access to the combined research of many decades into genetics and genetic disorders. With daily updates, it represents perhaps the most complete single database of this sort of data.
  • "PubMed is a database of more than 23 million citations from the US National Library of Medicine and National Institutes of Health.
  • "TOXNET" is the access portal to the US Toxicology Data Network, an offshoot of the National Library of Medicine.
  • "U.S. National Library of Medicine" is a database of medical research, available grants, available resources. The site is maintained by the National Institutes of Health.
  • "World Health Organization" (WHO) is a comprehensive site covering the many initiatives the WHO is engaged in around the world.

[Source: This article was published in onlineuniversities.com by Philip Bump]

Google to offer users the option to auto-delete location history and web search data that it harvests

Google is to give users the choice of being able to automatically delete their search and location history after three months.

It announced the auto-delete tools for location history data, as well as web browsing and app activity, which will be rolled out in the coming weeks.

Last November Google was accused of misleading users about location tracking after consumer groups from seven European nations asked their privacy regulators to take action against the search engine giant.

google logo mountainview 011

Location tracking

Consumer groups from the Netherlands, Poland, Czech Republic, Greece, Norway, Slovenia and Sweden, all filed GDPR complaints against Google’s location tracking.

They alleged that Google is tracking the movements of millions of users in breach of the European Union’s privacy laws.

Google, of course, is already facing a lawsuit in the United States for allegedly tracking phone users regardless of privacy settings.

That lawsuit was filed after an investigation by the Associated Press found that a number of Google services running on Android and Apple devices determine the user’s location and store it, even when Google’s “Location History” setting is switched off.

It should be remembered that Google had already allowed users to manually delete the data it harvests when they use its products such as YouTube, Maps and Search.

But now it is trying to give users more control by offering auto-delete tools.

“And when you turn on settings like Location History or Web & App Activity, the data can make Google products more useful for you – like recommending a restaurant that you might enjoy, or helping you pick up where you left off on a previous search,” wrote David Monsees, product manager of search in a blog posting.

Auto-delete

“We work to keep your data private and secure, and we’ve heard your feedback that we need to provide simpler ways for you to manage or delete it,” Monsees added.

“You can already use your Google Account to access simply on/off controls for Location History and Web & App Activity, and if you choose – to delete all or part of that data manually,” he wrote. “In addition to these options, we’re announcing auto-delete controls that make it even easier to manage your data.”

Essentially, the user chooses a time limit for how long their data should be saved - either 3 months or 18 months.

Any data older than that will be automatically deleted from your account on an ongoing basis.

“These controls are coming first to Location History and Web & App Activity and will roll out in the coming weeks,” Monsees wrote. “You should always be able to manage your data in a way that works best for you–and we’re committed to giving you the best controls to make that happen.”

It should be noted that there will be no auto-delete of YouTube watch history or voice commands issued via Home and Assistant.

[Source: This article was published in silicon.co.uk by Tom Jowitt]

Ever had to search for something on Google, but you’re not exactly sure what it is, so you just use some language that vaguely implies it? Google’s about to make that a whole lot easier.

Google announced today it’s rolling out a new machine learning-based language understanding technique called Bidirectional Encoder Representations from Transformers, or BERT. BERT helps decipher your search queries based on the context of the language used, rather than individual words. According to Google, “when it comes to ranking results, BERT will help Search better understand one in 10 searches in the U.S. in English.”

Most of us know that Google usually responds to words, rather than to phrases — and Google's aware of it, too. In the announcement, Pandu Nayak, Google's VP of search, called this kind of searching "keyword-ese," or "typing strings of words that they think we'll understand, but aren't actually how they'd naturally ask a question." It's amusing to see these kinds of searches — heck, Wired has made a whole cottage industry out of celebrities reacting to these keyword-ese queries in their "Autocomplete" video series — but Nayak's correct that this is not how most of us would naturally ask a question.

As you might expect, this subtle change might make some pretty big waves for potential searchers. Nayak said this “[represents] the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search.” Google offered several examples of this in action, such as “Do estheticians stand a lot at work,” which apparently returned far more accurate search results.

I’m not sure if this is something most of us will notice — heck, I probably wouldn’t have noticed if I hadn’t read Google’s announcement, but it’ll sure make our lives a bit easier. The only reason I can see it not having a huge impact at first is that we’re now so used to keyword-ese, which is in some cases more economical to type. For example, I can search “What movie did William Powell and Jean Harlow star in together?” and get the correct result (Libeled Lady; not sure if that’s BERT’s doing or not), but I can also search “William Powell Jean Harlow movie” and get the exact same result.

BERT will only be applied to English-based searches in the US, but Google is apparently hoping to roll this out to more countries soon.

[Source: This article was published in thenextweb.com by Rachel Kaser]

The new language model can think in both directions, fingers crossed

Google has updated its search algorithms to tap into an AI language model that is better at understanding netizens' queries than previous systems.

Pandu Nayak, a Google fellow and vice president of search, announced this month that the Chocolate Factory has rolled out BERT, short for Bidirectional Encoder Representations from Transformers, for its most fundamental product: Google Search.

To pull all of this off, researchers at Google AI built a neural network known as a transformer. The architecture is suited to dealing with sequences in data, making it ideal for handling language. To understand a sentence, you must look at all the words in it in a specific order. Unlike previous transformer models that only consider words in one direction – left to right – BERT is able to look back to consider the overall context of a sentence.

“BERT models can, therefore, consider the full context of a word by looking at the words that come before and after it—particularly useful for understanding the intent behind search queries,” Nayak said.
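You can get a feel for this bidirectional behavior with an off-the-shelf BERT model. The sketch below uses the open-source Hugging Face transformers library and the public bert-base-uncased checkpoint - an illustration of the underlying idea, not the model Google actually runs inside Search - to fill in a masked word using context from both sides of it.

```python
# Minimal sketch: masked-word prediction with an open-source BERT checkpoint,
# showing how context on both sides of a word is used. This is the public
# Hugging Face transformers library and bert-base-uncased, not Google's
# production Search model.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT reads the words before and after [MASK] to rank likely fillers.
for result in fill("A brazilian traveler to the usa needs a [MASK] to enter the country."):
    print(f"{result['token_str']:>10}  score={result['score']:.3f}")
```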

For example, below's what the previous Google Search and the new BERT-powered search look like when you query: “2019 brazil traveler to usa need a visa.”

2019 brazil

Left: The result returned for the old Google Search that incorrectly understands the query as a US traveler heading to Brazil. Right: The result returned for the new Google Search using BERT, which correctly identifies the search is for a Brazilian traveler going to the US. Image credit: Google.

BERT has a better grasp of the significance behind the word "to" in the new search. The old model returns results that show information for US citizens traveling to Brazil, instead of the other way around. It looks like BERT is a bit patchy, however, as a Google Search today still appears to give results as if it's American travelers looking to go to Brazil:

current google search

Current search result for the query: 2019 brazil traveler to USA need a visa. It still thinks the sentence means a US traveler going to Brazil

The Register asked Google about this, and a spokesperson told us... the screenshots were just a demo. Your mileage may vary.

"In terms of not seeing those exact examples, the side-by-sides we showed were from our evaluation process, and might not 100 percent mirror what you see live in Search," the PR team told us. "These were side-by-side examples from our evaluation process where we identified particular types of language understanding challenges where BERT was able to figure out the query better - they were largely illustrative.

"Search is dynamic, content on the web changes. So it's not necessarily going to have a predictable set of results for any query at any point in time. The web is constantly changing and we make a lot of updates to our algorithms throughout the year as well."

Nayak claimed BERT would improve 10 percent of all its searches. The biggest changes will be for longer queries, apparently, where sentences are peppered with prepositions like “for” or “to.”

“BERT will help Search better understand one in 10 searches in the US in English, and we’ll bring this to more languages and locales over time,” he said.

Google will run BERT on its custom Cloud TPU chips; it declined to disclose how many would be needed to power the model. The most powerful Cloud TPU option currently is the Cloud TPU v3 Pods, which contain 64 ASICs, each carrying performance of 420 teraflops and 128GB of high-bandwidth memory.

At the moment, BERT will work best for queries made in English. Google said it also works in two dozen countries for other languages, too, such as Korean, Hindi, and Portuguese for “featured snippets” of text. ®

[Source: This article was published in theregister.co.uk by Katyanna Quach]

John Mueller from Google gave one of the clearest and easiest to understand explanations on how Google uses machine learning in web search. He basically said Google uses it for "specific problems" where automation and machine learning can help improve the outcome. The example he gave was with canonicalization and the example clears things up.

This is from the Google webmaster hangout, starting at the 37:47 mark. The example is this: "So, for example, we use machine learning for canonicalization. So what that kind of means is we have all of those factors that we talked about before. And we give them individual weights. That's kind of the traditional way to do it. And we say well rel canonical has this much weight and redirect has this much weight and internal linking has this much weight. And the traditional approach would be to say well we will just make up those weights, at those numbers and see if it works out. And if we see that things don't work out we will tweak those numbers a little bit. And with machine learning what we can essentially do is say well this is the outcome that we want to have achieved and machine learning algorithms should figure out these weights on their own."

This was the first part of the answer around how Google debugs its search algorithm.

Here is the full transcript of this part.

The question:

Machine learning has been a part of Google search algorithm and I can imagine it's getting smarter every day. Do you as an employee with access to the secret files know the exact reason why pages rank better than others or is the algorithm now making decisions and evolving in a way that makes it impossible for humans to understand?

John's full answer:

We get this question every now and then and we're not allowed to provide an answer because the machines are telling us not to talk about this topic. So I really can't answer. No, just kidding.

It's something where we use machine learning in lots of ways to help us understand things better. But machine learning isn't just this one black box that does everything for you. Like you feed the internet in on one side the other side comes out search results. It's a tool for us. It's essentially a way of testing things out a lot faster and trying things out figuring out what the right solution there is.

So, for example, we use machine learning for canonicalization. So what that kind of means is we have all of those factors that we talked about before. And we give them individual weights. That's kind of the traditional way to do it. And we say well rel canonical has this much weight and redirect has this much weight and internal linking has this much weight. And the traditional approach would be to say well we will just make up those weights, at those numbers and see if it works out. And if we see that things don't work out we will tweak those numbers a little bit. And with machine learning what we can essentially do is say well this is the outcome that we want to have achieved and machine learning algorithms should figure out these weights on their own.

So it's not so much that machine learning does everything with canonicalization on its own but rather it has this well-defined problem. It's working out like what are these numbers that we should have there as weights and kind of repeatedly trying to relearn that system and understanding like on the web this is how people do it and this is where things go wrong and that's why we should choose these numbers.

So when it comes to debugging that. We still have those numbers, we still have those weights there. It's just that they're determined by machine learning algorithms. And if we see that things go wrong then we need to find a way like how could we tell the machine learning algorithm actually in this case we should have taken into account, I don't know phone numbers on a page more rather than just the pure content, to kind of separate like local versions for example. And that's something that we can do when we kind of train these algorithms.

So with all of this machine learning things, it's not that there's one black box and it just does everything and nobody knows why it does things. But rather we try to apply it to specific problems where it makes sense to automate things a little bit in a way that saves us time and that helps to pull out patterns that maybe we wouldn't have recognized manually if we looked at it.
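As a rough illustration of the idea Mueller describes, the sketch below trains a logistic regression to learn weights for a few canonicalization-style signals from labeled outcomes. The signal names, the tiny hand-made dataset, and the model choice are all assumptions for demonstration purposes; this is not Google's actual system, only the general pattern of letting an algorithm find the weights instead of hand-tuning them.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row describes one candidate URL in a cluster of duplicate pages:
# [has rel=canonical pointing at it, is a redirect target, share of internal links]
X = np.array([
    [1, 1, 0.70],
    [1, 0, 0.40],
    [0, 1, 0.55],
    [0, 0, 0.10],
    [1, 0, 0.05],
    [0, 0, 0.60],
])
# Label: 1 if evaluators decided this URL was the right canonical, else 0.
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression().fit(X, y)

# The learned coefficients play the role of the "weights" Mueller describes;
# retraining on fresh labels is how those weights get revisited over time.
for name, weight in zip(["rel_canonical", "redirect", "internal_links"], model.coef_[0]):
    print(f"{name:>15}: {weight:+.2f}")

# For a new duplicate cluster, pick the candidate the model scores highest.
new_cluster = np.array([
    [1, 0, 0.30],  # candidate 0
    [0, 1, 0.80],  # candidate 1
])
probabilities = model.predict_proba(new_cluster)[:, 1]
print("pick candidate index:", int(probabilities.argmax()))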

Here is the video embed:

https://www.youtube.com/watch?v=5QxYWMEZT3A

Here is how Glenn Gabe summed it up on Twitter:

Glenn Gabe @glenngabe

More from @johnmu: Machine learning helps us pull out patterns we might have missed. And for debugging, Google can see those weights which are determined by ML algos. If there is something that needs to be improved, Google can work to train the algorithms: https://www.youtube.com/watch?v=5QxYWMEZT3A&t=38m53s 

[Source: This article was published in seroundtable.com By Barry Schwartz - Uploaded by the Association Member: Robert Hensonw]

Categorized in Search Engine

Google has seemingly put the final nail in the coffin for Adobe Flash, the once-popular video and animation player that's become less relevant as newer web standards like HTML5 have taken over.

The company announced on Monday that its search engine will stop supporting Flash later this year, and that it will ignore Flash content in websites that contain it. The search engine will also stop indexing SWF files, the file format for media played through the Flash Player. Google noted that most users and websites won't see any impact from this change. 

The move has been a long time coming for Flash. Adobe announced in 2017 that it was planning to end-of-life Flash by ceasing to update and distribute it at the end of 2020, and Flash is already disabled in Chrome by default. When it made the announcement, Adobe said it was working with partners like Apple, Microsoft, Facebook, Google, and Mozilla to smoothly phase out Flash.

Flash was once a critical technology that enabled content creators to easily implement media, animations, and games in their websites during the earlier days of the web. If you frequently played online games in your web browser in the early 2000s, you'll probably remember that the Flash plugin was a necessity.

But as new web standards like HTML5 and WebGL rose in popularity, there was less of a need for Flash. And as time went on, Flash became more prone to security problems, including a vulnerability highlighted last year by the security blog Naked Security that made it possible for hackers to execute malicious code via a Flash file.
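For site owners who want to check whether the change affects them, a quick scan of a page's markup for Flash embeds is enough, since Flash is typically embedded via <object> or <embed> tags pointing at a .swf file. The sketch below is an illustrative, standard-library-only example; the class name and sample markup are made up for the demo.

from html.parser import HTMLParser

class FlashFinder(HTMLParser):
    """Collects URLs of .swf files referenced by <object>/<embed> tags."""

    def __init__(self):
        super().__init__()
        self.swf_refs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("object", "embed"):
            for name, value in attrs:
                if name in ("data", "src") and value and value.lower().endswith(".swf"):
                    self.swf_refs.append(value)

# Sample markup for the demo; in practice you would feed the parser the
# HTML of your own pages.
page = """
<html><body>
  <embed src="/games/oldgame.swf" width="640" height="480">
  <video src="/clips/newclip.mp4" controls></video>
</body></html>
"""

finder = FlashFinder()
finder.feed(page)
print("Flash references found:", finder.swf_refs)  # ['/games/oldgame.swf']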

[Source: This article was published in businessinsider.com By Lisa Eadicicco - Uploaded by the Association Member: David J. Redcliff] 

Categorized in Search Engine

Don't try to optimize for BERT, try to optimize your content for humans.

Google introduced the BERT update to its Search ranking system last week. The addition of this new algorithm, designed to better understand what’s important in natural language queries, is a significant change. Google said it impacts 1 in 10 queries. Yet, many SEOs and many of the tracking tools did not notice massive changes in the Google search results while this algorithm rolled out in Search over the last week.

The question is, Why?

The short answer. This BERT update was really about understanding “longer, more conversational queries,” Google wrote in its blog post. The tracking tools, such as Mozcast, primarily track shorter queries. That means BERT’s impact is less likely to be visible to these tools.

And for site owners, when you look at your rankings, you are likely not tracking a lot of long-tail queries. You track queries that send higher volumes of traffic to your website, and those tend to be short-tail queries.
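A quick way to sanity-check this on your own keyword set is to split the queries you track by length, since BERT is aimed at longer, conversational queries. The sketch below is illustrative only; the sample queries and the four-word cut-off are arbitrary assumptions, not an official definition of long-tail.

# The sample queries and the 4-word threshold are illustrative assumptions,
# not an official definition of "long-tail".
LONG_TAIL_MIN_WORDS = 4

tracked_queries = [
    "running shoes",
    "seo tools",
    "2019 brazil traveler to usa need a visa",
    "can you get medicine for someone pharmacy",
    "best laptop",
]

long_tail = [q for q in tracked_queries if len(q.split()) >= LONG_TAIL_MIN_WORDS]
short_tail = [q for q in tracked_queries if len(q.split()) < LONG_TAIL_MIN_WORDS]

print(f"long-tail share of tracked set: {len(long_tail) / len(tracked_queries):.0%}")
print("long-tail:", long_tail)
print("short-tail:", short_tail)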

Moz on BERT. Pete Meyers of Moz said the MozCast tool tracks shorter head terms and not the types of phrases that are likely to require the natural language processing (NLP) of BERT.

[Screenshot: tweet from Moz's Dr. Pete Meyers]

RankRanger on BERT. The folks at RankRanger, another toolset provider, told me something similar. “Overall, we have not seen a real ‘impact’ — just a few days of slightly increased rank fluctuations,” the company said. Again, this is likely due to the dataset these companies track: short-tail keywords over long-tail keywords.

Overall tracking tools on BERT. If you look at the tracking tools, virtually all of them showed a smaller level of fluctuation on the days BERT was rolling out compared to what they have shown for past Google algorithm updates such as core search algorithm updates, or the Panda and Penguin updates.

Here are screenshots of the tools over the past week. With a major update you would normally see significant spikes in these charts, but these tools do not show that:

[Screenshots: weekly volatility charts from Mozcast, SERPmetrics, Algoroo, Advanced Web Ranking, AccuRanker, RankRanger, and SEMrush]

SEO community on BERT. When it comes to individuals noticing changes to their rankings in Google search, the reaction was also not as large as with a Google core update. We did notice chatter throughout the week, but that chatter within the SEO community was not as loud as is typical with other Google updates.

Why we care. We are seeing a lot of folks asking about how they can improve their sites now that BERT is out in the wild. That’s not the way to think about BERT. Google has already stated there is no real way to optimize for it. Its function is to help Google better understand searchers’ intent when they search in natural language. The upside for SEOs and content creators is they can be less concerned about “writing for the machines.” Focus on writing great content — for real people.

Danny Sullivan from Google said again, you cannot really optimize for BERT:

[Screenshot: Danny Sullivan's tweet]

Continue with your strategy to write the best content for your users. Don’t do anything special for BERT, but rather, be special for your users. If you are writing for people, you are already “optimizing” for Google’s BERT algorithm.

[Source: This article was published in searchengineland.com By Barry Schwartz - Uploaded by the Association Member: Joshua Simon]

Categorized in Search Engine

On a Google Webmaster Hangout someone asked about the role of H1s on a web page. John Mueller responded that heading tags were good for several reasons but they’re not a critical element.

SEO and H1 Headings

One of the top rules for Search Engine Optimization has long been adding keywords to your H1 heading at the top of the page in order to signal what a page is about and rank well.

It used to be the case, in the early 2000s, that adding the target keyword phrase to the H1 was considered mandatory. Back then, if the keywords were not in the H1 heading, your site might not be as competitive.

However, Google’s ability to understand the nuances of what a page is about has come a long way since the early 2000s.

As a consequence, it is important to listen to what Google’s John Mueller says about H1 headings.

Can Multiple H1s be Used?

The context of the question is whether a publisher is restricted to using one H1 or can use multiple H1 heading tags.

This is the question:

“Is it mandatory to just have one H1 tag on a web page or can it be used multiple times?”

Google’s John Mueller answered that you can use as many H1s as you want. He also said you can omit using the H1 heading tag, too.

John Mueller’s answer about H1 heading tags:

“You can use H1 tags as often as you want on a page. There’s no limit, neither upper or lower bound.”

Then later on, at the end of his answer, he reaffirmed that publishers are free to choose how they want to use the H1 heading tag:

“Your site is going to rank perfectly fine with no H1 tags or with five H1 tags.”

H1 Headings Useful for Communicating Page Structure

John Mueller confirmed that H1 headings are good for outlining the page structure.

What he means is that the heading elements can work together to create a top level outline of what your page is about. That’s a macro overview of what the web page is about.

In my opinion, a properly deployed heading strategy can be useful for communicating what a page is about.

The W3C, the official body that administers HTML guidelines, offers an HTML validator that shows you the “outline” of a web page.

When validating a web page, select the “Show Outline” button. It’s a great way to see a page just by the outline that your heading elements create.

Choosing the “Show Outline” option in the W3C HTML Validator will show you an overview of what your page looks like as communicated by your heading elements. It’s a great way to get a high-level snapshot of your page structure.
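If you prefer to script it, the sketch below approximates the validator's "Show Outline" view using only Python's standard library, walking the h1-h6 elements and printing them as an indented outline. The sample markup is illustrative; feed the parser your own HTML to see the structure your headings create.

from html.parser import HTMLParser

class OutlineBuilder(HTMLParser):
    """Collects (level, text) pairs for every h1-h6 element in a page."""

    HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.current_level = None
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_LEVELS:
            self.current_level = self.HEADING_LEVELS[tag]

    def handle_endtag(self, tag):
        if tag in self.HEADING_LEVELS:
            self.current_level = None

    def handle_data(self, data):
        if self.current_level and data.strip():
            self.outline.append((self.current_level, data.strip()))

# Sample markup for the demo; feed the parser your own HTML instead.
page = """
<h1>H1 Headings and SEO</h1>
<h2>Can Multiple H1s be Used?</h2>
<h2>The Proper Use of Heading Elements</h2>
<h3>Heading Elements and Accessibility</h3>
"""

builder = OutlineBuilder()
builder.feed(page)
for level, text in builder.outline:
    print("  " * (level - 1) + f"h{level}: {text}")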

Here are Mueller’s comments about the H1 heading element:

“H1 elements are a great way to give more structure to a page so that users and search engines can understand which parts of a page are kind of under different headings.

So I would use them in the proper way on a page. And especially with HTML5 having multiple H1 elements on a page is completely normal and kind of expected.”

H1 Headings and SEO

John Mueller went on to reaffirm that the lack of headings, or the use of many H1s, is not something to worry about. This is likely because Google doesn’t need or require H1 headings to rank a web page.

This should be obvious to anyone who works in digital marketing. Google’s search results are full of web pages that do not feature H1 headings or that use them for styling purposes (a misuse of the heading tag!).

There are correlation studies that say that XX percent of top-ranked sites use headings. But those studies ignore that modern web pages, particularly those that use WordPress templates, routinely use headings for styling navigational elements, which skews those correlation studies.


Here’s what Mueller observed:

“So it’s not something you need to worry about.

Some SEO tools flag this as an issue and say like Oh you don’t have any H1 tag or you have two H1 tags… from our point of view that’s not a critical issue.”

H1 Headings Useful for Usability

Mueller’s on a roll in this answer when he begins talking about heading tags in the context of usability.

I have found that, particularly on mobile, heading tags help make a web page easier to read. Properly planned headings communicate what a web page is about to a user and visually break up a daunting page of text.

Here’s what Mueller said:

“From a usability point of view maybe it makes sense to improve that. So it’s not that I would completely ignore those suggestions but I wouldn’t see it as a critical issue.”

Takeaways about Heading Tags

  1. Use as many H1 heading elements as you like
  2. They are useful for communicating page structure to users and Google
  3. Heading elements are useful for usability

Updated: About Mueller’s Response

I read some feedback on Facebook that was critical of Mueller’s response. Some felt that he should have addressed more than just H1.

I believe that Mueller’s response should be seen in the context of the question that was asked. He was asked a narrow question about the H1 element and he answered it.

Technically, Mueller’s answer is correct. He answered the question that was put to him. So I think John should be given the benefit of that consideration.

However, I understand why some may say he should have addressed the underlying reason for the question. The person asking the question likely does not understand the proper use of heading elements.

If the person knew the basics of the use of heading elements, they wouldn’t have asked if it’s okay to drop H1 elements all over a web page. So that may have needed to be addressed.

Again, not criticizing Mueller, the context of his answer was focused on H1 elements.

The Proper Use of Heading Elements

I would add that using the full range of heading elements, from (for example) H1 to H4, is helpful. Nesting article sub-topics with H2, H3 and sometimes H4 can make it clearer what a page is about.

Properly using H1 through H4 (your choice!) helps communicate what the page is about, which is good for bots and humans, and increases usability because the page is easier to read on mobile.

One way to do it is to use H1 for the main topic of the page then every subtopic of that main topic can be wrapped in an H2 heading element. That’s what I did on this article.

Should one of the subtopics itself branch into a further subtopic, then I would use an H3.
Heading Elements and Accessibility

The heading elements also play an important role with making a web page accessible to site visitors who use assistive devices to access web content.

ADA Compliance consultant, Kim Krause Berg, offered these insights from the point of view of accessibility:

We use one H1 tag at the top to indicate the start of the content for assistive devices and organize the remainder (H2-H6) similarly to how an outline would appear.

The hierarchy of content is important for screen readers because it indicates the relationship of the content to the other parts of content. Content under headings should relate to the heading. A bad sequence would be starting out with an H3, then an H1.
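A simple automated check can catch the bad sequence Kim describes. The sketch below flags headings that appear before the first H1 or that skip levels (for example, an H1 followed directly by an H3); it is a simplified illustration, not a full accessibility audit, and the function name is my own.

def check_heading_sequence(levels):
    """levels: heading levels in document order, e.g. [3, 1, 2] for H3, H1, H2."""
    issues = []
    if levels and levels[0] != 1:
        issues.append(f"document starts with h{levels[0]} instead of h1")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            issues.append(f"h{previous} is followed by h{current} (level skipped)")
    return issues

print(check_heading_sequence([3, 1, 2]))    # starts with h3 instead of h1
print(check_heading_sequence([1, 3]))       # h1 followed by h3 (level skipped)
print(check_heading_sequence([1, 2, 2, 3])) # [] - a clean outline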

Heading Elements are More than a Place for Keywords

Keyword dumping the heading tags can mask the irrelevance of content. When you stop thinking of heading tags as places to dump your keywords and start using them as headings that communicate what that section of the page is about, you'll begin seeing what your page is really about. If you don't like what you see, you can rewrite it.

If in doubt, run your URL through the W3C HTML Validator to see how your outline looks!

Watch the Webmaster Hangout here:
https://youtu.be/rwpwq8Ynf7s?t=1427

[Source: This article was published in searchenginejournal.com By Roger Montti - Uploaded by the Association Member: Robert Hensonw]

Categorized in Search Engine