
This was a pretty busy week: we may have had a Google search algorithm update this week and maybe, just maybe, Forbes got hit hard by it. Google is probably going to revert the favicon and black ad label user interface; lots of tests are going on now. Bing hides the ad label as well, so it isn’t just Google. I posted a summary of everything you need to know about the Google featured snippet deduplication change, including that Google might be giving us performance data on featured snippets, images in featured snippets may change, and Google will move the right-side featured snippet to the top; until then, it has stopped deduplicating the right-side featured snippets. Google Search Console launched a new removals tool with a new set of features. Google may have issues indexing international pages. Google says it treats links in PDFs as nofollowed links, but that contradicts earlier statements. Google said schema markup will continue to get more complicated. Google said do not translate your image URLs. I shared a fun People Also Ask box that looks like an ad, but is not an ad. Google Assistant Actions do not give you a ranking boost. Google is still using Chrome 41 as the user agent when requesting resources, but not for rendering. Google Ads switched all campaign types to standard delivery. Google My Business suspensions are at an all-time high. Google Chrome is testing hiding URLs for the search results page. Google is hiring an SEO. I posted two vlogs this week, one with Thom Craver and one with Lisa Barone. Oh, and if you want to help sponsor those vlogs, go to patreon.com/barryschwartz. That was the search news this week at the Search Engine Roundtable.

Make sure to subscribe to our video feed or subscribe directly on iTunes to be notified of these updates and download the video in the background.


 [Source: This article was published in seroundtable.com By Barry Schwartz - Uploaded by the Association Member: Olivia Russell]

Categorized in Search Engine

Google is enhancing its Collections in Search feature, making it easy to revisit groups of similar pages.

Similar to the activity cards in search results, introduced last year, Google’s Collections feature allows users to manually create groups of like pages.

Now, using AI, Google will automatically group together similar pages in a collection. The feature works with content related to activities like cooking, shopping, and hobbies.


This upgrade to collections will be useful in the event you want to go back and look at pages that weren’t manually saved. Mona Vajolahi, a Google Search Product Manager, states in an announcement:

“Remember that chicken parmesan recipe you found online last week? Or that rain jacket you discovered when you were researching camping gear? Sometimes when you find something on Search, you’re not quite ready to take the next step, like cooking a meal or making a purchase. And if you’re like me, you might not save every page you want to revisit later.”

These automatically generated collections can be saved to keep forever, or disregarded if not useful. They can be accessed any time from the Collections tab in the Google app, or through the Google.com side menu in a mobile browser.

Once a collection is saved, Google can help users discover even more similar pages by tapping on the “Find More” button. Google is also adding a collaboration feature that allows users to share and work on creating collections with other people.

Auto-generated collections will start to appear for US English users this week. The ability to see related content will launch in the coming weeks.

[Source: This article was published in searchenginejournal.com By Matt Southern - Uploaded by the Association Member: Logan Hochstetler]

Categorized in Search Engine

It’s not paid inclusion, but it is paid occlusion

Happy Friday to you! I have been reflecting a bit on the controversy du jour: Google’s redesigned search results. Google is trying to foreground sourcing and URLs, but in the process it made its results look more like ads, or vice versa. Bottom line: Google’s ads just look like search results now.

I’m thinking about it because I have to admit that I don’t personally hate the new favicon-plus-URL structure. But I think that might be because I am not a normal consumer of web content. I’ve been on the web since the late ‘90s and I parse information out of URLs kind of without thinking about it. (In fact, the relative decline of valuable information getting encoded into the URL is a thing that makes me sad.)

I admit that I am not a normal user. I set up custom Chrome searches and export them to my other browsers. I know what SERP means and the term kind of slips out in regular conversation sometimes. I have opinions about AMP and its URL and caching structure. I’m a weirdo.

To that weirdo, Google’s design makes perfect sense and it’s possible it might do the same for regular folk. The new layout for search results is ugly at first glance — but then Google was always ugly until relatively recently. I very quickly learned to unconsciously take in the information from the top favicon and URL-esque info without it really distracting me.

...Which is basically the problem. Google’s using that same design language to identify its ads instead of much more obvious, visually distinct methods. It’s consistent, I guess, but it also feels deceptive.

Recode’s Peter Kafka recently interviewed Buzzfeed CEO Jonah Peretti, and Peretti said something really insightful: what if Google’s ads really aren’t that good? What if Google is just taking credit for clicks on ads just because people would have been searching for that stuff anyway? I’ve been thinking about it all day: what if Google ads actually aren’t that effective and the only reason they make so much is billions of people use Google?

The pressure to make them more effective would be fairly strong, then, wouldn’t it? And it would get increasingly hard to resist that pressure over time.

I am old enough to remember using the search engines before Google. I didn’t know how bad their search technology was compared to what was to come, but I did have to bounce between several of them to find what I wanted. Knowing what was a good search for WebCrawler and what was good for Yahoo was one of my Power User Of The Internet skills.

So when Google hit, I didn’t realize how powerful and good the PageRank technology was right away. What I noticed right away is that I could trust the search results to be “organic” instead of paid and that there were no dark patterns tricking me into clicking on an ad.

One of the reasons Google won search in the first place with old people like me was that in addition to its superior technology, it drew a harder line against allowing paid advertisements into its search results than its competitors.

With other search engines, there was the problem of “paid inclusion,” which is the rare business practice that does exactly what the phrase means. You never really knew if what you were seeing was the result of a web-crawling bot or a business deal.

This new ad layout doesn’t cross that line, but it’s definitely problematic and it definitely reduces my trust in Google’s results. It’s not so much paid inclusion as paid occlusion.

Today, I still trust Google to not allow business dealings to affect the rankings of its organic results, but how much does that matter if most people can’t visually tell the difference at first glance? And how much does that matter when certain sections of Google, like hotels and flights, do use paid inclusion? And how much does that matter when business dealings very likely do affect the outcome of what you get when you use the next generation of search, the Google Assistant?

And most of all: if Google is willing to visually muddle ads, how long until its users lose trust in the algorithm itself? With this change, Google is becoming what it once sought to overcome: AltaVista.


[Source: This article was published in theverge.com By Barry Schwartz - Uploaded by the Association Member: James Gill]

Categorized in Search Engine

Now that the Google January 2020 core update is mostly rolled out, we have asked several data providers to send us what they found with this Google search update. All of the data providers agree that this core update was a big one and impacted a large number of web sites.

The facts. What we know from Google, as we previously reported, is that the January 2020 core update started to roll out around 12:00 PM ET on Monday, January 13th. That rollout was “mostly done” by Thursday morning, on January 16th. We also know that this was a global update, and was not specific to any region, language or category of web sites. It is a classic “broad core update.”

What the tools are seeing. We have gone to third-party data companies asking them what their data shows about this update.

RankRanger. Mordy Oberstein from RankRanger said, “the YMYL (your money, your life) niches got hit very hard.” “This is a huge update,” he added. “There is massive movement at the top of the SERP for the Health and Finance niches and incredible increases for all niches when looking at the top 10 results overall.”

Here is a chart showing the rank volatility broken down by industry and the position of those rankings:


“Excluding the Retail niche, which according to what I am seeing was perhaps a focus of the December 6th update, the January 2020 core update was a far larger update across the board and at every ranking position,” Mordy Oberstein added. “However, when looking at the top 10 results overall during the core update, the Retail niche started to separate itself from the levels of volatility seen in December as well.”

SEMRush. Yulia Ibragimova from SEMRush said, “We can see that the latest Google Update was quite big and was noticed almost in every category.” The most volatile categories according to SEMRush, outside of Sports and News, were Online communities, Games, Arts & Entertainments, and Finance. But all categories saw major changes, Yulia Ibragimova added, and “we can assume that this update wasn’t aimed to any particular topics,” she told us.

SEMRush makes a lot of this data available on its web site, but it also sent us additional data around this update.

Here is the volatility by category by mobile vs desktop search results:


The top ten winners according to SEMRush were Dictionary.com, Hadith of the Day, Discogs, ABSFairings, X-Rates, TechCrunch, ShutterStock, 247Patience, GettyImages and LiveScores.com. The top ten losers were mp3-youtube.download, TotalJerkFace.com, GenVideos.io, Tuffy, TripSavvy, Honolulu.gov, NaughtyFind, Local.com, RuthChris and Local-First.org.

Sistrix. Johannes Beus from Sistrix posted their analysis of this core update. He said “Domains that relate to YMYL (Your Money, Your Life) topics have been re-evaluated by the search algorithm and gain or lose visibility as a whole. Domains that have previously been affected by such updates are more likely to be affected again. The absolute fluctuations appear to be decreasing with each update – Google is now becoming more certain of its assessment and does not deviate as much from the previous assessment.”

Here is the Sistrix chart showing the change:


According to Sistrix, the big winners were goal.com, onhealth.com, CarGurus, verywellhealth.com, Fandango, Times Of Israel, Royal.uk, and WestField. The big losers were CarMagazine.co.uk, Box Office Mojo, SkySports, ArnoldClark.com, CarBuyer.co.uk, History Extra, Evans Halshaw, and NHS Inform.

SearchMetrics. Marcus Tober, the founder of SearchMetrics, told us “the January Core Update seems to revert some changes for the better or worse depending on who you are. It’s another core update where thin content got penalized and where Google put an emphasis in YMYL. The update doesn’t seem to affect as many pages as with the March or September update in 2019. But has similar characteristics.”

Here are some specific examples SearchMetrics shared. First, Onhealth.com won in the March 2019 core update, lost in the September 2019 update, and won again big time in the January 2020 core update:


While Verywellhealth.com was a loser during multiple core updates:


Draxe.com has been up and down during core updates. With this update it seems to be a big winner, at +83%, but in previous core updates it got hit hard:


The big winners according to SearchMetrics were etsy.com, cargurus.com, verywellhealth.com, overstock.com, addictinggames.com, onhealth.com, bigfishgames.com and health.com. The big losers were tmz.com, academy.com, kbhgames.com, orbitz.com, silvergames.com, autolist.com, etonline.com, trovit.com and pampers.com.

What to do if you are hit. Google has given advice on what to consider if you are negatively impacted by a core update in the past. There aren’t specific actions to take to recover, and in fact, a negative rankings impact may not signal anything is wrong with your pages. However, Google has offered a list of questions to consider if your site is hit by a core update.

Why we care. It is often hard to isolate what you need to do to reverse any algorithmic hit your site may have seen. When it comes to Google core updates, it is even harder to do so. What this data, previous experience and Google’s advice have shown us is that these core updates are broad, wide and cover a lot of overall quality issues. The data above has reinforced this to be true. So if your site was hit by a core update, it is often recommended to step back from it all, take a wider view of your overall web site and see what you can do to improve the site overall.

[Source: This article was published in searchengineland.com By Barry Schwartz - Uploaded by the Association Member: Edna Thomas]

Categorized in Search Engine

If you’re looking for data, your search should start here. Google’s Dataset Search just launched as a full-fledged search tool, and it’s about as good as you’d expect. Google does a masterful job of collating all kinds of datasets from all across the internet with useful info like publication data, authors and file types available before you even click through. From NFL stats from the ’70s to catch records of great white sharks in the northwest Pacific, it seems to have it all. There are about 25 million datasets available now — actually just “a fraction of datasets on the web,” Google told the Verge — but more will be available as data hosts update their metadata.

Is there a word for that? Last week, as I took what must have been my hundredth Uber or Lyft ride at the tail end of two weeks of travel, I publicly wondered if there was a word for the specific type of small talk you make with a rideshare driver. (There isn’t, but I tip my hat to my former editor, Anne Glover, for whipping “chauffeurenfreude” together.) Different languages often feature unique words that capture seemingly indescribable feelings or experiences that don’t translate well at all. Here’s a website that keeps track of them.

This messaging app will self-destruct in 10 seconds. Literally. Well, not literally. There’s no explosion. But with Yap, messages (between up to six people) exist only until you type your next message, taking “ephemeral” to a whole new level. It seems to me that this is more of a proof of concept that shows the internet doesn’t have to be forever (imagine that!) and less of an actual useful tool for journalists. But the folks who subscribe to this newsletter are smart cookies. Prove me wrong.

Facebook just gave you access to some more of what it knows about you. Because of multi-site logins and Facebook ads, Facebook receives all kinds of information about users’ activities on other apps and websites. With the new Off-Facebook Activity tool, you can see and control exactly where that happens. “You might be shocked or at least a little embarrassed by what you find in there,” writes Washington Post tech columnist Geoffrey A. Fowler, and he couldn’t be more right — by piecing info together from my history, you can tell that I have a chronic bad habit of ordering late-night Domino’s pizza.


If you needed another reminder to use caution online, here it is. The Tampa Bay Times, which Poynter owns, was the latest news organization to be hit by a nasty ransomware attack. The Times reported that it is unclear how the attack was carried out, so I can’t give you specific tips for avoiding a similar fate, but it’s a good reminder that any organization is only as safe as its weakest link. There are tools that can help — a good password manager and a well-placed firewall, for starters — but exercising good internet safety hygiene is the best first step. Be skeptical of emails from unknown senders, especially those with attachments. Keep your operating systems and software updated. And don’t use weak passwords (and especially don’t use the same weak passwords across multiple websites).

Weird news is often harmful to the most vulnerable members of society. I cringe every time I see a “Florida Man” story (my colleague Al Tompkins lays out why that is here), but many stories labeled “weird” or “dumb/stupid criminals” capitalize on human misery. Some of these stories may seem funny, but at whose expense?

Here’s a tool that displays every road in a city. It’s an interesting way to look at any metropolitan area, town or hamlet — from the world’s biggest city of Chongqing, China (population: 30 million), all the way down to my humble hometown of Gasport, New York (population: 1,248). Plus, you can export each one as a .png or editable .svg file. (Just a warning: Smaller locales seem to take a long time to load, if they even do at all.)

Bookmark this publishing tool in case it’s the next best thing (it probably is). The founding CEO of Chartbeat, a ubiquitous real-time analytics tool for newsrooms, is back at it with a new project. It’s called Scroll and it massively improves the reading experience by removing ads and loading pages faster. My colleague, Rick Edmonds, has more about its founder and the future of the platform.

WikiHow’s bizarre art has been plastered all over the internet since 2005. Many of its pieces feature odd scenes that would probably never happen in real life. You’ve probably seen them repurposed in meme form. Here’s the strange story about how they’re made (and yes, it features some human misery, though we’re not making fun of it here).

[This article is originally published in bleepingcomputer.com By Lawrence Abrams - Uploaded by AIRS Member: Eric Beaudoin]

Categorized in Search Engine

Michael struggles to find the search results he’s looking for, and would like some tips for better Googling

 Want to search like a pro? These tips will help you up your Googling game using the advanced tools to narrow down your results. Photograph: Alastair Pike/AFP via Getty Images
Last week’s column mentioned search skills. I’m sometimes on the third page of results before I get to what I was really looking for. I’m sure a few simple tips would find these results on page 1. All advice welcome. Michael

Google achieved its amazing popularity by de-skilling search. Suddenly, people who were not very good at searching – which is almost everyone – could get good results without entering long, complex searches. Partly this was because Google knew which pages were most important, based on its PageRank algorithm, and it knew which pages were most effective, because users quickly bounced back from websites that didn’t deliver what they wanted.

Later, Google added personalisation based on factors such as your location, your previous searches, your visits to other websites, and other things it knew about you. This created a backlash from people with privacy concerns, because your searches into physical and mental health issues, legal and social problems, relationships and so on can reveal more about you than you want anyone else – or even a machine – to know.

When talking about avoiding “the creepy line”, former Google boss Eric Schmidt said: “We don’t need you to type at all. We know where you are. We know where you’ve been. We can more or less know what you’re thinking about.”

Google hasn’t got to that point, yet, but it does want to save you from typing. Today, Google does this through a combination of auto-complete search suggestions, Answer Boxes, and “People also ask” boxes, which show related questions along with their “featured snippets”. As a result, Google is much less likely to achieve its stated aim of sending you to another website. According to Jumpshot research, about half of browser-based searches no longer result in a click, and about 6% go to Google-owned properties such as YouTube and Maps.

You could get upset about Google scraping websites such as Wikipedia for information and then keeping their traffic, but this is the way the world is going. Typing queries into a browser is becoming redundant as more people use voice recognition on smartphones or ask the virtual assistant on their smart speakers. Voice queries need direct answers, not pages of links.

So, I can give you some search tips, but they may not be as useful as they were when I wrote about them in January 2004 – or perhaps not for as long.

Advanced Search for everyone
 Google’s advanced search page is the tool to properly drill down into the results. Photograph: Samuel Gibbs/The Guardian

The easiest way to create advanced search queries in Google is to use the form on the Advanced Search page, though I suspect very few people do. You can type different words, phrases or numbers that you want to include or exclude into the various boxes. When you run the search, it converts your input into a single string using search shortcuts such as quotation marks (to find an exact word or phrase) and minus signs (to exclude words).

You can also use the form to narrow your search to a particular language, region, website or domain, or to a type of file, how recently it was published and so on. Of course, nobody wants to fill in forms. However, using the forms will teach you most of the commands mentioned below, and it’s a fallback if you forget any.

Happily, many commands work on other search engines too, so skills are transferable.

Use quotation marks
 Quotation marks can be a powerful tool to specify exact search terms. Photograph: IKEA

If you are looking for something specific, quotation marks are invaluable. Putting quotation marks around single words tells the search engine that you definitely want them to appear on every page it finds, rather than using close matches or synonyms. Google will, of course, ignore this, but at least the results page will tell you which word it has ignored. You can click on that word to insist, but you will get fewer or perhaps no results.

Putting a whole phrase in inverted commas has the same effect, and is useful for finding quotations, people’s names, book and film titles, or particular phrases.

You can also use an asterisk as a wildcard to find matching phrases. For example, The Simpsons episode, Deep Space Homer, popularised the phrase: “I for one welcome our new insect overlords”. Searching for “I for one welcome our new * overlords” finds other overlords such as aliens, cephalopods, computers, robots and squirrels.

Nowadays, Google’s RankBrain is pretty good at recognising titles and common phrases without quote marks, even if they include “stop words” such as a, at, that, the and this. You don’t need quotation marks to search for the Force, The Who or The Smiths.

However, it also uses synonyms rather than strictly following your keywords. It can be quicker to use minus signs to exclude words you don’t want than to add terms that are already implied. One example is jaguar -car.

Use site commands

2618.jpg
 Using the ‘site:’ command can be a powerful tool for quickly searching a particular website. Photograph: Samuel Gibbs/The Guardian

Google also has a site: command that lets you limit your search to a particular website or, with a minus sign (-site:), exclude it. This command uses the site’s uniform resource locator or URL.

For example, if you wanted to find something on the Guardian’s website, you would type site:theguardian.com (no space after the colon) alongside your search words.

You may not need to search the whole site. For example, site:theguardian.com/technology/askjack will search the Ask Jack posts that are online, though it doesn’t search all the ancient texts (continued on p94).

There are several similar commands. For example, inurl: will search for or exclude words that appear in URLs. This is handy because many sites now pack their URLs with keywords as part of their SEO (search-engine optimisation). You can also search for intitle: to find words in titles.

Web pages can include incidental references to all sorts of things, including plugs for unrelated stories. All of these will duly turn up in text searches. But if your search word is part of the URL or the title, it should be one of the page’s main topics.

You can also use site: and inurl: commands to limit searches to include, or exclude, whole groups of websites. For example, either site:co.uk or inurl:co.uk will search matching UK websites, though many UK sites now have .com addresses. Similarly, site:ac.uk and inurl:ac.uk will find pages from British educational institutions, while inurl:edu and site:edu will find American ones. Using inurl:ac.uk OR inurl:edu (the Boolean command must be in caps) will find pages from both. Using site:gov.uk will find British government websites, and inurl:https will search secure websites. There are lots of options for inventive searchers.

Google Search can also find different types of file, using either filetype: or ext: (for file extension). These include office documents (docx, pptx, xlsx, rtf, odt, odp, ods etc) and pdf files. Results depend heavily on the topic. For example, a search for picasso filetype:pdf is more productive than one for stormzy.

Make it a date

 Narrowing your search by date can find older pieces. Photograph: Samuel Gibbs/The Guardian

We often want up-to-date results, particularly in technology where things that used to be true are not true any more. After you have run a search, you can use Google’s time settings to filter the results, or use new search terms. To do this, click Tools, click the down arrow next to “Any time”, and use the dropdown menu to pick a time period between “Past hour” and “Past year”.

Last week, I was complaining that Google’s “freshness algorithm” could serve up lots of blog-spam, burying far more useful hits. Depending on the topic, you can use a custom time range to get less fresh but perhaps more useful results.

Custom time settings are even more useful for finding contemporary coverage of events, which might be a company’s public launch, a sporting event, or something else. Human memories are good at rewriting history, but contemporaneous reports can provide a more accurate picture.

However, custom date ranges have disappeared from mobile, the daterange: command no longer seems to work in search boxes, and “sort by date” has gone except in news searches. Instead, this year, Google introduced before: and after: commands to do the same job. For example, you could search for “Apple iPod” before:2002-05-31 after:2001-10-15 for a bit of nostalgia. The date formats are very forgiving, so one day we may all prefer these new commands.

 [Source: This article was published in theguardian.com - Uploaded by the Association Member: Carol R. Venuti] 

Categorized in Search Engine

Earlier today, Google announced that it would be redesigning the redesign of its search results as a response to withering criticism from politicians, consumers and the press over the way in which search results displays were made to look like ads.

Google makes money when users of its search service click on ads. It doesn’t make money when people click on an unpaid search result. Making ads look like search results makes Google more money.

It’s also a pretty evil (or at least unethical) business decision by a company whose mantra was “Don’t be evil” (although they gave that up in 2018).

 

Users began noticing the changes to search results last week, and at least one user flagged the changes earlier this week.

There's something strange about the recent design change to google search results, favicons and extra header text: they all look like ads, which is perhaps the point?

 
Google responded with a bit of doublespeak from its corporate account about how the redesign was intended to achieve the opposite effect of what it was actually doing.

“Last year, our search results on mobile gained a new look. That’s now rolling out to desktop results this week, presenting site domain names and brand icons prominently, along with a bolded ‘Ad’ label for ads,” the company wrote.

Senator Mark Warner (D-VA) took a break from impeachment hearings to talk to The Washington Post about just how bad the new search redesign was.

“We’ve seen multiple instances over the last few years where Google has made paid advertisements ever more indistinguishable from organic search results,” Warner told the Post. “This is yet another example of a platform exploiting its bottleneck power for commercial gain, to the detriment of both consumers and also small businesses.”

Google’s changes to its search results happened despite the fact that the company is already being investigated by every state in the country for antitrust violations.

For Google, the rationale is simple. The company’s advertising revenues aren’t growing the way they used to, and the company is looking at a slowdown in its core business. To try and juice the numbers, dark patterns present an attractive way forward.

Indeed, Google’s using the same tricks that it once battled to become the premier search service in the U.S. When the company first launched its search service, ads were clearly demarcated and separated from actual search results returned by Google’s algorithm. Over time, the separation between what was an ad and what wasn’t became increasingly blurred.

 

Color fade: A history of Google ad labeling in search results http://selnd.com/2adRCdU 

 
“Search results were near-instant and they were just a page of links and summaries – perfection with nothing to add or take away,” user experience expert Harry Brignull (and founder of the watchdog website darkpatterns.org) said of the original Google search results in an interview with TechCrunch.

“The back-propagation algorithm they introduced had never been used to index the web before, and it instantly left the competition in the dust. It was proof that engineers could disrupt the rules of the web without needing any suit-wearing executives. Strip out all the crap. Do one thing and do it well.”

“As Google’s ambitions changed, the tinted box started to fade. It’s completely gone now,” Brignull added.

The company acknowledged that its latest experiment might have gone too far in its latest statement and noted that it will “experiment further” on how it displays results.

 [Source: This article was published in techcrunch.com By Jonathan Shieber - Uploaded by the Association Member: Joshua Simon]

Categorized in Search Engine

"In the future, everyone will be anonymous for 15 minutes." So said the artist Banksy, but following the rush to put everything online, from relationship status to holiday destinations, is it really possible to be anonymous - even briefly - in the internet age?

That saying, a twist on Andy Warhol's famous "15 minutes of fame" line, has been interpreted to mean many things by fans and critics alike. But it highlights the real difficulty of keeping anything private in the 21st Century.

"Today, we have more digital devices than ever before and they have more sensors that capture more data about us," says Prof Viktor Mayer-Schoenberger of the Oxford Internet Institute.

And it matters. According to a survey from the recruitment firm Careerbuilder, in the US last year 70% of companies used social media to screen job candidates, and 48% checked the social media activity of current staff.

Also, financial institutions can check social media profiles when deciding whether to hand out loans.


Meanwhile, companies create models of buying habits, political views and even use artificial intelligence to gauge future habits based on social media profiles.

One way to try to take control is to delete social media accounts, which some did after the Cambridge Analytica scandal, when 87 million people had their Facebook data secretly harvested for political advertising purposes.

While deleting social media accounts may be the most obvious way to remove personal data, this will not have any impact on data held by other companies.

Fortunately, in some countries the law offers protection.

In the European Union the General Data Protection Regulation (GDPR) includes the "right to be forgotten" - an individual's right to have their personal data removed.

In the UK, that right is policed by the Information Commissioner's Office. Last year it received 541 requests to have information removed from search engines, according to data shown to the BBC, up from 425 the year before, and 303 in 2016-17.

The actual figures may be higher, as the ICO says it often only becomes involved after an initial complaint made to the company that holds the information has been rejected.

But the ICO's Suzanne Gordon says it is not clear-cut: "The GDPR has strengthened the rights of people to ask for an organisation to delete their personal data if they believe it is no longer necessary for it to be processed.

"However, this right is not absolute and in some cases must be balanced against other competing rights and interests, for example, freedom of expression."

The "right to be forgotten" shot to prominence in 2014 and led to a wide-range of requests for information to be removed - early ones came from an ex-politician seeking re-election, and a paedophile - but not all have to be accepted.

Companies and individuals that have the money can hire experts to help them out.

A whole industry is being built around "reputation defence" with firms harnessing technology to remove information - for a price - and bury bad news from search engines, for example.

One such company, Reputation Defender, founded in 2006, says it has a million customers including wealthy individuals, professionals and chief executives. It charges around £5,000 ($5,500) for its basic package.

It uses its own software to alter the results of Google searches about its clients, helping to lower less favourable stories in the results and promote more favourable ones instead.


"The technology focuses on what Google sees as important when indexing websites at the top or bottom of the search results," says Tony McChrystal, managing director.

"Generally, the two major areas Google prioritises are the credibility and authority the web asset has, and how users engage with the search results and the path Google sees each unique individual follow.

"We work to show Google that a greater volume of interest and activity is occurring on sites that we want to promote, whether they're new websites we've created, or established sites which already appear in the [Google results pages], while sites we are seeking to suppress show an overall lower percentage of interest."

The firm sets out to achieve its specified objective within 12 months.

"It's remarkably effective," he adds, "since 92% of people never venture past the first page of Google and more than 99% never go beyond page two."

Prof Mayer-Schoenberger points out that, while reputation defence companies may be effective, "it is hard to understand why only the rich that can afford the help of such experts should benefit and not everyone".


So can we ever completely get rid of every online trace?

"Simply put, no," says Rob Shavell, co-founder and chief executive of DeleteMe, a subscription service which aims to remove personal information from public online databases, data brokers, and search websites.

"You cannot be completely erased from the internet unless somehow all companies and individuals operating internet services were forced to fundamentally change how they operate.

"Putting in place strong sensible regulation and enforcement to allow consumers to have a say in how their personal information can be gathered, shared, and sold would go a long way to addressing the privacy imbalance we have now."

[Source: This article was published in bbc.com By Mark Smith - Uploaded by the Association Member: Jay Harris]

Categorized in Internet Privacy

Reverse image search is one of the most well-known and easiest digital investigative techniques, with two-click functionality of choosing “Search Google for image” in many web browsers. This method has also seen widespread use in popular culture, perhaps most notably in the MTV show Catfish, which exposes people in online relationships who use stolen photographs on their social media.

However, if you only use Google for reverse image searching, you will be disappointed more often than not. Limiting your search process to uploading a photograph in its original form to just images.google.com may give you useful results for the most obviously stolen or popular images, but for most any sophisticated research project, you need additional sites at your disposal — along with a lot of creativity.

This guide will walk through detailed strategies to use reverse image search in digital investigations, with an eye towards identifying people and locations, along with determining an image’s progeny. After detailing the core differences between the search engines, Yandex, Bing, and Google are tested on five test images showing different objects and from various regions of the world.

Beyond Google

The first and most important piece of advice on this topic cannot be stressed enough: Google reverse image search isn’t very good.

As of this guide’s publication date, the undisputed leader of reverse image search is the Russian site Yandex. After Yandex, the runners-up are Microsoft’s Bing and Google. A fourth service that could also be used in investigations is TinEye, but this site specializes in intellectual property violations and looks for exact duplicates of images.

Yandex

Yandex is by far the best reverse image search engine, with a scary-powerful ability to recognize faces, landscapes, and objects. This Russian site draws heavily upon user-generated content, such as tourist review sites (e.g. FourSquare and TripAdvisor) and social networks (e.g. dating sites), for remarkably accurate results with facial and landscape recognition queries.

Its strengths lie in photographs taken in a European or former-Soviet context. While photographs from North America, Africa, and other places may still return useful results on Yandex, you may find yourself frustrated by scrolling through results mostly from Russia, Ukraine, and eastern Europe rather than the country of your target images.

To use Yandex, go to images.yandex.com, then choose the camera icon on the right.


From there, you can either upload a saved image or type in the URL of one hosted online.


If you get stuck with the Russian user interface, look out for Выберите файл (Choose file), Введите адрес картинки (Enter image address), and Найти (Search). After searching, look out for Похожие картинки (Similar images), and Ещё похожие (More similar).

The facial recognition algorithms used by Yandex are shockingly good. Not only will Yandex look for photographs that look similar to the one that has a face in it, but it will also look for other photographs of the same person (determined through matching facial similarities) with completely different lighting, background colors, and positions. While Google and Bing may just look for other photographs showing a person with similar clothes and general facial features, Yandex will search for those matches, and also other photographs of a facial match. Below, you can see how the three services searched the face of Sergey Dubinsky, a Russian suspect in the downing of MH17. Yandex found numerous photographs of Dubinsky from various sources (only two of the top results had unrelated people), with the result differing from the original image but showing the same person. Google had no luck at all, while Bing had a single result (fifth image, second row) that also showed Dubinsky.



Yandex is, obviously, a Russian service, and there are worries and suspicions of its ties (or potential future ties) to the Kremlin. While we at Bellingcat constantly use Yandex for its search capabilities, you may be a bit more paranoid than us. Use Yandex at your own risk, especially if you are also worried about using VK and other Russian services. If you aren’t particularly paranoid, try searching an un-indexed photograph of yourself or someone you know in Yandex, and see if it can find you or your doppelganger online.

Bing

Over the past few years, Bing has caught up to Google in its reverse image search capabilities, but is still limited. Bing’s “Visual Search”, found at images.bing.com, is very easy to use, and offers a few interesting features not found elsewhere.


Within an image search, Bing allows you to crop a photograph (button below the source image) to focus on a specific element in said photograph, as seen below. The results with the cropped image will exclude the extraneous elements, focusing on the user-defined box. However, if the selected portion of the image is small, it is worth it to manually crop the photograph yourself and increase the resolution — low-resolution images (below 200×200) bring back poor results.

Below, a Google Street View image of a man walking a couple of pugs was cropped to focus on just the pooches, leading Bing to suggest the breed of dog visible in the photograph (the “Looks like” feature), along with visually similar results. These results mostly included pairs of dogs being walked, matching the source image, but did not always only include pugs, as French bulldogs, English bulldogs, mastiffs, and others are mixed in.


Google

By far the most popular reverse image search engine, at images.google.com, Google is fine for most rudimentary reverse image searches. Some of these relatively simple queries include identifying well-known people in photographs, finding the source of images that have been shared quite a bit online, determining the name and creator of a piece of art, and so on. However, if you want to locate images that are not close to an exact copy of the one you are researching, you may be disappointed.

For example, when searching for the face of a man who tried to attack a BBC journalist at a Trump rally, Google can find the source of the cropped image, but cannot find any additional images of him, or even someone who bears a passing resemblance to him.



While Google was not very strong in finding other instances of this man’s face or similar-looking people, it still found the original, un-cropped version of the photograph the screenshot was taken from, showing some utility.

Five Test Cases

For testing out different reverse image search techniques and engines, a handful of images representing different types of investigations are used, including both original photographs (not previously uploaded online) and recycled ones. Due to the fact that these photographs are included in this guide, it is likely that these test cases will not work as intended in the future, as search engines will index these photographs and integrate them into their results. Thus, screenshots of the results as they appeared when this guide was being written are included.

These test photographs include a number of different geographic regions to test the strength of search engines for source material in western Europe, eastern Europe, South America, southeast Asia, and the United States. With each of these photographs, I have also highlighted discrete objects within the image to test out the strengths and weaknesses for each search engine.

Feel free to download these photographs (every image in this guide is hyperlinked directly to a JPEG file) and run them through search engines yourself to test out your skills.

Olisov Palace In Nizhny Novgorod, Russia (Original, not previously uploaded online)


Isolated: White SUV in Nizhny Novgorod


Isolated: Trailer in Nizhny Novgorod


Cityscape In Cebu, Philippines (Original, not previously uploaded online)


Isolated: Condominium complex, “The Padgett Place”


Isolated: “Waterfront Hotel”


Students From Bloomberg 2020 Ad (Screenshot from video)


Isolated: Student


Av. do Café In São Paulo, Brazil (Screenshot from Google Street View)


Isolated: Toca do Açaí


Isolated: Estacionamento (Parking)


Amsterdam Canal (Original, not previously uploaded online)


Isolated: Grey Heron


Isolated: Dutch Flag (also rotated 90 degrees clockwise)


Results

Each of these photographs were chosen in order to demonstrate the capabilities and limitations of the three search engines. While Yandex in particular may seem like it is working digital black magic at times, it is far from infallible and can struggle with some types of searches. For some ways to possibly overcome these limitations, I’ve detailed some creative search strategies at the end of this guide.

Novgorod’s Olisov Palace

Predictably, Yandex had no trouble identifying this Russian building. Along with photographs from a similar angle to our source photograph, Yandex also found images from other perspectives, including 90 degrees counter-clockwise (see the first two images in the third row) from the vantage point of the source image.


Yandex also had no trouble identifying the white SUV in the foreground of the photograph as a Nissan Juke.


Lastly, in the most challenging isolated search for this image, Yandex was unsuccessful in identifying the nondescript grey trailer in front of the building. A number of the results look like the one from the source image, but none are an actual match.


Bing had no success in identifying this structure. Nearly all of its results were from the United States and western Europe, showing houses with white/grey masonry or siding and brown roofs.


Likewise, Bing could not determine that the white SUV was a Nissan Juke, instead focusing on an array of other white SUVs and cars.


Lastly, Bing failed in identifying the grey trailer, focusing more on RVs and larger, grey campers.


Google’s results for the full photograph are comically bad, turning up the House television show and images with very little visual similarity.


Google successfully identified the white SUV as a Nissan Juke, even noting it in the text field search. As seen with Yandex, feeding the search engine an image from a similar perspective as popular reference materials — a side view of a car that resembles that of most advertisements — will best allow reverse image algorithms to work their magic.


Lastly, Google recognized what the grey trailer was (travel trailer / camper), but its “visually similar images” were far from it.


Scorecard: Yandex 2/3; Bing 0/3; Google 1/3

Cebu

Yandex was technically able to identify the cityscape as that of Cebu in the Philippines, but perhaps only by accident. The fourth result in the first row and the fourth result in the second row are of Cebu, but only the second photograph shows any of the same buildings as in the source image. Many of the results were also from southeast Asia (especially Thailand, which is a popular destination for Russian tourists), noting similar architectural styles, but none are from the same perspective as the source.


Of the two buildings isolated from the search (the Padgett Place and the Waterfront Hotel), Yandex was able to identify the latter, but not the former. The Padgett Place building is a relatively unremarkable high-rise building filled with condos, while the Waterfront Hotel also has a casino inside, leading to an array of tourist photographs showing its more distinct architecture.



Bing did not have any results that were even in southeast Asia when searching for the Cebu cityscape, showing a severe geographic limitation to its indexed results.


Like Yandex, Bing was unable to identify the building on the left part of the source image.


Bing was unable to find the Waterfront Hotel, both when using Bing’s cropping function (bringing back only low-resolution photographs) and manually cropping and increasing the resolution of the building from the source image. It is worth noting that the results from these two versions of the image, which were identical outside of the resolution, brought back dramatically different results.



As with Yandex, Google brought back a photograph of Cebu in its results, but without a strong resemblance to the source image. While Cebu was not in the thumbnails for the initial results, following through to “Visually similar images” will fetch an image of Cebu’s skyline as the eleventh result (third image in the second row below).


As with Yandex and Bing, Google was unable to identify the high-rise condo building on the left part of the source image. Google also had no success with the Waterfront Hotel image.



Scorecard: Yandex 4/6; Bing 0/6; Google 2/6

Bloomberg 2020 Student

Yandex found the source image from this Bloomberg campaign advertisement — a Getty Images stock photo. Along with this, Yandex also found versions of the photograph with filters applied (second result, first row) and additional photographs from the same stock photo series. Also, for some reason, porn, as seen in the blurred results below.


When isolating just the face of the stock photo model, Yandex brought back a handful of other shots of the same guy (see last image in first row), plus images of the same stock photo set in the classroom (see the fourth image in the first row).


Bing had an interesting search result: it found the exact match of the stock photograph, and then brought back “Similar images” of other men in blue shirts. The “Pages with this” tab of the result provides a handy list of duplicate versions of this same image across the web.



Focusing on just the face of the stock photo model does not bring back any useful results, or provide the source image that it was taken from.


Google recognizes that the image used by the Bloomberg campaign is a stock photo, bringing back an exact result. Google will also provide other stock photos of people in blue shirts in class.


In isolating the student, Google will again return the source of the stock photo, but its visually similar images do not show the stock photo model, rather an array of other men with similar facial hair. We’ll count this as a half-win in finding the original image, but not showing any information on the specific model, as Yandex did.


Scorecard: Yandex 6/8; Bing 1/8; Google 3.5/8

Brazilian Street View

Yandex could not figure out that this image was snapped in Brazil, instead focusing on urban landscapes in Russia.


For the parking sign [Estacionamento], Yandex did not even come close.


Bing did not know that this street view image was taken in Brazil.


…nor did Bing recognize the parking sign


…or the Toca do Açaí logo.


Despite the fact that the image was directly taken from Google’s Street View, Google reverse image search did not recognize a photograph uploaded onto its own service.


Like Bing and Yandex, Google could not recognize the Portuguese parking sign.


Lastly, Google did not come close to identifying the Toca do Açaí logo, instead focusing on various types of wooden panels, showing how it focused on the backdrop of the image rather than the logo and words.


Scorecard: Yandex 7/11; Bing 1/11; Google 3.5/11

Amsterdam Canal

Yandex knew exactly where this photograph was taken in Amsterdam, finding other photographs taken in central Amsterdam, and even including ones with various types of birds in the frame.


Yandex correctly identified the bird in the foreground of the photograph as a grey heron (серая цапля), also bringing back an array of images of grey herons in a similar position and posture as the source image.


However, Yandex flunked the test of identifying the Dutch flag hanging in the background of the photograph. When rotating the image 90 degrees clockwise to present the flag in its normal pattern, Yandex was able to figure out that it was a flag, but did not return any Dutch flags in its results.




Bing only recognized that this image shows an urban landscape with water, with no results from Amsterdam.


Though Bing struggled with identifying an urban landscape, it correctly identified the bird as a grey heron, including a specialized “Looks like” result going to a page describing the bird.


However, like with Yandex, the Dutch flag was too confusing for Bing, both in its original and rotated forms.



Google noted that there was a reflection in the canal of the image, but went no further than this, focusing on various paved paths in cities and nothing from Amsterdam.


Google was close in the bird identification exercise, but just barely missed it — it is a grey, not great blue, heron.


Google was also unable to identify the Dutch flag. Though Yandex seemed to recognize that the image is a flag, Google’s algorithm focused on the windowsill framing the image and misidentified the flag as curtains.



Final Scorecard: Yandex 9/14; Bing 2/14; Google 3.5/14

Creative Searching

Even with the shortcomings described in this guide, there are a handful of methods to maximize your search process and game the search algorithms.

Specialized Sites

For one, you could use some other, more specialized search engines outside of the three detailed in this guide. The Cornell Lab’s Merlin Bird ID app, for example, is extremely accurate in identifying the type of birds in a photograph, or giving possible options. Additionally, though it isn’t an app and doesn’t let you reverse search a photograph, FlagID.org will let you manually enter information about a flag to figure out where it comes from. For example, with the Dutch flag that even Yandex struggled with, FlagID has no problem. After choosing a horizontal tricolor flag, we put in the colors visible in the image, then receive a series of options that include the Netherlands (along with other, similar-looking flags, such as the flag of Luxembourg).

flagsearch1.jpg

flagsearch2.jpg

Language Recognition

If you are looking at text in a language whose script you don't recognize, try OCR or Google Translate to make your life easier. You can use Google Translate's handwriting tool to detect the language* of a character you draw, or, if you already know the language, select it and write the word out by hand. Below, the name of a cafe ("Hedgehog in the Fog") is written out with Google Translate's handwriting tool, producing the typed version of the word (Ёжик) that can then be searched. A scripted OCR alternative is sketched after the screenshots.

*Be warned that Google Translate is not very good at recognizing letters if you do not already know the language, though if you scroll through enough suggestions you will eventually find your handwritten letter.

yozhikvtumane.jpg

yozhik-1536x726.jpg

yozhik2-1536x628.jpg
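The same kind of lookup can also be scripted outside the browser with an OCR library. Below is a minimal sketch using the open-source Tesseract engine through the pytesseract Python package; the file name and the choice of Russian ("rus") are hypothetical placeholders, and this is an alternative to, not part of, the Google Translate workflow shown above.

    # A minimal sketch; assumes Tesseract and its Russian language pack are installed.
    # "cafe_sign.jpg" is a hypothetical photo of the signage to transcribe.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(Image.open("cafe_sign.jpg"), lang="rus")
    print(text.strip())  # e.g. "Ёжик", ready to paste into a search engine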

Pixelation And Blurring

As detailed in a brief Twitter thread, you can pixelate or blur elements of a photograph in order to trick the search engine into focusing squarely on the background. In this photograph of Rudy Giuliani's spokeswoman, uploading the exact image will not bring back results showing where it was taken.

2019-12-16_14-55-50-1536x1036.jpg

However, if we blur out or pixelate the woman in the middle of the image, Yandex (and other search engines) can work their magic in matching up all of the other elements of the image: the chairs, paintings, chandelier, rug and wall patterns, and so on. (A scripted version of this blurring step is sketched at the end of this section.)

blurtest.jpg

After this pixelation is carried out, Yandex knows exactly where the image was taken: a popular hotel in Vienna.

yandexresult.jpg

2019-12-16_15-02-32.jpg
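The blurring itself does not require a full image editor. Here is a minimal sketch using the Pillow library; the file names and the coordinates of the region to obscure are hypothetical placeholders you would adjust for your own image.

    # A minimal sketch with Pillow; the path and box coordinates are placeholders.
    from PIL import Image, ImageFilter

    img = Image.open("source.jpg")
    box = (420, 150, 780, 900)  # left, top, right, bottom of the area to hide
    region = img.crop(box).filter(ImageFilter.GaussianBlur(radius=25))
    img.paste(region, box)
    img.save("blurred.jpg")  # upload this version to the reverse image search

A heavy Gaussian blur works for most engines; if it doesn't, a blockier pixelation (downscaling the region and scaling it back up) achieves the same effect.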

Conclusion

Reverse image search engines have progressed dramatically over the past decade, with no end in sight. Along with the ever-growing amount of indexed material, a number of search giants have enticed their users to sign up for image hosting services, such as Google Photos, giving these search algorithms an endless amount of material for machine learning. On top of this, facial recognition AI is entering the consumer space with products like FindClone and may already be used in some search algorithms, namely with Yandex. There are no publicly available facial recognition programs that use any Western social network, such as Facebook or Instagram, but perhaps it is only a matter of time until something like this emerges, dealing a major blow to online privacy while also (at that great cost) increasing digital research functionality.

If you skipped most of the article and are just looking for the bottom line, here are some easy-to-digest tips for reverse image searching:

  • Use Yandex first, second, and third, and then try Bing and Google if you still can’t find your desired result.
  • If you are working with source imagery that is not from a Western or former Soviet country, then you may not have much luck. These search engines are hyper-focused on these areas, and struggle for photographs taken in South America, Central America/Caribbean, Africa, and much of Asia.
  • Increase the resolution of your source image, even if that just means doubling or tripling its size until it's a pixelated mess. None of these search engines can do much with an image that is under 200×200.
  • Try cropping out elements of the image, or pixelating them if they trip up your results. Most of these search engines lock onto people and their faces like a heat-seeking missile, so pixelate them to shift the focus to the background elements.
  • If all else fails, get really creative: mirror your image horizontally, add some color filters, or use the clone tool in your image editor to fill in elements of the image that are disrupting searches. (A few of these transformations are scripted in the sketch below.)
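To make the last three tips concrete, here is a minimal Pillow sketch covering upscaling, cropping and mirroring; the file names, scale factor and crop box are illustrative assumptions rather than anything prescribed above.

    # A minimal sketch of the resize / crop / mirror tips; all values are placeholders.
    from PIL import Image, ImageOps

    img = Image.open("small_source.jpg")

    # Upscale a tiny image (anything under ~200x200) before searching.
    img.resize((img.width * 3, img.height * 3), Image.LANCZOS).save("upscaled.jpg")

    # Crop away a distracting element (coordinates are hypothetical).
    img.crop((0, 0, img.width, int(img.height * 0.6))).save("cropped.jpg")

    # Mirror horizontally if the exact-match search keeps failing.
    ImageOps.mirror(img).save("mirrored.jpg")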

[Source: This article was published in bellingcat.com By Aric Toler - Uploaded by the Association Member: Issac Avila] 

Categorized in Investigative Research

The internet is an iceberg. And, as you might guess, most of us only reckon with the tip. While the pages and media found via simple searches may seem unendingly huge at times, what is submerged and largely unseen – often referred to as the invisible web or deep web – is in fact far, far bigger.

THE SURFACE WEB

What we access every day through popular search engines like Google, Yahoo or Bing is referred to as the Surface Web. These familiar search engines crawl through tens of trillions of pages of available content (Google alone is said to have indexed more than 30 trillion web pages) and bring that content to us on demand. As big as this trove of information is, however, this represents only the tip of the iceberg.

Eric Schmidt, then CEO of Google, was once asked to estimate the size of the World Wide Web. He estimated that of roughly 5 million terabytes of data, Google had indexed roughly 200 terabytes: 200 ÷ 5,000,000 = 0.00004, or only 0.004% of the total internet.

THE INVISIBLE WEB

Beneath the Surface Web is what is referred to as the Deep or Invisible Web. It comprises:

  • Private websites, such as those sitting behind VPNs (virtual private networks) and sites that require passwords or logins
  • Limited-access content sites, which restrict access in a technical way, such as with CAPTCHAs, the Robots Exclusion Standard, or no-cache HTTP headers that prevent search engines from browsing or caching them (see the sketch after this list)
  • Unlinked content, with no hyperlinks pointing to it from other pages, which prevents web crawlers from discovering it
  • Textual content, often encoded in image or video files or in specific file formats not handled by search engines
  • Dynamic content, created for a single purpose and not part of a larger collection of items
  • Scripted content, pages only accessible through JavaScript, as well as content loaded via Flash and Ajax
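The Robots Exclusion Standard mentioned above is simply a plain-text file served at a site's root, and you can check programmatically whether a given page is open to crawlers. A minimal sketch using Python's standard library, with example.com and the paths as placeholder values:

    # A minimal sketch; the domain and paths are placeholders.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    # A page disallowed here will generally stay out of search indexes,
    # which is one way content ends up in the invisible web.
    print(rp.can_fetch("Googlebot", "https://example.com/private/report.pdf"))
    print(rp.can_fetch("*", "https://example.com/index.html"))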

There are many high-value collections to be found within the invisible web. Some of the material found there that most people would recognize and, potentially, find useful include:

  • Academic studies and papers
  • Blog platforms
  • Pages created but not yet published
  • Scientific research
  • Academic and corporate databases
  • Government publications
  • Electronic books
  • Bulletin boards
  • Mailing lists
  • Online card catalogs
  • Directories
  • Many subscription journals
  • Archived videos
  • Images

But knowing all these materials are out there, buried deep within the web, doesn't really help the average user. What tools can we turn to in order to make sense of the invisible web? There really is no easy answer. Sure, the means to search and sort through massive amounts of invisible web information exist, but many of these tools have an intense learning curve. This can mean sophisticated software that requires no small amount of computer savvy; it can mean energy-sucking search tools that require souped-up computers to comb through millions of pages of data; or it can require the searching party to be unusually persistent, something most of us, with our expectations of instantaneous Google search success, won't be accustomed to.

All that being said, we can become acquainted with the invisible web by degrees. The many tools considered below will help you access a sizable slice of the invisible web's offerings. You will find we've identified a number of subject-specific databases and engines: tools with an established focus, making their searches much narrower.

OPEN ACCESS JOURNAL DATABASES

Open access journal databases (OAJD) are compilations of free scholarly journals maintained in a manner that facilitates access by researchers and others seeking specific information or knowledge. Because these databases consist largely of unlinked content, they reside in the invisible web.

The vast majority of these journals are of the highest quality, with peer review and extensive vetting of content before publication. However, there is a trend of journals accepting scholarship without adequate quality controls, under arrangements designed to make money for the publishers rather than to further scholarship. It is important to be careful and review the standards of the database and journals you choose. "This helpful guide" explains what to look for.

Below is a sample list of well-regarded and reputable databases.

  • "AGRIS" (International Information System for Agricultural Science and Technology) is a global, public domain database maintained in multiple languages by the Food and Agriculture Organization of the United Nations. They provide free access to agricultural research and information.
  • "BioMed Central" is the UK-based publisher of 258 peer-reviewed open access journals. Their published works span science, technology and medicine and include many well-regarded titles.
  • "Copernicus Publications" has been an open-access scientific publisher in Germany since 2001. They are strong supporters of the researchers who create these articles, providing top-level peer review and promotion for their work.
  • "DeGruyter Open" (formerly Versita Open) is one of Germany's leading publishers of open access content. Today DeGruyter Open (DGO) publishes about 400 owned and third-party scholarly journals and books across all major disciplines.
  • "Directory of Open Access Journals is focused on providing access only to those journals that employ the highest quality standards to guarantee content. They are presently a repository of 9,740 journals with more than 1.5 million articles from 133 countries.
  • "EDP Sciences" (Édition Diffusion Presse Sciences) is a France-based scientific publisher with an international mission. They publish more than 50 scientific journals, with some 60,000 published pages annually.
  • "Elsevier of Amsterdam is a world leader in advancing knowledge in the science, technology and health fields. They publish nearly 2,200 journals, including The Lancet and Cell, and over 25,000 book titles, including Gray's Anatomy and Nelson' s Pediatrics.
  • "Hindawi Publishing Corporation", based in Egypt, publishes 434 peer-reviewed, open access journals covering all areas of Science, Technology and Medicine, as well as a variety of Social Sciences.
  • "Journal Seek" (Genamics) touts itself as "the largest completely categorized database of freely available journal information available on the internet," with more than 100,000 titles currently. Categories range from Arts and Literature, through both hard- and soft-sciences, to Sports and Recreation.
  • "The Multidisciplinary Digital Publishing Institute" (MDPI), based in Switzerland, is a publisher of more than 110 peer-reviewed, open access journals covering arts, sciences, technology and medicine.
  • "Open Access Journals Search Engine" (OAJSE), based in India, is a search engine for open access journals from throughout the world, except for India. An extremely simple interface. Note: the site was last updated June 21, 2013.
  • "Open J-Gate" is an India-based e-journal database of millions of journal articles in open access domain. With a worldwide reach, Open J-Gate is updated every day with new academic, research and industry articles.
  • "Open Science Directory" contains about 13,000 scientific journals, with another 7,000 special programs titles.
  • "Springer Open" offers a roster of more than 160 peer-reviewed, open access journals, as well as their more recent addition of free access books, covering all scientific disciplines.
  • "Wiley Open Access", a subsidiary of New Jersey-based global publishers John Wiley & Sons, Inc., publishes peer reviewed open access journals specific to biological, chemical and health sciences.

INVISIBLE WEB SEARCH ENGINES

Your typical search engine's primary job is to locate the surface sites and downloads that make up much of the web as we know it. These searches are able to find an array of HTML documents, video and audio files and, essentially, any content that is heavily linked to or shared online. And often, these engines, Google chief among them, will find and organize this diversity of content every time you search.

The search engines that deliver results from the invisible web are distinctly different. Narrower in scope, these deep web engines tend to access only a single type of data, because each type of data has the potential to offer up an outrageous number of results. An inexact deep web search would quickly become a needle-in-a-haystack exercise. That's why deep web searches tend to demand more thought in their initial query requirements.
Below is a list of popular invisible web search engines:

  • "Clusty" is a meta search engine that not only combines data from a variety of different source documents, but also creates "clustered" responses, automatically sorting by category.
  • "CompletePlanet" searches more than 70,000 databases and specialty search engines found only in the invisible web. A search engine as well-suited to casual searchers as it is to researchers.
  • "DigitalLibrarian": A Librarian's Choice of the Best of the Web is maintained by a real librarian. With an eclectic mix of some 45 broad categories, Digital Librarian offers data from categories as diverse as Activism/Non Profits and Railroads and Waterways.
  • "InfoMine" is another librarian-developed internet resource collection, this time from The Regents of the University of California.
  • "InternetArchive" has an eclectic array of categories, starting with the ‘Wayback Machine,' which allows the searcher to locate archived documents, and including an archive of Grateful Dead audience and soundboard recordings. They offer 6 million texts, 1.5 million videos, 1.9 million audio recordings and 126K live music concerts.
  • "The Internet Public Library" (ipl and ipl2) is a non-profit, student-run website at Drexel University. Students volunteer to act as librarians and respond to questions from visitors. Categories of data include those directed to Children and Teens.
  • "SurfWax" is a metasearch engine that offers "practical tools for Dynamic Search Navigation." It offers the option of grabbing results from multiple search engines at the same time, or even designing "SearchSets," which are individualized groups of sources that can be used over and over in searches.
  • "UC Santa Barbara Library" offers access to a diverse group of research databases useful to students, researchers and the casual searcher. It should be noted that many of these resources are password protected. Those that do not display a lock icon are publicly accessible.
  • "USA.gov" offers acess to a huge volume of information, including all types of forms, databases, and information sites representing most government agencies.
  • "Voice of the Shuttle" (VoS) offers access to a diverse assortment of sites, including literature, literary theory, philosophy, history and cultural studies, and includes the daily update of all things "cool."

SUBJECT-SPECIFIC DATABASES

The following lists pool together some mainstream and not so mainstream databases dedicated to particular fields and areas of interest. While only a handful of these tools are able to surface deep web materials, all of the search engines and collections we have highlighted are powerful, extensive bodies of work. Many of the resources these tools surface would likely be overlooked if the same query were made on one of the mainstream engines most users fall back on, like Bing, Yahoo and even Google.

Art & Design

  • "ArtNet" deals with pricing and sourcing work in the art market. They also keep track of the latest news and artists in the industry.
  • "The Metropolitan Museum of Art" site hosts an impressively interactive body of information on their collections, exhibitions, events and research.
  • "Musée du Louvre", the renowned museum, maintains a site filled with navigable sections covering its collections.
  • "The National Gallery of Art" premier museum of arts in our nation's capital, also maintains a site detailing the highlights, exhibitions and education efforts the institution oversees.
  • "Public Art Online" is a resource detailing sources, creators, prices, projects, legal issues, success stories, resources, education and all other aspects of the creation of public art.
  • "Smithsonian Art Inventories Catalog" is a subset of the Smithsonian Institution Research Information System (SIRIS). A browsable database of over 400,000 art inventory items held in public and private collections.
  • "Web Gallery of Art" is a searchable database of European art, containing nearly 34,000 reproductions. Additional database information includes artist biographies, period music and commentaries.

Business

  • "Better Business Bureau" (BBB) Information System Search allows consumers to locate the details of ratings, consumer experience, governmental action and more of both BBB accredited and non-accredited businesses.
  • "BPubs.com" is the business publications search engine. They offer more than 200 free subscriptions to business and trade publications.
  • "BusinessUSA" is an excellent and complete database of everything a new or experienced business owner or employer should know.
  • "EDGAR: U.S. Securities and Exchange Commission" contains a database of Securities and Exchange Commission. Posts copies of corporate filings from US businesses, press releases and public statements.
  • "Global Edge" delivers a comprehensive research tool for academics, students and businesspeople to seek out answers to international business questions.
  • "Hoover's", a subsidiary of Dun & Bradstreet, is one of the best known databases of American and International business. A complete source of company and industry information, especially useful for investors.
  • "The National Bureau of Economic Research is perhaps the leading private, non-partisan research organization dedicated to unbiased analysis of economic policy. This database maintains archives of research data, meetings, activities, working papers and publications.
  • "U.S. Department of Commerce", Bureau of Economic Analysis is the source of many of the economic statistics we hear in the news, including national income and product accounts (NIPAs), gross domestic product, consumer spending, balance of payments and much more.

Legal & Social Services

Science & Technology

  • "Environmental Protection Agency" rganizes the agency's laws and regulations, science and technology, and the many issues affecting the agency and its policies.
  • "National Science Digital Library" (NSDL) is a source for science, technology, engineering and mathematics educational data. It is funded by the National Science Foundation.
  • "Networked Computer Science Technical Reports Library (NCSTRL) was developed as a collaborative effort between NASA Langley, Virginia Tech, Old Dominion University and University of Virginia. It serves as an archive for submitted scientific abstracts and other research products.
  • "Science.gov" is a compendium of more than 60 US government scientific databases and more than 200 websites. Governed by the interagency Science.gov Alliance, this site provides access to a range of government scientific research data.
  • "Science Research" is a free, publicly available deep web search engine that purports to use a sophisticated technology that permits queries to more than 300 science and technology sites simultaneously, with the results collated, ranked and stripped of duplications.
  • "WebCASPAR" provides access to science and engineering data from a variety of US educational institutions. It incorporates a table builder, allowing a combined result from various National Science Foundation and National Center for Education Statistics data sources.
  • "WebCASPAR" World Wide Science is a global scientific gateway, comprised of US and international scientific databases. Because it is multilingual, it allows real-time search and translation of reporting from an extensive group of databases.

Healthcare

  • "Cases Database" is a searchable database of more than 32,000 peer-reviewed medical case reports from 270 journals covering a variety of medical conditions.
  • "Center for Disease Control" (CDC) WONDER's online databases permit access to the substantial public health data resources held by the CDC.
  • "HCUPnet" is an online query system for those seeking access to statistical data from the Agency for Healthcare Research and Quality.
  • "Healthy People" provides rolling 10-year national objectives and programs for improving the health of Americans. They currently operate under the Healthy People 2020 decennial agenda.
  • "National Center for Biotechnology Information" (NCBI) is an offshoot of the National Institutes of Health (NIH). This site provides access to some 65 databases from the various project categories currently being researched.
  • "OMIM" offers access to the combined research of many decades into genetics and genetic disorders. With daily updates, it represents perhaps the most complete single database of this sort of data.
  • "PubMed is a database of more than 23 million citations from the US National Library of Medicine and National Institutes of Health.
  • "TOXNET" is the access portal to the US Toxicology Data Network, an offshoot of the National Library of Medicine.
  • "U.S. National Library of Medicine" is a database of medical research, available grants, available resources. The site is maintained by the National Institutes of Health.
  • "World Health Organization" (WHO) is a comprehensive site covering the many initiatives the WHO is engaged in around the world.

[Source: This article was published in onlineuniversities.com By Philip Bump - Uploaded by the Association Member: Robert Henson]

Categorized in How to
