The majority of links are considered to be commercial in nature, according to new research.

Dan Petrovic, aka @DejanSEO, has just published the results of a quantitative study of 2,000 web users in the US and Australia. It was set up to discover perceptions about why web publishers link out.

According to the research, more than 40% of users think that outbound links from one web page to another are there because they generate revenue for the publisher.

‘Marketing Advertising & Revenue’ was seen to be the number one reason why a link exists, with almost a third of users expecting there to be some kind of commercial arrangement in place.

‘Promotion, Relationship & Sponsorship’ was chosen by a further 9% of respondents. Money, money, money.

Meanwhile, just one in five people recognised links as organic citations to help stand up the information on a web page.

[Chart: outbound links study 2016]

All in all, the analysis of the results found that more than half of links exist for commercial reasons, with only 34% seen to be non-commercial.

Classifying the different types of link

I really like Dan’s classification of links, which now spans 10 distinct categories (though there is a good amount of cross-over). They are as follows:

Attribution

Citation

Definition

Expansion

Identification

Example

Action

Relationship

Proof

Promotion

Further details on each of these link types can be found here. For example, you might file that outbound link under ‘expansion’, because it’s there for further reading and insight into this topic.

Dan makes the point that since many of these link types overlap, it can be hard to spot the true intent behind a link.

Some links that look natural – and which are genuinely useful – might actually be there because of some business or personal relationship. That doesn’t automatically make them sketchy. It’s just human nature.

Of course, Google doesn’t necessarily see it that way. Many people fear the dreaded manual penalty and go the extra mile to neuter links, even when they have perfectly valid reasons to point visitors to friends and sister sites.

Dan says:

“I see a lot of websites nofollow links to their partner websites, sister companies and various other forms of affiliation because they were told to do so by their SEO or even someone in Google’s webspam team.

“This sort of madness has to stop. If commercially-driven links exist on the web organically then they’re organic in nature and shouldn’t be treated as ‘clean-up material’ nor should those links be penalty-yielding.”

Hear, hear.

I’d love to know the gap between how users perceive links and the actual reasons why the author / publisher put them in place. Presumably it is quite large…

Source:  https://searchenginewatch.com/2016/06/13/web-users-think-most-outbound-links-are-commercial/

Categorized in Online Research

Twenty-seven years ago, Tim Berners-Lee created the World Wide Web as a way for scientists to easily find information. It has since become the world's most powerful medium for knowledge, communications and commerce — but that doesn't mean Mr. Berners-Lee is happy with all of the consequences.

 

"It controls what people see, creates mechanisms for how people interact," he said of the modern-day web. "It's been great, but spying, blocking sites, repurposing people's content, taking you to the wrong websites — that completely undermines the spirit of helping people create."

 

So on Tuesday, Mr. Berners-Lee gathered in San Francisco with other top computer scientists — including Brewster Kahle, head of the nonprofit Internet Archive and an internet activist — to discuss a new phase for the web.

 

Today, the World Wide Web has become a system that is often subject to control by governments and corporations. Countries like China can block certain web pages from their citizens, and cloud services like Amazon Web Services hold powerful sway. So what might happen, the computer scientists posited, if they could harness newer technologies — like the software used for digital currencies, or the technology of peer-to-peer music sharing — to create a more decentralized web with more privacy, less government and corporate control, and a level of permanence and reliability?

 

"National histories, the story of a country, now happen on the web," said Vinton G. Cerf, another founder of the internet and chief internet evangelist at Google, in a phone interview ahead of a speech to the group scheduled for Wednesday. "People think making things digital means they'll last forever, but that isn't true now."

 

The project is in its early days, but the discussions — and caliber of the people involved — underscored how the World Wide Web's direction in recent years has stirred a deep anxiety among some technologists. The revelations by Edward J. Snowden that the web has been used by governments for spying and the realization that companies like Amazon, Facebook and Google have become gatekeepers to our digital lives have added to concerns.

 

On Tuesday, Mr. Berners-Lee and Mr. Kahle and others brainstormed at the event, called the Decentralized Web Summit, over new ways that web pages could be distributed broadly without the standard control of a web server computer, as well as ways of storing scientific data without having to pay storage fees to companies like Amazon, Dropbox or Google.

 

 

Efforts at creating greater amounts of privacy and accountability, by adding more encryption to various parts of the web and archiving all versions of a web page, also came up. Such efforts would make it harder to censor content.

 

"Edward Snowden showed we've inadvertently built the world's largest surveillance network with the web," said Mr. Kahle, whose group organized the conference. "China can make it impossible for people there to read things, and just a few big service providers are the de facto organizers of your experience. We have the ability to change all that."

 

Many people treat the internet and the web as one and the same — yet they are technically quite different. The internet is a networking infrastructure, where any two machines can communicate over a variety of paths, and one local network of computers can connect with other networks.

 

The web, on the other hand, is a popular means to access that network of networks. But because of the way web pages are created, managed and named, the web is not fully decentralized. Take down a certain server and a certain web page becomes unavailable. Links to pages can corrode over time. Censorship systems like China's Great Firewall eliminate access to much information for most of its people. By looking at internet addresses, it is possible for governments and companies to get a good idea of who is reading which web pages.

 

In some ways, the efforts to change the technology of creating the web are a kind of coming-of-age story. Mr. Berners-Lee created the World Wide Web while working at CERN, the European Organization for Nuclear Research, as a tool for scientists. Today, the web still runs on technologies of the older world.

 

Consider payments. In many cases, people pay for things online by entering credit card information, not much different from handing a card to a merchant for an imprint.

 

At the session on Tuesday, computer scientists talked about how new payment technologies could increase individual control over money. For example, if people adapted the so-called ledger system that underpins digital currencies, a musician might potentially be able to sell records without intermediaries like Apple's iTunes. News sites might be able to have a system of micropayments for reading a single article, instead of counting on web ads for money.

 

"Ad revenue is the only model for too many people on the web now," Mr. Berners-Lee said. "People assume today's consumer has to make a deal with a marketing machine to get stuff for 'free,' even if they're horrified by what happens with their data. Imagine a world where paying for things was easy on both sides."

 

Mr. Kahle's Internet Archive, which exists on a combination of grants and fees from digitizing books for libraries, operates the Wayback Machine, which serves as a record of discontinued websites or early versions of pages.

 

To make that work now, Mr. Kahle has to search and capture a page, then give it a brand new web address. With the right kind of distributed system, he said, "the archive can have all of the versions, because there would be a permanent record located across many sites."

 

The movement to change how the web is built, like a surprising number of technology discussions, has an almost religious dimension.

 

Some of the participants are extreme privacy advocates who have created methods of building sites that can't be censored, using cryptography. Mr. Cerf said he was wary of extreme anonymity, but thought the ways that digital currencies permanently record transactions could be used to make the web more accountable.

 

Still, not all the major players agree on whether the web needs decentralizing.

 

"The web is already decentralized," Mr. Berners-Lee said. "The problem is the dominance of one search engine, one big social network, one Twitter for microblogging. We don't have a technology problem, we have a social problem."

 

One that can, perhaps, be solved by more technology.

 

Source:  http://www.cnbc.com/2016/06/08/new-york-times-digital-world-wide-webs-creator-looks-to-reinvent-it.html

 

 

Categorized in Online Research

Cyberspace is not like your library

Librarians have a weird sense of humor. This is now an old joke: the internet is like a library with no catalog where all the books get up and move themselves every night... This was the state of the internet up until 1995 or thereabouts. Finding anything on the internet required tools with comic-strip names like Archie, Veronica and Jughead, and generally you were the one who ended up feeling like a jughead when you rooted around for hours and still came up dry.

The new joke is:

The internet is like a library with a thousand catalogs, none of which contains all the books and all of which classify the books in different categories—and the books still move around every night. The problem now is not that of "finding anything" but finding a particular thing. When your search term in one of the popular search engines brings back 130,000 hits, you still wonder if the one thing you're looking for will be among them.

This can be an enormous problem when you're trying to do serious research on the internet. Too much information is almost worse than too little, because it takes so much time to sort through it to see if there's anything useful. The rest of this section will give you some pointers to help you become an effective internet researcher.

Get to know the reference sources on the internet

Finding reference material on the Web can be a lot more difficult than walking into the Reference Room in your local library.

The subject-classified Web directories described below will provide you with your main source of links to reference materials on the Web. In addition, many public and academic libraries, like the Internet Public Library, have put together lists of links to Web sites, categorized by subject. The difficulty is finding Web sites that contain the same kind of substantive content you'd find in a library. See the section on Reference Sources on the Web for a list of some Web-based reference materials, but please read Information found—and not found—on the Web to understand why it's different from using the library.

Understand how search engines work

Search engines are software tools that allow a user to ask for a list of Web pages containing certain words or phrases from an automated search index. The automated search index is a database containing some or all of the words appearing on the Web pages that have been indexed. The search engines send out a software program known as a spider, crawler or robot. The spider follows hyperlinks from page to page around the Web, gathering and bringing information back to the search engine to be indexed.

Most search engines index all the text found on a Web page, except for words too common to index, such as "a, and, in, to, the" and so on. When a user submits a query, the search engine looks for Web pages containing the words, combinations, or phrases asked for by the user. Engines may be programmed to look for an exact match or a close match (for example, the plural of the word submitted by the user). They may rank the hits as to how close the match is to the words submitted by the user.
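To make the indexing step above concrete, here is a minimal Python sketch of an inverted index and a simple word lookup. The pages, their text and the stop-word list are invented purely for illustration; real engines are vastly more sophisticated.

# Minimal sketch of an inverted index and query lookup.
# The pages and stop words here are invented for illustration only.
pages = {
    "http://example.com/a.html": "Search engines index the words on a page",
    "http://example.com/b.html": "A spider follows hyperlinks to gather pages",
}
stop_words = {"a", "and", "in", "to", "the", "on"}

index = {}
for url, text in pages.items():
    for word in text.lower().split():
        if word not in stop_words:
            index.setdefault(word, set()).add(url)

def search(query):
    # Return the pages containing every non-stop word in the query.
    words = [w for w in query.lower().split() if w not in stop_words]
    results = [index.get(w, set()) for w in words]
    return set.intersection(*results) if results else set()

print(search("index words"))   # {'http://example.com/a.html'}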

One important thing to remember about search engines is this: once the engine and the spider have been programmed, the process is totally automated. No human being examines the information returned by the spider to see what subject it might be about or whether the words on the Web page adequately reflect the actual main point of the page.

Another important fact is that all the search engines are different. They each index differently and treat users' queries differently (how nice!). The burden is on the searcher to learn how to use the features of each search engine. See the links to Search Engines and to sources which have done Evaluations of the various features of Web directories and search engines.

See the Web and internet tutorials in the Links section for online articles about search engines.

Know the difference between a search engine and a directory

A search engine like Google or Hotbot lets you seek out specific words and phrases in Web pages. A directory is more like a subject index in the library—a human being has determined the main point of a Web page and has categorized it based on a classification scheme of topics and subtopics used by that directory. Some examples of directories are Yahoo! and the Internet Public Library. Many of the search engines have also developed browsable directories, and most of the directories also have a search engine, so the distinction between them is blurring.

See the links to Web directories and to sources which have done Evaluations of the various features of Web directories and search engines.

Consult the reference librarian for advice

Reference librarians can often be of great help in planning your internet research. Just as they know their library's collection, they probably have done a lot of research on the internet and know its resources pretty well. They're also skilled at constructing search terms and using search engines, and they're trained to teach others how to search.

Learn about search syntax and professional search techniques

To be successful at any kind of online searching, you need to know something about how computer searching works. At this time, much of the burden is on the user to intelligently construct a search strategy, taking into account the peculiarities of the particular database and search software. The section on Skills for online searching will help.

Learn some essential browser skills

Know how to use your browser for finding your way around, finding your way back to places you've been before and for "note-taking" as you gather information for your paper. A large part of effective research on the Web is figuring out how to stay on track and not waste time—the "browsing" and "surfing" metaphors are fine for leisure time spent on the Web, but not when you're under time pressure to finish your research paper. Lots of colleges have Netscape tutorials - see Web and internet tutorials for links which will supplement the information below.

URLs

Understand the construction of a URL.

Sometimes a hyperlink will take you to a URL such as http://www.sampleurl.com/files/howto.html. You should know that the page "howto.html" is part of a site called "www.sampleurl.com." If this page turns out to be a "not found" error, or doesn't have a link to the site's home page, you can try typing in the location box "http://www.sampleurl.com/" or "http://www.sampleurl.com/files/" to see if you can find a menu or table of contents. Sometimes a file has been moved or its name has changed, but the site itself still has content useful to you—this is a way to find out.
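As a small sketch of that trick (using the same hypothetical URL as above), a few lines of Python can derive the parent-directory addresses you might try when a page comes back "not found":

# Sketch: derive parent-directory URLs to try when a page returns "not found".
# Uses the hypothetical URL from the example above.
from urllib.parse import urlparse

url = "http://www.sampleurl.com/files/howto.html"
parts = urlparse(url)
segments = [s for s in parts.path.split("/") if s]

# Drop the last path segment one step at a time.
candidates = []
for i in range(len(segments) - 1, -1, -1):
    path = "/" + "/".join(segments[:i])
    if not path.endswith("/"):
        path += "/"
    candidates.append(f"{parts.scheme}://{parts.netloc}{path}")

for c in candidates:
    print(c)
# http://www.sampleurl.com/files/
# http://www.sampleurl.com/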

If there's a tilde (~) in the URL, you're probably looking at someone's personal page on a larger site. For example "http://www.bigsite.com/~jonesj/home.html" refers to a page at www.bigsite.com where J. Jones has some server space in which to post Web pages.

Navigation

Be sure you can use your browser's "Go" list, "History" list, "Back" button and "Location" box where the URL can be typed in. In Web research, you're constantly following links through to other pages, then wanting to jump back a few steps to start off in a different direction. If you're using a computer at home rather than sharing one at school, check the settings in your "Cache" or "History list" to see how long the places you've visited will be retained in history. This will determine how long the links will show as having been visited before (i.e., purple in Netscape, green on our site). Usually, you want to set this period of time to cover the full time frame of your research project so you'll be able to tell which Web sites you've been to before.

Bookmarks or favorites

Before you start a research session, make a new folder in your bookmarks or favorites area and set that folder as the one to receive new bookmark additions. You might name it with the current date, so you later can identify in which research session the bookmarks were made. Remember you can make a bookmark for a page you haven't yet visited by holding the mouse over the link and getting the popup menu (by either pressing the mouse button or right clicking, depending on what flavor computer you have) to "Add bookmark" or "Add to favorites." Before you sign off your research session, go back and weed out any bookmarks which turned out to be uninteresting so you don't have a bunch of irrelevant material to deal with later. Later you can move these bookmarks around into different folders as you organize information for writing your paper—find out how to do that in your browser.

Printing from the browser

Sometimes you'll want to print information from a Web site. The main thing to remember is to make sure the Page Setup is set to print out the page title, URL, and the date. You'll be unable to use the material if you can't remember later where it came from.

"Saving as" a file

Know how to temporarily save the contents of a Web page as a file on your hard drive or a floppy disk and later open it in your browser by using the "file open" feature. You can save the page you're currently viewing or one which is hyperlinked from that page, from the "File" menu or the popup menu accessed by the mouse held over the hyperlink.

Copying and pasting to a word processor

You can take quotes from Web pages by opening up a word processing document and keeping it open while you use your browser. When you find text you want to save, drag the mouse over it and "copy" it, then open up your word processing document and "paste" it. Be sure to also copy and paste the URL and page title, and to record the date, so you know where the information came from.
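If you find yourself doing this often, a few lines of Python can append a quotation together with its source details to a running notes file. This is only a sketch of the habit described above; the filename, page title and URL below are placeholders for your own material.

# Sketch: append a quotation plus its source details to a plain-text notes file.
# The filename, title and URL below are placeholders for your own material.
from datetime import date

def save_quote(text, url, title, notes_file="research_notes.txt"):
    with open(notes_file, "a", encoding="utf-8") as f:
        f.write(f"{text}\n")
        f.write(f"Source: {title} <{url}> (accessed {date.today().isoformat()})\n\n")

save_quote("An interesting passage worth quoting later.",
           "http://www.example.com/article.html",
           "Example Page Title")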

Be prepared to cite your Web references

Find out what form of bibliographic references your instructor requires. Both the MLA and APA bibliographic formats have developed rules for citing sources on CD-ROM and the internet. Instructions for Citing Electronic Sources are linked from the Internet Public Library.

Source:  http://www.ipl.org/div/aplus/internet.htm

Categorized in Research Methods

The following are a few of the techniques and tools I use to make my Google searching more effective or more productive.

Synonym Searching

Google has a limit of 10 words per search [since expanded to 32], which can make it difficult to include all the possible variations on a word. For example, a search for reports on childhood obesity should probably also include the words child, children, kid, kids, youth and family as well as childhood, and the words obese, overweight and fat as well as obesity. Oops! That adds up to 11 possible search terms, and doesn't give you any leeway to include filetype: limitations or other words to narrow the search down to reports. One way to circumvent this limitation is to try Google's synonym search. Add a tilde (~) at the beginning of the words child and obese (~child ~obese), and Google retrieves web sites that use any of those synonyms.
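If you build such queries often, a couple of lines of Python can assemble them for you. This sketch simply prefixes each term with a tilde and tacks on any extra qualifiers; the terms shown are the ones from the example above.

# Sketch: assemble a synonym-search query string like the one above.
def synonym_query(terms, extra=""):
    tilded = " ".join("~" + t for t in terms)
    return (tilded + " " + extra).strip()

print(synonym_query(["child", "obese"], "report filetype:pdf"))
# ~child ~obese report filetype:pdf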

Note that this tool works best for common words, and some of the synonyms may be broader than you wish. I needed to search for web sites of elementary school bands, music departments and choirs. I tried a search for ~music, but saw that I was also getting web sites with the words rock, MP3, radio, audio, song, sound, and records -- not really what I had in mind.

Google Personalized

Personalized Google is still in beta, but it's an interesting tool. Once you go to the Google Labs page and select Personalized, you will be sent to a new search page that includes a link to [Create Profile]. You can specify the type of searching you typically do, ranging from biotech and pharmaceuticals to dentistry to classical music. Click [Save Preferences], and then type your search terms in the Google Personalized search box.

At the search results screen, you will now see something new -- a slider bar that lets you specify how much you want the search results sorted by those interests you specified. The default is minimal personalization; move the slider bar toward maximum, and you will see the search results change on the fly, as Google re-ranks the results based on your personal interests.

Keep in mind that this personalization is only available through the Personalized Google page. If you go to the main Google search page, the personalization option is not available.

Google Shortcuts

As with other search engines, Google has some built-in "answer" features that can sometimes come in handy.

If you type the word "define:" and a word (define:card for example), instead of the usual search results, you will get definitions of that word from a wide range of glossaries, dictionaries and lexicons.

Type a US company's name or stock symbol in the search box, and the first item in the search results page will be a link to current stock quotes for that company, provided by Yahoo Finance.

Type a US area code in the search box, and the first search result will link to a map showing the general coverage area of that area code. I find this particularly useful now that there are over 200 area codes.

See www.google.com/help/features.html for a list of Google's shortcuts.

Specialized Searches

In addition to the well-known Google search tabs for searching the web, news and images, there are several specialized search tools for commonly-searched subjects, including Uncle Sam for searching federal government information; University Search for searching within the sites of major colleges or universities; and even Google Microsoft, for searching Microsoft-related sites.

Source: http://archive.virtualchase.justia.com/articles/archive

Categorized in Search Engine

Several weeks ago, a legal advocacy group issued a press release describing the organization's efforts on behalf of teenage girls who had been abused in a detention center. It referred readers to a redacted document on its Web site for more information.

As the mother of a teenage girl, I took an interest. I opened the redacted PDF document and then examined its security. When I found I was able to discover the names of the girls, I informed the group, which quickly corrected the flawed document.

But what if my motives were not that of a curious and outraged parent?

Stories about improperly redacted documents appear frequently in the news and legal literature. Often, those who discover the redacted information expose it. But the motives of researchers run the gamut from mild curiosity to winning at all costs. Thus, while exposure might not be desirable, use of the information without the creator's knowledge or consent could be worse.

As was the case in this example, such findings often involve serendipity. But luck isn't always a factor. Strategy plays a major role in certain types of research; for instance, competitive intelligence. It behooves companies to learn about these techniques in order to protect their confidential information.

Private - Keep Out!

When researchers want to know something about a company, one of the first places they check is its Web site. They read what the company wants them to know. Then, if they want to dig deeper while still using the company itself as a source, they check two things: the Internet Archive and the Web site's robots exclusion file (robots.txt).

The former archives previous versions of the site. As I relate in an earlier article, these sometimes shed light on information the company might not want to reveal.

Because of improved security at Web sites, robots exclusion files generally are not as helpful as they used to be. But researchers still check them, and so should you.

The files contain commands that instruct search engines about areas of the site they should not index. Any legitimate search engine will obey these commands.

To work correctly, the file must appear in the root directory of the Web site. It must bear the filename, robots.txt. Therefore, to find it, you enter: http://www.domain.com/robots.txt.

They are easy to read. The one on The Virtual Chase looks, in part, like this:

user-agent: *
disallow: /_private/
disallow: /cas/
disallow: /cir/
disallow: /data/

The user-agent is the targeted crawler (search engine). The asterisk is a wildcard. Each character string following the command, disallow, is a subdirectory. Consequently, this abbreviated set of commands tells all search engines not to crawl the subdirectories labeled, _private, cas, cir and data. A researcher, of course, will choose to attempt entry, or not.

It's like placing a Keep Out sign on a door. If the door isn't locked, someone may walk through it.
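If you check robots exclusion files regularly, a short script can do the fetching for you. The Python sketch below uses the hypothetical www.sampleurl.com domain from the earlier example (substitute a real site); it prints the Disallow entries and then shows how a well-behaved crawler would consult the same file before fetching a page.

# Sketch: fetch a site's robots.txt and list the paths it asks crawlers to skip.
# The domain is the hypothetical one used earlier; substitute a real site.
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

base = "http://www.sampleurl.com"

with urlopen(base + "/robots.txt") as resp:
    lines = resp.read().decode("utf-8", errors="replace").splitlines()

disallowed = [l.split(":", 1)[1].strip()
              for l in lines if l.lower().startswith("disallow:")]
print(disallowed)

# A well-behaved crawler would consult the same file before fetching a page:
rp = RobotFileParser(base + "/robots.txt")
rp.read()
print(rp.can_fetch("*", base + "/_private/"))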

Careless Clues

As I explain in the above-referenced article on the Internet Archive, a prospective client approached a group of my firm's lawyers about launching a new business in an industry with an unsavory reputation. One of the conditions for considering representation was that the woman not have prior dealings in the industry. She claimed she did not.

Research at the client's business Web site in the Internet Archive, however, uncovered circumstantial evidence of several connections. Through telephone research and public records, we were able to verify that not only was she working in the industry, she was the subject of a then-active federal criminal investigation.

Clues about information you would rather researchers not discover often come from the company itself. In a recent and widely publicized example, Google inadvertently released information about its finances and future product plans in a PowerPoint presentation.

Searching for Microsoft Office files is, in fact, an expert research strategy because the metadata often reveals information the producer did not intend to share. You may tack on a qualifier or use a search engine's advanced search page to limit results to specific file types, such as Word documents (doc), PowerPoint presentations (ppt) or Excel spreadsheets (xls).

At Google, the qualifier is filetype: whereas at Yahoo it is originurlextension:. Enter the file extension immediately after the colon (no spaces). Check each search engine's help documentation for the appropriate qualifier, or consult a Web site, such as Search Engine Showdown, which tracks and informs about such commands.
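A few lines of Python can generate the file-type-limited queries for both engines at once. This is only a convenience sketch; "Example Corp" is a stand-in for whatever organization you are researching.

# Sketch: build filetype-limited query strings for the Office formats named above.
# "Example Corp" is a stand-in for whatever organization you are researching.
company = '"Example Corp"'
extensions = ["doc", "ppt", "xls"]

google_queries = [f"{company} filetype:{ext}" for ext in extensions]
yahoo_queries = [f"{company} originurlextension:{ext}" for ext in extensions]

for q in google_queries + yahoo_queries:
    print(q)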

Searching certain phrases sometimes produces intriguing results. Try the phrases below individually to discover the potential for this technique when coupled with a company, organization or agency name:

"not for public dissemination"

"not for public release"

"official use only" (variations include FOUO and U//FOUO)

"company confidential"

"internal use only"

You might find additional ideas for searching dirty in the Google Hacking Database.

Copyright Underground

Book search engines, such as Amazon.com's Search-Inside-This-Book, Google Book Search and the Text Archive at the Internet Archive, are becoming increasingly valuable in research. If you uncover even a snippet of relevant information online, it may save you valuable research time offline.

One of my recent success stories involves finding an entire chapter on the target company in a book published just a few months prior to the research. Of course, I was unable to read the book online. I had to purchase it. But the tools helped me find what I might have missed without them.

However, this is not the underground to which I refer. By using these tools, you are not skirting the process for rewarding those who wrote and published the book.

The underground, while eminently real, is not so much a place as it is a mindset - one that sets information free. The result is a mixed bag of commercial products, including books, music, digital artwork, movies and software, that have been copied or reverse engineered.

Try the search strategy below. Replace the phrase, Harry Potter, with the keywords of your choice:

"index of" "last modified size description" "parent directory" "harry potter"

The portion of the search statement preceding "harry potter" comprises a strategy for finding vulnerable Web sites or servers. In a nutshell, it commands the search engine to return matches to directory structures instead of single Web pages. If a Web site is properly secured, the search engine will be unable to provide this information.

To some extent, you can monitor the availability of files that comprise unauthorized copies of products by setting up what Tara Calishain calls information traps. Tara's excellent book on Information Trapping provides many examples of ways to monitor new information.

One possibility is to use the above search strategy for best-selling or popular products, and then set up a Google Alert for new matches to each query.

While you should monitor hits at other search engines besides Google, doing so requires more work. First, test and perfect the query so that you are retrieving useful results. Set the search engine preferences to retrieve 100 items per page. Then copy the URL when the search results display. Paste it into a page monitor, such as Website-Watcher or TrackEngine. The tracking software or Web service will monitor changes in the first 100 search results. You may opt to have it send the changes to you by e-mail.
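The general idea behind such page monitors can be sketched in a few lines of Python: fetch the saved-results page, hash it, and compare the hash with the one stored from the previous run. This is only an illustration of the concept, not how Website-Watcher or TrackEngine actually work, and the URL is a placeholder.

# Minimal sketch of a page monitor: hash a results page and report changes.
# This shows only the general idea behind such tools, not any product's internals.
import hashlib
import os
from urllib.request import urlopen

URL = "http://www.example.com/saved-search-results"   # placeholder URL
STATE_FILE = "last_hash.txt"

def current_hash(url):
    with urlopen(url) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

new_hash = current_hash(URL)
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        old_hash = f.read().strip()
else:
    old_hash = ""

if new_hash != old_hash:
    print("Page changed - review the new results.")
    with open(STATE_FILE, "w") as f:
        f.write(new_hash)
else:
    print("No change since the last check.")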

Companies and other organizations that want to protect proprietary or confidential information should conduct this type of research with regularity. You can expedite some of the search process with information traps. But considering the stakes, regular thorough searching is a worthwhile investment.

Source:  http://archive.virtualchase.justia.com/articles

Categorized in Research Methods
