The Internet is enormous. Finding what you need means choosing from among millions, and sometimes trillions, of search results. Even then, no one can say for sure that you have found the right information. Is the information reliable and accurate? Would you have to look for another set of information that is better, or more relevant to the query? The Internet keeps growing every minute, the clutter becomes harder to keep up with, and more valuable information may keep getting buried underneath it. Unfortunately, the larger the Internet grows, the harder it gets to find what you need.

Think of search engines as a set of information search tools that fetch what you need from the Internet. But a tool is only as good as the job it gets done. Google, Bing, Yahoo and the like are generic tools for Internet search: they perform a “one size fits all” search job. Their results throw tons of web pages at you, which makes selection much harder and accuracy lower.

A simple solution to the problem of too much information on the Internet is out there, if you care to pay attention: here is a list of over 1,500 search engines and directories that can cut your research time in half.

There is a whole world of Internet search tools that are job specific and find the information you need through filtered, precise search. They draw on the same World Wide Web and look through the same web pages as the main search engines, only with better targeting. These search tools are split into Specialized Search Engines and Online Directories.

Specialized Search Engines are built to drill down into a more precise type of information. They can return filtered, less cluttered search results than leading search engines such as Google, Bing and Yahoo. What makes them unique is their built-in ability to apply powerful customized filters; some also have their own databases, which let them deliver the type of information you need in specific file formats.

Advanced Research Method

We will classify Specialized Search Engines into Meta-crawlers (or Meta-Search Engines) and Specialized Content Search Engines.

Unlike conventional search engines, Meta-crawlers don’t crawl the web themselves and do not build their own web page indexes; instead, they collect (aggregate) search snippets from several mainstream search engines (Google, Bing, Yahoo and similar) all at once. They have neither proprietary search technology nor the large, expensive infrastructure of the main search engines. A Meta-crawler aggregates the results and displays them on its own search result pages. In short, Meta-crawlers usually concentrate on front-end technologies such as the user interface and novel ways of displaying information. They generate revenue by displaying ads, and they give the user options to search for images, audio, video, news and more, simulating a typical search experience.
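The aggregation step can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the engine names and URLs are made up, the results are in-memory lists rather than live queries, and the scoring rule (reciprocal rank fusion with a constant k) is one common way to merge ranked lists, not the method of any Meta-crawler named in this article.

```python
from collections import defaultdict


def normalize(url):
    """Crude URL normalization (an assumption) so near-duplicates merge."""
    return url.lower().rstrip("/")


def aggregate(results_by_engine, k=60):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL earns 1 / (k + rank) from every engine that returned it,
    so pages ranked well by several engines rise to the top.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[normalize(url)] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical results from two upstream engines.
results = {
    "engine_a": ["https://example.org/a", "https://example.org/b"],
    "engine_b": ["https://example.org/b/", "https://example.org/c"],
}
merged = aggregate(results)
print(merged)
```

The page returned by both engines comes first in the merged list, which is the basic benefit a Meta-crawler offers over querying a single engine.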

Some well-known Meta-crawlers and other search tools to explore:

  • Ixquick - A meta-search engine with options for choosing what the results should be based on. It respects information privacy, and results open in an Ixquick proxy window.
  • Google - Considered the first stop by many Web searchers. Has a large index, and its results are known for their high relevancy. Includes the ability to search for images and products, among other features.
  • Bing - General web search engine from Microsoft.
  • Google Scholar - One of Google's specialized search tools, Google Scholar focuses primarily on information from scholarly and peer-reviewed sources. By using the Scholar Preferences page, you can link back to URI's subscriptions for access to many otherwise fee-based articles.
  • DuckDuckGo - A general search engine with a focus on user privacy.
  • Yahoo! - A combination search engine and human-compiled directory, Yahoo also allows you to search for images, Yellow Page listings, and products.
  • Internet Public Library - A collection of carefully selected and arranged Internet reference sources, online texts, online periodicals, and library-related links. Includes IPL original resources such as Associations on the Net, the Online Literary Criticism Collection, POTUS: Presidents of the United States, and Stately Knowledge (facts about the states).
  • URI Libraries' Internet Resources - A collection of links gathered and maintained by the URI librarians. It is arranged by subject, like our online databases, and provides access to free internet resources to enhance your learning and research options.
  • Carrot Search - A meta-search engine based on a variety of search engines. Has clickable topic links and diagrams to narrow down search results.
  • iBoogie - A meta-search engine with customizable search type tabs. Search rankings have an emphasis on clusters.
  • iSeek - The meta-search results are compiled from authoritative resources from university, government, and established non-commercial providers.
  • PDF Search Engine - Searches for documents with extensions such as .doc, .pdf, .chm, .rtf and .txt.

The Specialized Content Search Engine focuses on a specific segment of online content, which is why such engines are also called Topical (Subject Specific) Search Engines. The content area may be based on topicality, media, content type or genre; beyond this, the source of the material and the function the engine performs in transforming it define its specialty.

We can go a bit further and split these into three groups.

Information Contribution – The information source can be a Public Contribution Resource Engine, where the data comes from social media contributions and from reference platforms such as wikis. Examples are YouTube, Vimeo, LinkedIn, Facebook and Reddit. The other type is a Private Contribution Resource Engine, whose searchable database is created internally through the efforts of the search engine vendor; examples are Netflix (movies), Reuters (news content), TinEye (image repository) and LexisNexis (legal information).

Specialized Function - These are search engines programmed to perform a type of service that is proprietary and unique. They collect web content as information and process it with algorithms of their own, adding value to the results they produce.

Examples of such search engines include the Wayback Machine, which maintains historical records of website pages that are no longer available online; Alexa Analytics, which performs web analytics, measures traffic on websites and provides performance metrics; and Wolfram Alpha, which is more than a search engine: it gives you access to the world's facts and data and calculates answers across a range of topics.

Information Category (Subject Specific material) - This is where the search is subject specific, based on the kind of information it retrieves. The engine obtains this information through special arrangements with outside sources on a consistent basis. Examples are found under the following broad headings.

  • Yellow Pages and phone directories
  • People search
  • Government databases and archives
  • Public libraries
  • News bureaus, online journals, and magazines
  • International organizations

A web directory, or link directory, is a well-organized catalog on the World Wide Web: a collection of data organized into categories and subcategories. It specializes in linking to other web sites and categorizing those links. A web directory is not a search engine, and it does not return numerous web pages in response to a keyword search. Instead, it displays a list of website links by category and subcategory. Most web directory entries are not found by web crawlers; they are found and added by humans. The categorization covers a whole website rather than a single page or a set of keywords, and websites are often limited to inclusion in only a few categories. Web directories often allow site owners to submit their sites for listing and have editors review the submissions for fitness.


The directories fall into two broad categories.

Public Directories do not require user registration or a fee; Private Directories require online registration and may or may not charge a fee for inclusion in their listings.

Public Directories cover General Topics, or they can be Subject Based or Domain-Specific.

General Topics Directories carry popular reference subjects, interests, content domains and their subcategories. Examples are DMOZ (the largest directory of the Web; its open content is mirrored at many sites, including the Google Directory until July 20, 2011), the A1 Web Directory Organization (a general web directory that lists various quality sites under traditional categories and relevant subcategories) and the PHPLink Directory (phpLD was released to the public as a free directory script in 2006, and it continues to be offered as a free download).

Subject Based or Domain-Specific Public Directories are subject and topic focused. Among the better known are Hot Frog (a commercial web directory providing websites categorized topically and regionally), the Librarians Index to Internet (a directory listing program from the Library of California) and OpenDOAR (an authoritative directory of academic open access repositories).

Private Directories require online registration and may charge a fee for inclusion in their listings.

Examples of Paid Commercial Versions.

  • Starting Point Directory - $99/Yr
  • Yelp Business Directory - $100/Yr
  • Manta.com - $299/Yr

Other directories require registration as a member, employee, student or subscriber. Examples of these types are found in:

  • Government Employees Websites (Government Secure Portals)
  • Library Networks (Private, Public and Local Libraries)
  • Bureaus, Public Records Access, Legal Documents, Courts Data, Medical Records

The Association of Internet Research Specialists (AIRS) has compiled a comprehensive list it calls "Internet Information Resources." There you will find an extensive collection of search engines and interesting information resources for avid Internet research enthusiasts, especially those who seek serious information without the hassle of sifting through the many pages of the unfiltered Internet. Alternatively, one can search through Phil Bradley’s website or The Search Engine’s List, both of which have interesting links to the many alternatives to typical search engines.

 Author: Naveed Manzoor [Toronto, Ontario] 

Categorized in Online Research

Are you a liar? You bet you are, but the real you is emerging through your online activities. Here is what Big Data knows about the real you.

There are things about which we all lie. We lie about our innermost hopes, fears, and desires. We lie to our friends, spouses, doctors, pollsters, even to ourselves. But our truth is being discovered because we willingly reveal it every day through our activities online.

It’s all being tracked, and through big data a new picture of us is emerging that contradicts much of what we previously believed about each other.

Seth Stephens-Davidowitz, author of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

New York Times op-ed writer Seth Stephens-Davidowitz has studied this and reveals it all in his book, Everybody Lies. He says that data from the internet is like "digital truth serum," revealing how we really behave when no one's watching. Mike Collins talks with him about some of his findings.

Some highlights from the show:

On the events in Charlottesville

“There was a period when we thought we lived in a post-racism society. You could see online, even when people were telling pollsters politically correct truths, like they weren’t racists and didn’t care that Barack Obama was black, online they were telling a different story. They were making racist searches, usually for jokes mocking African-Americans, with a shocking frequency. Stormfront is the biggest, most popular hate site in the United States. The demographics are young people, which you also saw at the events in Charlottesville. Neo-Nazis exist and I’ve known this for years because of internet research. Young people were becoming obsessed with neo-Nazis and the clear cause of it was Barack Obama.”

For young adults age 19 to 21

“It’s a very impressionable group. It’s not a stupid or uneducated group. The most popular interest for Stormfront members is reading. They’re obsessed with philosophers, evolution and they’re political junkies. Many people say they join Stormfront because of a dating experience. Perhaps an African-American dated someone they wanted to date and it created this rage that led them to this material.”

Google reveals the most about the human psyche

“We’re in a habit of lying to make ourselves look better. That carries over to surveys, there’s no incentive to tell the truth. With Google there’s an incentive, you tell the truth, you get information you need. People are lying to surveys saying they aren’t racists. Compare that to the Google searches. It’s so clear there’s a very different truth about society that was being missed by the traditional way of understanding people.”

“Google Trends compares the rates of searches in different parts of the United States or the world and shows when these searches are highest. You can learn interesting patterns. Anxiety has doubled in the last five years. It’s highest in Kentucky, Maine and rural areas. The recent rise in anxiety and panic attacks almost perfectly tracks rises in searches related to opioids.”

Guest Seth Stephens-Davidowitz - New York Times op-ed contributor, visiting lecturer at The Wharton School, and a former Google data scientist. He is the author of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Read an excerpt from Everybody Lies. 
"The power in Google data is that people tell the giant search engine things they might not tell anyone else. Google was invented so that people could learn about the world, not so researchers could learn about people, but it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing."

Seth on NPR's Hidden Brain podcast: What Our Google Searches Reveal About Who We Really Are
"I think there's something very comforting about that little white box that people feel very comfortable telling things that they may not tell anybody else about their sexual interests, their health problems, their insecurities. And using this anonymous aggregate data, we can learn a lot more about people than we've really ever known."

Source: This article was published on wfae.org by Erin Keever

Categorized in Internet Technology

Internet access is so fundamental that it is starting to be considered a basic human right. However, access to the internet remains uneven. As more devices connect to the internet, more bandwidth is eaten up. Li-Fi is a new route to connectivity that will provide more bandwidth and speed once the technology is completely developed — and it’s very close.


Li-Fi uses an LED bulb’s modulated light signal instead of a modulated radio signal to send data and connect to the internet. The LiFi-X system from PureLiFi transmits data using waves in the visible portion of the electromagnetic spectrum that an LED bulb with a microchip generates. The LED light fixture and a dongle for a USB port comprise the LiFi-X system which delivers speeds of up to 42Mbps, up and down. The system is already in use as its parent company, PureLiFi, has been collaborating with tech companies around the world to trial and improve the technology.


A Li-Fi system offers a business many advantages, including improved security. Sending and receiving data through light means that access can be limited much more easily than with Wi-Fi because light does not penetrate walls. On the other hand, this also presents a challenge in terms of making Li-Fi as convenient as Wi-Fi. Smart architecture will be required to increase Li-Fi’s range, and dim LEDs will make it possible to have Li-Fi access that follows users and works even in the dark.

Li-Fi can also be applied in settings that are impossible for Wi-Fi. For example, Li-Fi is ideal for in-flight internet access and high security installations like petrochemical plants in which risk of sparks makes radio antennas too dangerous to be used.

The equipment for Li-Fi is too big to be used in mobile devices — but perhaps only for the moment. Miniaturizing the technology is one of the biggest goals for PureLiFi and, according to Digital Trends, a newly redesigned LiFi-X with a much smaller dongle is coming later this year to use for laptops. This version is still too large to fit into a smartphone, but since “LiFiCapability” language was found in iOS code for a future iPhone model, it seems likely that the smartphone version is coming.



Consumer demand for wireless data puts more pressure on existing Wi-Fi technology every day. The number of mobile devices in particular is growing rapidly and is expected to reach 11.6 billion by 2021, exceeding the projected population of the planet at that time (7.8 billion). This translates into a monthly data volume of about 35 quintillion (35 × 10^18) bytes, a level that will be unsustainable with current wireless infrastructure and technology.
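To put these figures in proportion, the following Python sketch works through the arithmetic. The three input numbers come from the paragraph above; the 30-day month and the idea of spreading the traffic evenly over devices and over time are simplifying assumptions made only for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def device_ratio(devices, population):
    """Mobile devices per person."""
    return devices / population


def bytes_per_device(monthly_bytes, devices):
    """Monthly traffic per device if it were spread evenly."""
    return monthly_bytes / devices


def average_bits_per_second(monthly_bytes, seconds=SECONDS_PER_MONTH):
    """Sustained worldwide data rate implied by the monthly volume."""
    return monthly_bytes * 8 / seconds


devices = 11.6e9        # projected mobile devices (from the text)
population = 7.8e9      # projected world population (from the text)
monthly_bytes = 35e18   # 35 quintillion bytes per month (from the text)

print(f"devices per person: {device_ratio(devices, population):.2f}")
print(f"GB per device per month: {bytes_per_device(monthly_bytes, devices) / 1e9:.1f}")
print(f"average Tbit/s: {average_bits_per_second(monthly_bytes) / 1e12:.0f}")
```

Under these assumptions the projection works out to roughly 1.5 devices per person, about 3 GB per device per month, and an average worldwide rate on the order of 100 terabits per second.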

Li-Fi can relieve this pressure because the visible light frequencies it uses are relatively underutilized. PureLifi and other companies working to develop the technology are already partnering with businesses in the lighting industry to grow the lighting ecosystem now so that, hopefully, by the time Li-Fi tech is ready to go online at scale the infrastructure it needs will be ready.

In February 2016, Li-Fi technology sent data at up to 1GB per second in trials, which is 100 times faster than currently available Wi-Fi technology. These trial runs were slower than the lab tests, which indicated that Li-Fi connections should be able to transmit up to 224 gigabits per second. By August, researchers were sending data 20 times faster than they did in February. Speeds are expected to continue to improve.

New smartphone and computer designs could incorporate this technology, perhaps in doubly innovative ways. For example, Li-Fi connectivity cells might also provide an opportunity for solar charging capabilities in smart devices. And, while it is unlikely that Li-Fi will entirely replace Wi-Fi, it will almost surely become the exclusive source of data transmission in high security areas, on planes, or in older buildings that disrupt Wi-Fi signals.

Source: malaysiandigest.com

Categorized in Search Engine

When I think about the behavior of many business people today, I imagine a breadline. These employees are the data-poor, waiting around at the end of the day on the data breadline. The overtaxed data analyst team prioritizes work for the company executives, and everyone else must be served later. An employee might have a hundred different questions about his job. How satisfied are my customers? How efficient is our sales process? How is my marketing campaign faring?

These data breadlines cause three problems present in most teams and businesses today. First, employees must wait quite a while to receive the data they need to decide how to move forward, slowing the progress of the company. Second, these protracted wait times abrade the patience of teams and encourage teams to decide without data. Third, data breadlines inhibit the data team from achieving its full potential.

Once an employee has been patient enough to reach the front of the data breadline, he gets to ask the data analyst team to help him answer his question. Companies maintain thousands of databases, each with hundreds of tables and billions of individual data points. In addition to producing data, the already overloaded data teams must translate the panoply of figures into something more digestible for the rest of the company, because with data, nuances matter.

The conversation bears more than a passing resemblance to one between a third-grade student and a librarian. Even expert data analysts lose their bearings sometimes, which results in slow response times and inaccurate responses to queries. Both serve to erode the company’s confidence in their data.

Overly delayed by the strapped data team and unable to access the data they need from the data supply chain, enterprising individual teams create their own rogue databases. These shadow data analysts pull data from all over the company and surreptitiously stuff it into database servers under their desks. The problem with the segmented data assembly line is that errors can be introduced at any single step.

A file could be truncated when the operations team passes the data to the analyst team. The data analyst team might use an old definition of customer lifetime value. And an overly ambitious product manager might alter the data just slightly to make it look a bit more positive than it actually is. With this kind of siloed pipeline, there is no way to track how errors happen, when they happen or who committed them. In fact, the error may never be noticed. 

Data fragmentation has another insidious consequence. It incites data brawls, where people shout, yell and labor over figures that just don’t seem to align and that point to diametrically different conclusions.

Imagine two well-meaning teams, a sales team and a marketing team, both planning next year’s budget. They share an objective: to exceed the company’s bookings plan. Each team independently develops a plan, using metrics like customer lifetime value, cost of customer acquisition, payback period, sales cycle length and average contract value.

When there’s no consistency in the data among teams, no one can trust each other’s point of view. So meetings like this devolve into brawls, with people arguing about data accuracy, the definition of shared metrics and the underlying sources of their two conflicting conclusions.
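A small Python sketch shows how such a disagreement can arise from definitions alone. Everything in it is hypothetical: the formulas are common simplified definitions of customer lifetime value, not ones given in the article, and the revenue, churn, margin, acquisition cost and the 3.0 threshold are invented for illustration.

```python
def ltv_revenue(avg_monthly_revenue, monthly_churn):
    """Lifetime value as expected lifetime revenue (one common definition)."""
    return avg_monthly_revenue / monthly_churn


def ltv_margin(avg_monthly_revenue, monthly_churn, gross_margin):
    """Lifetime value as expected lifetime gross profit (another definition)."""
    return avg_monthly_revenue * gross_margin / monthly_churn


def worth_funding(ltv, cac, threshold=3.0):
    """Hypothetical rule of thumb: fund the plan if LTV / CAC meets the threshold."""
    return ltv / cac >= threshold


# Both teams start from the same hypothetical raw numbers.
revenue, churn, margin, cac = 500.0, 0.02, 0.7, 6000.0

sales_ltv = ltv_revenue(revenue, churn)             # sales team's definition
marketing_ltv = ltv_margin(revenue, churn, margin)  # marketing team's definition

print("sales says fund:", worth_funding(sales_ltv, cac))
print("marketing says fund:", worth_funding(marketing_ltv, cac))
```

With identical inputs the two definitions land on opposite sides of the threshold, which is why a shared, documented definition of each metric matters as much as shared data.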

Imagine a world where data is put into the hands of the people who need it, when they need it, not just for Uber drivers, but for every team in every company. This is data democratization, the beautiful vision of supplying employees with self-service access to the insights they need to maximize their effectiveness. This is the world of the most innovative companies today: technology companies like Uber, Google, Facebook and many others who have re-architected their data supply chains to empower their people to move quickly and intelligently. 

Source:  http://techcrunch.com/2016/06/12/data-breadlines-and-data-brawls/

Categorized in Online Research

The use of the Internet to aid research practice has become more popular in recent years. In fact, some believe that Internet surveying and electronic data collection may revolutionize many disciplines by allowing for easier data collection, larger samples, and therefore more representative data. However, others are skeptical of its usability as well as its practical value. The paper highlights both positive and negative outcomes experienced in a number of e-research projects, focusing on several common mistakes and difficulties experienced by the authors. The discussion focuses on ethics and review board issues, recruitment and sampling techniques, technological issues and errors, and data collection, cleaning, and analysis.

1. Internet as a Research Tool

With the advancement of information and communication technology, researchers have found new methods of data collection and analysis. This has evolved from telephone surveys, computerized data analysis, and use of cell phones and pagers, to collecting information at random intervals, use of Personal Digital Assistants (or "PalmPilots"), and use of the Internet in research. Although the Internet is fast becoming a common fixture in contemporary life in many parts of the world, it remains relatively unused for primary data collection in many research fields. For example, social science research is yet to respond to the emergence of the Internet, as shown by only 494 peer reviewed articles with keywords "Internet research" published within major social science journals over the decade 1996-2006 (as per our search in the CSA Illumina® bibliographic database). Increasingly, however, the Internet is being treated as a rich source for literature and secondary data in social science research.

Until relatively recently, use of the Internet for primary data collection required the researcher to either know HTML or have someone else create a new program. Fortunately, within the past few years a number of new technological solutions and services have emerged that allow the researcher to create studies (i.e., surveys, experiments, etc.) online without needing the knowledge of computer programming. This has coincided with a large increase in studies using the Internet to collect primary data. A search in the Web of Science® bibliographic database indicates that the number of publications during the six-year period 2000-2005, using "Internet research" as keywords, is 128, which is 312 per cent higher than the corresponding figure during the six-year period prior to 2000, i.e., 1994-1999. Similar results are seen for "Internet data collection" (325 per cent), "web based research" (333 per cent), and "electronic data collection" (327 per cent). Of course, these impressive percentages are based on low base figures; Internet use in research still remains rather limited.
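The caveat about low base figures can be made concrete with a short Python sketch. The earlier publication count is not given in the text, so the values below are back-calculations under two possible readings of "312 per cent", not reported data.

```python
def implied_base(current, pct, reading="higher"):
    """Back-calculate the earlier count from a current count and a percentage.

    reading="higher": current is pct per cent higher than the base.
    reading="of":     current is pct per cent of the base.
    """
    ratio = pct / 100.0
    if reading == "higher":
        return current / (1.0 + ratio)
    if reading == "of":
        return current / ratio
    raise ValueError("reading must be 'higher' or 'of'")


current = 128  # publications in 2000-2005 (from the text)
for reading in ("higher", "of"):
    base = implied_base(current, 312, reading)
    print(f"{reading}: earlier count of about {base:.0f}")
```

Either reading implies an earlier count of only a few dozen publications, so a three-digit percentage increase here corresponds to an absolute gain of fewer than a hundred papers.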

By its very nature, the Internet appears to be a very promising medium for researchers. As a vehicle for data collection, it promises increased sample size, greater sample diversity, easier access and convenience, lower costs and time investment, and many other appealing features. It is even possible to use the Internet for pilot testing media messages and advertisement campaigns. But without careful attention, the researcher may get into difficulties. It is the purpose of this article to expose some of the potential pitfalls awaiting the unwary researcher. Along with the potential pitfalls, solutions utilized by the authors are also discussed.


2. Manual vs. Internet-Based Data Collection

We have encountered a number of issues in our various attempts at using the Internet for primary data collection. A list of such issues must include those associated with research ethics guidelines, technical snags arising from power failures, data cleaning requirements, and low response rate. Sometimes, the experience has been so frustrating as to make manual data collection through paper-and-pencil research packets appear more attractive. However, with experience, we have learnt to be judicious in selecting the appropriate data collection method for a given research project and taking the necessary precautions if we choose to use the Internet.

Researchers, especially psychologists, have often looked at the method of data collection with regard to the impact it can have on results. The issues of questionnaire design, for example the implications of using forced choice, Likert scales, open response, or multiple response formats, are all issues much older than the Internet (Orlich, 1978; Schuman & Presser, 1981; Sudman & Bradburn, 1982). These will always be important when designing data collection instruments. The design of the instrument should be informed by the research question being addressed. Any advantages or disadvantages offered by a specific question format will not be altered by technology, but technology may introduce additional issues (Manfreda, Batagelj, & Vehovar, 2002). Each of these response types is easily available in an electronic format. Some researchers have compared manual and electronic formats, examining the issues of validity and reliability of research instruments (Berrens, Bohara, Jenkins-Smith, Silva, & Weimer, 2003; Schilewaert & Meulemeester, 2005; Sethuraman, Kerin, & Cron, 2005). They have found test-retest reliabilities for both formats to be nearly equal, indicating that both formats can generate equally reliable data assuming that the participants are cooperative and truthful, and the questions are valid. They have also found internal consistency, predictive validity, and recruitment trends within socio-demographic categories to be comparable between the two formats. In essence, the mode of data collection (i.e., manual or electronic) does not, in itself, seem to significantly alter the type of respondent recruited or the quality of data given by the respondent.
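For readers who want to run this kind of comparison on their own data, the Python sketch below computes two of the quantities mentioned above. It assumes that test-retest reliability is measured with a Pearson correlation and internal consistency with Cronbach's alpha; these are common choices, but the text does not say which statistics the cited studies used, and the scores are invented.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists,
    each holding one score per respondent in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical scores: the same five respondents tested twice.
first_session = [10, 12, 15, 18, 20]
second_session = [11, 12, 14, 19, 20]
print(f"test-retest r: {pearson_r(first_session, second_session):.3f}")

# Hypothetical three-item scale answered by four respondents.
scale_items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
print(f"alpha: {cronbach_alpha(scale_items):.3f}")
```

Running the same two functions on the paper responses and on the electronic responses gives a like-for-like comparison of the two formats.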

Collecting data from people with poor reading comprehension or those not accustomed to taking paper-and-pencil tests is already known to be difficult. Similarly, while using electronic data collection methods, the respondents' lack of familiarity with computers could be an issue. In some of our survey research projects, we have compared the paper-and-pencil method with the computer based method. In our pilot tests, we have found that the computer based method was usually faster (because of the respondents' familiarity and ease with the computer keyboard and the mouse). However, during the actual data collection, the mobile laboratory had a touchpad instead of a mouse, which slowed down the respondents using the electronic version, in comparison with those who used the paper version. In short, computer skills and familiarity with the input devices affect a respondent's ability to complete an electronic survey. This is in addition to problems experienced by respondents who have poor reading comprehension or who are not comfortable with filling out questionnaires.

Another relevant difference between paper-and-pencil and electronic formats is the level of rapport possible with the respondent. The impact of such rapport may be unpredictable. For some respondents, the signed letter accompanying a paper-and-pencil format may be more persuasive than an e-mail from a stranger, commonly sent with the electronic format. It is uncertain whether face-to-face interaction with a person or the relative anonymity of the Internet would produce more authentic responses.

With identity theft (i.e., the deliberate assumption of another person's identity without the latter's knowledge) being a major issue of current concern, Internet data collection may not seem as legitimate as data collected in a community center or a university laboratory. Internet data collection could indeed be problematic from the point of view of source credibility--an important issue in persuasive communication, as research in the area of persuasion indicates (Hong, 2006; Hovland & Weiss, 1951; Olson & Cal, 1984). Additionally, as the psychologist Stanley Milgram (1974) argues, people are more likely to obey an authority that is present in the room compared to one that is in the next room or on the phone. Accordingly, the manual paper-and-pencil method can be expected to produce higher-quality data compared to the Internet-based method, the former being more tangible, more personal, and in short, more credible to the respondents, especially if the research staff is in the room with them (Nosek, Banaji, & Greenwald, 2002).

On the more positive side, Internet-based data collection, if utilized properly, can reduce costs and make unfunded projects feasible, yield larger and more representative samples, and obviate hundreds of hours of data entry. Table 1 compares the advantages and disadvantages of manual and online modes of data collection.

Table 1. Comparison between Manual and Internet-Based Data Collection*


The Internet is a tool that is out there, for better or for worse. Its usefulness in research depends largely on its judicious use. As depicted in Figure 1, a series of questions pertaining to different stages of the research project needs to be answered before making a final choice regarding the data collection format. In this figure, the solid lines represent the progression of the decision-making process concerning the use of electronic data collection. The broken lines lead to the likely decision, with the lines on the right representing a negative answer to the question posed at each stage (thus favoring manual data collection) and the lines on the left representing a positive answer (thus favoring electronic data collection).


Figure 1. Considerations for Incorporating Internet-Based Data Collection in a Research Project

3. Research Ethics

The Institutional Review Board (IRB) is the US version of the research ethics committees created in many universities and other research institutions in response to rising concerns about both human and animal use in research. The IRB's role is to oversee research being conducted within an institution in an attempt to ensure that participants' rights and privileges are being upheld. In the United States, IRBs generally focus on the principles laid out in the Belmont Report (1978). When considering whether or not a specific research project should be allowed to be completed, IRB reviewers focus on three key principles: (a) beneficence (i.e., lack of harm and/or received benefit), (b) respect for persons (i.e., confidentiality and ability to withdraw from research), and (c) justice (i.e., opportunity for all participants to benefit from outcome). In essence, the IRB serves as the research participants' informed and trained advocate.

Some IRB members may have some special concerns when dealing with proposals involving primary data collection via the Internet (Naglieri et al., 2004; Nosek, Banaji, & Greenwald, 2002). Anonymity and confidentiality are always concerns in data collection, but the potential for recording the IP (Internet Protocol) addresses, thereby the identity of the remote computers, makes Internet-based proposals more complicated (Berry, 2004). Other issues, such as data security during transmission, are unique to Internet-based data collection. Some common IRB issues the authors have encountered are discussed in the following paragraphs.

Primary data collection via the Internet presents a unique issue during data transmission (Hewson, Laurent, & Vogel, 1996). The data are most susceptible to hacking, corruption, etc., while they are being transferred from the respondents' computers to the researchers' computer. One relatively easy method of limiting these possibilities is the encryption of data during transmission. Data encryption may be accomplished through various methods, but from the IRB viewpoint, the method of encryption appears to be of less importance than the fact that encryption is being done. Of course, providing for data encryption can add to the cost of the project.

Irrespective of the mode of data collection, physical security of data is a major issue once data have been collected. With Internet-based data collection, physical security includes much more than a locked file cabinet in a secure room. Consideration must be given to both physical and electronic security of the server where data are stored. Physical security of the server should minimally include a room with restricted access. Internet data collection can be facilitated by numerous agencies that specialize in allowing researchers to create their own study. These agencies often provide adequate physical security. One physical security measure that may be overlooked is environmental controls that regulate temperature, humidity, and air flow. Environmental controls are particularly relevant for electronic data. Papers locked in a file cabinet will not be affected by a 105 degree Fahrenheit temperature, but this may cause problems with computer hard-drives. These extensive safeguards may not be necessary depending on the IRB, but having them will provide peace of mind for researchers and IRB members alike. Electronic security begins with the encryption process described above; it does not, however, end there. It would be necessary for the server to have firewalls. Firewalls protect the server from unauthorized electronic entry (i.e., hacking). Other electronic security commonly includes the use of passwords, PIN codes, and access codes.

When conducting Internet surveys, there is a potential threat to the anonymity of the respondent that needs to be considered (Pittenger, 2003; Waern, 2001). It is possible for a computer program to record the IP address of the computer being used by the respondent. The IP address is a numerical code that is unique to each computer connected to the Internet. It is also possible to record the time when the data were entered. These capabilities mean that the actual respondents can be traced in many cases. We have dealt with this issue by either deleting the IP addresses from the dataset early in the cleaning process or electing not to record the IP addresses, wherever possible. As an interesting aside, IP addresses collected from personal computers may be useful for matching sets of longitudinal data without collecting specific identifiers or using matched lists of identities and participant codes. In this case, recording the IP address is an advantage--not an ethical liability. However, IRBs should be made aware that this is the intent behind recording IP addresses in such a case.
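The two options described above (removing IP addresses early in cleaning, or keeping them only as a key for matching longitudinal waves) might look like the following minimal Python sketch. The field names, the sample rows, and the secret key are hypothetical. Replacing the raw address with a keyed hash is an assumption made here for illustration, not the procedure used in the projects described.

```python
import hashlib
import hmac


def drop_ip(rows, ip_field="ip_address"):
    """Return copies of the rows with the IP field removed entirely."""
    return [{k: v for k, v in row.items() if k != ip_field} for row in rows]


def pseudonymize_ip(rows, secret_key, ip_field="ip_address"):
    """Replace each IP with a keyed hash so waves can be matched
    without storing the raw address in the dataset."""
    out = []
    for row in rows:
        new_row = dict(row)
        ip = new_row.pop(ip_field)
        digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
        new_row["match_key"] = digest[:16]
        out.append(new_row)
    return out


# Hypothetical sample data (addresses from the documentation range).
wave1 = [{"ip_address": "192.0.2.10", "q1": 4}, {"ip_address": "192.0.2.11", "q1": 2}]
wave2 = [{"ip_address": "192.0.2.10", "q1": 5}]

# Option 1: no IP information survives cleaning.
anonymous = drop_ip(wave1)

# Option 2: keep only a matching key, using the same secret for every wave.
key = b"project-specific secret"  # hypothetical; stored apart from the data
w1 = pseudonymize_ip(wave1, key)
w2 = pseudonymize_ip(wave2, key)
matched = [(a, b) for a in w1 for b in w2 if a["match_key"] == b["match_key"]]
```

Under this sketch, the matching key is only as private as the secret used to produce it, so the secret would need the same protection as any list of participant codes.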

Research involving persons requires some form of informed consent, wherein the persons agree to participate and acknowledge the risks, benefits, and their rights. This can take the form of a verbal consent or a written one. In both verbal and written consents it is ascertainable whether the person providing the consent is indeed the person participating in the research. With Internet-based data collection this is not possible, as there is no visual reference (Pittenger, 2003). Additionally, it is not possible to determine that the person providing the consent meets the inclusion or exclusion criteria, as may be specified by the researcher. Thus, the issue of consent for Internet-based data collection includes issues of the respondent's personal integrity. Commonly, the consent to participate in Internet surveys takes the form of either choosing a box on the screen and pressing a button or choosing the "agree" option. Some IRBs may not consider this to be true informed consent, viewing it simply as the respondent's acknowledgement of reading the page. Since verifying this is next to impossible, some version of a "waiver of consent" becomes appropriate before conducting Internet-based data collection. This is especially relevant considering the possibility of respondents being minors without parental consent. Seeking and securing waivers from IRB for both parental and individual consent has been our approach to avoid subsequent disputes regarding consent, acknowledgement, and participation by minors.

One of the usual conditions of informed consent is that withdrawal from participation or refusal to participate cannot invalidate incentives. In Internet surveys with incentives provided this means that in the event of refusal to participate or early exit from the survey, the participant must be routed to the page meant for debriefing and incentive enrollment. Clearly, this is not perfect as the participant may simply close their Web browser to exit, rather than choose a button marked "exit survey." There is no simple and effective way to ensure that this does not happen and participants always have access to the incentives they are entitled to.

Other considerations that must be weighed are issues of burden and beneficence. Does using the Internet constitute an undue burden on a specific population, for example, computer illiterate individuals? There is no easy answer to this and it may in part depend on the subject matter being researched. Similarly, if participants receive benefits from being involved in the research, are these benefits available to non-computer users? These are difficult questions that each IRB would view differently; however, the best answer is that it depends on the research being conducted and the population being targeted for data collection. Our practice has been to anticipate these issues and, when applicable, justify the decisions in the design of the survey. Open communication with the IRB representative has helped us avoid unforeseen issues, thus leading to faster, more efficient approval processes.

4. Recruitment of Respondents

The Internet appears to be a mechanism to access the most representative participant pool in the world. Because of this, consumer researchers and marketing firms have created dedicated websites and electronic mailing lists designed to send out surveys to the willing public (e.g., NPD Online Research). However, it may not be correct to assume that recruitment of respondents in a virtual setting must be easy. We have utilized a variety of recruitment techniques and learned that, (a) different recruitment procedures can have different effects on the resulting sample and (b) the right recruitment procedure, with some luck, can yield interestingly large samples for the study.

Issues of recruitment have been widely discussed in the context of survey research (Cochran, 1977; McCready, 1996; Rosnow & Rosenthal, 2005; Sudman, 1983). Some of the recruitment methods are discipline-specific while others are more general. Most of these methods can be applied in an Internet-based project with simple alterations (Andrews, Nonnecke, & Preece, 2003; Hewson, Laurent, & Vogel, 1996; Koo & Skinner, 2005; Schillewaert & Meulemeester, 2005). For example, psychologists often utilize student pools from psychology classes--a convenience sample, while sociologists are usually more purposive in trying to sample groups meeting certain criteria (e.g., low-income minorities). Researchers using the Internet can recruit these same groups by either mass e-mailing the survey to the target group or sending out the survey Web site link to community leaders or organizations that interact with the target group.

If an electronic survey is being used simply to speed up data entry and analysis, the common method involving a group of participants meeting at a specified location and time can be used, with the provision of computers at the desired location. In this case, the recruitment procedure would be based on the accessibility of the population being sampled. Of course, the benefit of speedy data entry needs to be weighed against the risks associated with technology and those involved in data preparation processes (see Sections 5 and 6).

Despite the potential participant pool of hundreds of millions, the actual number of respondents in an Internet survey can be quite low (Zhang, 2000). In fact, response rates can be dismal enough to make the time-honored mail-in surveys seem more attractive. Using four of our Internet surveys as a basis, we have presented a discussion of the recruitment techniques which worked for us and those which did not. Our experience indicates the prudence in following multiple recruitment strategies in any project. Moreover, strategies that worked before the Internet generally also work with the Internet.

In a project concerning health behaviors and activity, designed to survey college students, all 25,000 students on a college campus were e-mailed the Web link to the 15-page survey containing several validated and time-tested scales along with an explanation of the study and the opportunity to win prizes. A second e-mail was sent out two weeks later with a reminder and the link. One month after the original recruitment e-mail we had only 509 respondents (i.e., 2 per cent response rate). The inclusion of paper reminders placed in dormitory mailboxes increased participation within freshmen to 5 per cent, which was about 2 per cent prior to this.

A second project of ours with severe recruitment woes involved an attempt to get a community sample of driving behaviors within six cities in three states. The original recruitment procedure involved placing 600 paper leaflets or flyers per community (N=3,600) on vehicles parked in public parking lots during business hours. The flyers contained information about the study employing several persuasion tactics, the link to the Internet survey, and the contact information for the researchers--should a potential respondent have any questions or need help accessing the survey. Because of an inability to give reminders and the need for the respondent to manually enter the Web address, we planned on a 90 per cent non-response rate in order to get 60 participants per community (n=360). After 1,200 flyers had been distributed in two cities and one month of waiting, five respondents had attempted the online survey, with only two finishing it in its entirety. Interestingly enough, two respondents had accessed the survey the day before recruitment leaflets were sent out, which indicates that perhaps the IRB members were checking on the link and survey materials.
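The planning arithmetic in this paragraph can be checked with a short calculation. The Python sketch below uses only the figures given above; the function name is illustrative, and exact fractions are used to avoid rounding surprises.

```python
import math
from fractions import Fraction


def flyers_needed(target_respondents, response_rate):
    """Flyers required to reach a target sample at a given response rate."""
    return math.ceil(Fraction(target_respondents) / response_rate)


# Planned: 90 per cent non-response, i.e. a 10 per cent response rate.
planned_per_community = flyers_needed(60, Fraction(10, 100))  # 600
planned_total = planned_per_community * 6                     # 3600

# Observed: 5 attempts from 1,200 flyers in the first two cities.
observed_rate = Fraction(5, 1200)
observed_percent = round(float(observed_rate) * 100, 2)

# Flyers per community that the observed rate would have required.
needed_at_observed_rate = flyers_needed(60, observed_rate)    # 14400

print(planned_per_community, planned_total, observed_percent, needed_at_observed_rate)
```

At the observed rate, reaching 60 respondents per community would have taken 24 times the planned number of flyers, which is why a change of recruitment strategy was the practical option.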


Not discouraged by a 0.5 per cent response rate, we adopted a snowball sampling technique in which we sent the survey out to friends, family, and colleagues. This recruitment e-mail contained study information, the link to the survey, and instructions to forward the e-mail to friends, family, and colleagues. Using this approach, the 60 initial e-mails yielded three times as many responses (189 to be precise) within the first month. Follow-up information seems to indicate that the snowballing process stopped at the third or fourth iteration. While this technique did yield higher response rates, it did not allow for community-specific analyses to be conducted because the e-mail contacts were distributed in other cities. It did however provide a broad sample with several professions and ages being represented. As an interesting aside, the e-mail sent out by one of the authors reached the other author, at the fourth iteration of snowballing, through routes that neither could have foreseen.

A team of researchers (including one of us) interested in individual attitudes related to the loss of local wildlife also utilized the electronic method to collect data. These researchers focused on college students for their sample and recruited participants by going into a diverse range of courses and verbally recruiting students by providing them with the Internet link on an overhead projector. Interestingly, some course instructors offered extra credit for participation while others did not. For courses providing extra credit, more than 90 per cent of the students responded. The response rate was only 10 per cent where this incentive did not exist. This drastic difference based on extra credit was found to hold irrespective of the class size.

In another study we utilized a participant pool from an Introductory Psychology course. The participants received research credit for their participation that counted towards a course requirement. As is to be expected, recruitment turned out to be a virtual non-issue. We just posted the study on the sign-up page and then e-mailed the link to those who signed up. Sections 5 and 6 below, which focus on technological and data preparation issues, discuss this project and other similar projects that use the electronic method to reduce data entry time and labor.

Recruitment methods such as community sampling, telephone surveys, and mail-in surveys, widely used in different fields of research (Dillman, 1978), have also proved their merit in our Internet surveys. An incentive to participate is not essential but definitely helps and that has been known for some time (Brennan, 1992). In our projects, offering guaranteed benefits yielded greater than 90 per cent response rates. Surveys offering the possibility of some benefit, but no guarantee, had much lower response rates but were better than those without the possibility of such benefit. Reminders have also been shown to improve response rates in manual surveys (Nederhof, 1988; Sheehan & McMillan, 1999). Reminders doubled responses among college freshmen in our health survey even though the resulting response rate was not sufficiently high. The driving behaviors project had no reminders or incentive for the first round of data collection and was a total failure. However, we overcame the lack of incentive and our inability to offer reminders by utilizing snowball sampling that originated with people motivated to help--our friends, family, and colleagues.

5. Technical Snags

Using the Internet to collect data is convenient and can greatly extend sample representativeness; however, the use of the Internet is not without some risk. During the doctoral research of one of the authors, data were being collected using a mobile computer laboratory with an array of laptop computers, so as to avoid the time-consuming data entry process. Participants arrived every hour, completed the questionnaire online, and left. Shortly into one of the sessions, the electricity supply to the building went out. Fortunately, the laptop batteries were fully charged, so no data were lost and data collection continued. With desktop computers without an uninterruptible power supply (UPS), the data entered up to the power failure would have been lost and data collection would have had to stop until power was restored. Even with laptops this could have resulted in major inconvenience had the batteries not been charged or had the server been located in the building where the power supply was disrupted. After this experience the researcher printed out research packets to have on hand for future emergencies.

During the same project, the wireless Internet connection was lost for a period of time. This resulted in incomplete data from 18 respondents and created delays for the next session of data collection. A solution that was used in another Internet project conducted by the authors was to have a disc with the survey materials on it and have the respondents record their answers directly onto a Word document, which could later be transferred. To use this option, it is necessary to save each respondent's responses into a separate file for later retrieval, which requires enough disk space and the required level of access to save files.

In another research study conducted by one of the authors in a computer laboratory, all the computers contracted a virus. This was rather unfortunate, resulting in incomplete data from 14 respondents and lost data from 35 respondents. Considering that the sample size was 150, this resulted in approximately one-third of the sample being lost. Amendments for more participants had to be sent to the IRB since one of the experimental conditions was severely compromised by sheer luck of random assignment. Additionally, those 14 participants who were completing the study at the time had their university Internet accounts temporarily deactivated for using an infected computer. Prior to starting data collection each computer had been scanned for viruses and had antivirus updates installed. The virus came from another computer laboratory using the same server and infected the entire university network. Apart from keeping current on antivirus updates and timely virus scans, backing up the data more frequently during data collection could minimize virus-induced losses of already collected data. The paper-and-pencil back-ups will prevent losing participants who are present during the computer infection.


Another technology issue, especially in a laboratory setting, relates to the hardware devices used. In one of the studies mentioned above (i.e., the one with power-failure), the respondents were required to navigate the survey Web site using a touchpad. This resulted in delays and some confusion because the respondents were more used to a mouse, rather than a touchpad. Similarly, the type of screen and keyboard used can also make a difference. Specific screen sizes may be more appropriate for specific groups. Small screen size might be a disadvantage for groups with vision impairment. Similarly, perhaps a touch sensitive screen would be better than a keyboard while working with younger children.

In situations where multiple users may use the same computer to complete the study, it is necessary to determine whether the survey software enters the data as new data or, recognizing the same IP address, records over the previous data. This is not only a concern in laboratory settings; some hostel or dormitory rooms may have a single computer for multiple users. Even in a private home, different family members may respond from the same computer. Another software issue is how it handles a respondent who exits the survey or closes the Web browser without completing the survey, whether accidentally or otherwise. Are they allowed to pick up at the point they exited, or do they need to start over? One study the authors were involved in did not allow the respondents to start where they left off. This resulted in numerous partially duplicated data points. For example, a respondent would answer the first third of the survey and then accidentally exit, only to discover that the survey had to be restarted from the beginning. This would result in the first third of the survey being duplicated, requiring increased time in data cleaning later. Perhaps this also resulted in frustration and withdrawal from the study, as indicated by the fact that after data cleaning to eliminate duplicate entries, approximately 7 per cent of the data sets were incomplete.
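Cleaning partially duplicated submissions of the kind described above could be approached as in the following minimal Python sketch. It assumes each submission carries a hypothetical `respondent_id` (for example, an access code) that survives a restart; as noted above, the IP address alone would not be a safe identifier on shared computers. The field names and sample rows are illustrative.

```python
def answered_count(record, question_fields):
    """Number of questions with a non-empty answer in this record."""
    return sum(1 for q in question_fields if record.get(q) not in (None, ""))


def keep_most_complete(records, question_fields, id_field="respondent_id"):
    """Keep one record per respondent: the most complete submission,
    preferring the later one on ties (records are in arrival order)."""
    best = {}
    for record in records:
        rid = record[id_field]
        current = best.get(rid)
        if current is None or (
            answered_count(record, question_fields)
            >= answered_count(current, question_fields)
        ):
            best[rid] = record
    return list(best.values())


questions = ["q1", "q2", "q3"]
raw = [
    {"respondent_id": "A", "q1": 3, "q2": None, "q3": None},  # exited after the first third
    {"respondent_id": "A", "q1": 3, "q2": 5, "q3": 1},        # restarted and finished
    {"respondent_id": "B", "q1": 2, "q2": 4, "q3": None},     # never finished
]

cleaned = keep_most_complete(raw, questions)
incomplete = [r for r in cleaned if answered_count(r, questions) < len(questions)]
incomplete_share = len(incomplete) / len(cleaned)
```

Reporting the share of incomplete records after deduplication, as in the last line, gives the kind of figure quoted above (approximately 7 per cent in that study).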

When using flyers to recruit respondents, the Web address of the survey can cause a practical difficulty. Since IRBs tend to require data encryption, this necessitates the use of secure Web sites. Secure Web sites are designated with "https" in their address (rather than the usual "http"). This can lead to the respondents not typing the address correctly and consequently being unable to locate the survey. In a laboratory setting, one of the authors discovered that about 13 per cent of the respondents typed the Web address incorrectly. Specifically, they were all making the same error mentioned above. Even when told to be sure to type "https" and emphasizing the letter 's' the error rate was approximately 4 per cent. This tendency may be even more pronounced when using paper flyers or windshield leaflets for recruitment and possibly contributed to the dismal 0.5 per cent response rate encountered in the driving behaviors study.

The authors, jointly or individually, have been involved in over ten Internet-based surveys. Not a single one of those surveys has avoided technical or recruitment problems. Keeping back-up plans ready seems to be the major lesson from these experiences.

6. Data Preparation Issues

The single most appealing advantage of the electronic method of data collection is the elimination of the tedious data entry process. With the electronic method the data are entered into a database at the same time as the respondent completes the survey. If a researcher plans on collecting large amounts of data or having a large sample size, electronic data collection can be invaluable. It is a solution in itself when facing mountains of data and weeks' worth of data entry. An additional advantage is that typing errors by the researcher are avoided. The data file is an exact replica of the responses received. However, electronic data files can easily lead to other types of error.

Electronic data files almost always need to be transformed, merged, and/or reformatted before use. Most available electronic formats separate the survey into sections and the data are provided in separate files for each section. These must be merged together so that analyses can be performed. Additionally, some programs that help facilitate creating e-surveys use their own coding schemes, which are not what the researcher might use. For example, 1-7 Likert scales may be recorded as 0-6 scales by the computer. Also, many established subscales have specific scoring criteria. Because of this, simple transformations are usually performed on the data. Also, when the data are downloaded into a database program, some programs default everything to string format, even if the data were meant to be numeric. As a result, another reformatting of the data becomes necessary. None of these issues is hard to correct. However, the more steps we add to the process, the more likely we are to make a mistake.
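The three preparation steps named above (merging section files, recoding a 0-6 export to the intended 1-7 scale, and converting strings to numbers) can be sketched in a few lines of Python. The file contents, column names, and the `respondent_id` key are hypothetical; real survey exports will differ.

```python
import csv
import io

# Hypothetical exports: one CSV per survey section, keyed by respondent_id.
section_a = "respondent_id,a1,a2\nr1,0,6\nr2,3,2\n"
section_b = "respondent_id,b1\nr1,5\nr2,1\n"


def read_section(text):
    """Read one section file; the csv module yields every value as a string."""
    return {row["respondent_id"]: row for row in csv.DictReader(io.StringIO(text))}


def merge_sections(*sections):
    """Join sections on respondent_id, keeping ids present in every section."""
    common = set(sections[0]).intersection(*sections[1:])
    merged = {}
    for rid in sorted(common):
        combined = {}
        for section in sections:
            combined.update(section[rid])
        merged[rid] = combined
    return merged


def recode_likert(row, items, offset=1):
    """Cast string answers to int and shift a 0-6 export onto the 1-7 scale."""
    return {k: (int(v) + offset if k in items else v) for k, v in row.items()}


items = ["a1", "a2", "b1"]
merged = merge_sections(read_section(section_a), read_section(section_b))
dataset = {rid: recode_likert(row, items) for rid, row in merged.items()}

# A range check after recoding catches a shift applied twice or not at all.
assert all(1 <= row[k] <= 7 for row in dataset.values() for k in items)
```

Keeping each step in its own small function, followed by a range check, is one way to limit the mistakes that accumulate as more steps are added.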

7. Conclusion

Data collection over the Internet has many potential benefits. Unfortunately, it also has many potential problems. Properly used, Internet-based data collection can generate large samples, be a solution to funding problems, ease logistics, and eliminate data entry. However, problems can arise during any phase of the research. With careful planning, many issues can be avoided altogether. While not all inclusive, this paper presents many of the issues the authors have encountered while conducting Internet-based data collection.

Advantages of Internet-based research have allowed us to dream a little bigger and pursue projects and research questions we would never have considered. Who would want to collect data in six cities in three states without formal funding? The Internet and some "creative budgeting" allowed the two of us to put the finishing touch on a project that had been two years in the making but confined to the available student pool for data collection. However, we will not discard the paper-and-pencil format either. For some projects, the inclusion of electronic data collection is not only unnecessary but also impractical. It can add unnecessary costs, time commitments, and headaches when used for smaller samples that are easily available. Conducting Internet-based research remains a decision that the researcher must weigh carefully.

Source:  http://jrp.icaap.org/index.php/jrp/article/view/30/51


The World Wide Web is an extraordinary resource for gaining access to information of all kinds, including historical, and each day a greater number of sources become available online. The advantages that the internet offers students are tremendous; so much so that some may be tempted to bypass the library entirely and conduct all of their research on the web. The History Department wants CU students to pursue knowledge with every tool available, including the internet, so long as they do so judiciously.

It is important to know that the Web is an unregulated resource. Anyone – even people who have no expertise at all in your subject – can post anything at any time, and many sources on the web have proven to be unreliable, biased, and inaccurate. Too much reliance on the web could do more harm than good. Checking the reliability and accuracy of information taken from random sites could take more time than going to the library. And using information you have not checked from such sources could have a detrimental impact on your final grade.

The key is to learn how to use the web to your best advantage.

  • To determine the best application of internet sources to your particular assignment it is strongly recommended that students talk with their instructors. Ask what internet sources will make your research and learning experience most productive.
  • Just as there are countless questionable and unreliable sources on the web, there are a growing number of newspapers, journals, archives, historical societies, libraries, colleges, and universities that are making their holdings available to all. One invaluable source is the Library of Congress (www.loc.gov), which has made millions of sources – written and visual – accessible. Instructors and library research staff can help students locate many similar sites.
  • The internet should never be your only source when doing research. The best option for students is always the university’s libraries. Students should begin any research project by (1) familiarizing themselves with resources held in Norlin and other libraries around campus; and (2) accessing internet-based resources through the CU Library gateway.
  • A web-based tutorial, which will instruct library users on how to conduct web-based research, is available to everyone. It will show you: the difference between scholarly and popular sources, how to identify keywords, how to conduct searches on a library’s catalogue and through article databases, how to evaluate the integrity of sources, and how to use the information you find legally and ethically. The tutorial can be found at: http://ucblibraries.colorado.edu/pwr/public_tutorial/home.htm
  • History students can go to a page designed especially for them. This link will give you access to subject guides in history as well as introduce you to reliable internet and CU library resources: http://ucblibraries.colorado.edu/research/subjectguides/history.
  • The library maintains a page of electronic resources, including searchable databases such as JSTOR and EEBO, so that students can take advantage of the considerable resources available to members of the university community.

     Source:  http://www.colorado.edu/history/undergraduates/paper-guidelines/using-internet-research



For most people, Internet research involves little more than putting a search query into Google and hitting return. But for others, such a basic search just won't cut it. Perhaps you are writing a doctoral dissertation and need to carry out in-depth research on your topic, or maybe you're researching an article for a newspaper. Whatever your reasons, there are plenty of tools to help you get the most out of your online research.


1. Get your browser ready for research. If you're going to be wading through hundreds of webpages, it's essential that you get organized at the beginning. Some useful Firefox extensions are designed specifically for researchers. Zotero is a free add-on that works something like an advanced bookmarking tool. You can organize links into files, annotate them and share them with other users. Diigo is another free add-on that allows you to annotate individual pages, handy for long documents when you can't remember exactly why you bookmarked a page.

2. Install reference management software on your computer. At the end of a lengthy assignment, you might have more than 100 references, so install a program like Endnote to take the hard work out of keeping your references in order.

3. Use advanced search engine queries. Google has a whole range of optional search parameters that you can use to refine your results, such as page language, file type and usage rights. Consider using an application like Fefoo, which lets you quickly conduct searches across multiple search engines and has an array of search operators to improve your results.

4. Visit sites that may not be indexed by the search engines, but which host highly specialized data such as the Library of Congress, BioMedCentral, Project Gutenberg and the U.S. Government Manual. See the Resources section for links to more sites like this.

5. Share and exchange your research results using online collaboration applications. Sites like Glasscubes and Colaab are packed with real-time features that make sharing large tracts of information fast and easy. 
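As a small illustration of the advanced search queries step above, the Python sketch below assembles a query string using Google's `site:` and `filetype:` operators and turns it into a search URL. The helper names and the example search terms are made up for illustration, and other search engines may use different operators.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, filetype=None, exact_phrase=None):
    """Combine plain search terms with optional search operators."""
    parts = [terms]
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query so it can be pasted into a browser."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query("internet survey response rates", site="loc.gov", filetype="pdf")
url = search_url(query)
print(query)
print(url)
```

Restricting a search to one trusted site and one file type in this way is often quicker than filtering a general results page by hand.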

Written by: John Phillips

