
Source: This article was published on hub.packtpub.com by Sugandha Lahoti - Contributed by Member: Carol R. Venuti

Google has launched Dataset Search, a search engine for finding datasets on the internet. This search engine will be a companion of sorts to Google Scholar, the company’s popular search engine for academic studies and reports. Google Dataset Search will allow users to search through datasets across thousands of repositories on the Web whether it be on a publisher’s site, a digital library, or an author’s personal web page.

Google’s Dataset Search scrapes government databases, public sources, digital libraries, and personal websites to track down the datasets. It also supports multiple languages and will add support for even more soon. The initial release of Dataset Search will cover the environmental and social sciences, government data, and datasets from news organizations like ProPublica. It may soon expand to include more sources.

Google has developed guidelines for dataset providers to describe their data in a way that Google can better understand the content of their pages. Any dataset published with schema.org markup, or similar equivalents described by the W3C, can be picked up by this search engine. Google also mentioned that Dataset Search will improve as long as data publishers are willing to provide good metadata. If publishers use these open standards to describe their data, more users will find the data they are looking for.
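For dataset providers, the markup itself is lightweight. Below is a minimal sketch, in Python, of the kind of schema.org Dataset description a publisher might emit as JSON-LD; the dataset name, keywords, and URLs are hypothetical placeholders rather than examples taken from Google's guidelines.

```python
import json

# A minimal, hypothetical schema.org "Dataset" description. Embedding a JSON-LD
# block like this in a page (e.g., inside a <script type="application/ld+json">
# tag) is the kind of open-standard markup the guidelines describe.
dataset_markup = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example City Air Quality Measurements",  # hypothetical dataset
    "description": "Hourly PM2.5 readings from a municipal sensor network.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["air quality", "PM2.5", "environment"],
    "distribution": [{
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/air-quality.csv"  # placeholder URL
    }]
}

print(json.dumps(dataset_markup, indent=2))
```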

Natasha Noy, a research scientist at Google AI who helped create Dataset Search, says that “the aim is to unify the tens of thousands of different repositories for datasets online. We want to make that data discoverable, but keep it where it is.”

Ed Kearns, Chief Data Officer at NOAA, is a strong supporter of this project and helped NOAA make many of their datasets searchable in this tool. “This type of search has long been the dream for many researchers in the open data and science communities,” he said.

Published in Search Engine

Source: This article was published on legalreader.com - Contributed by Member: Barbara Larson

Can you imagine life without Google, or spending more than a few seconds searching for any information? I bet you can’t, because it’s a privilege that makes your life much easier and more comfortable. But search engines come with a big problem: they damage privacy, and that is becoming a serious issue.

It’s almost impossible to protect personal data since everybody is collecting information these days. For instance, Facebook recently announced that it can track even non-users when they visit a site or app that uses their services.

In such circumstances, it is crucial to understand how search engines function and what they do with your personal data. This post will explain to you how things work in this field.

How Search Engines Collect Data

Search engines possess every user’s browsing history. It may not sound like much, but let’s see what it really means in the case of the biggest player in the search engine market, Google.

This company collects all sorts of user-related data, but it can be divided into three basic sections:

  • Things you do. Google monitors every action you take online, including search queries, websites you visit, videos you watch, ads that you click on or tap, your location, device information, and IP address and cookie data.
  • Things that you create. This section consists of emails you send and receive on Gmail, contacts that you add, calendar events, and photos or videos that you upload. Besides that, it holds documents, sheets, and slides on Drive.
  • Things about you. These are essentially personal information such as your name, email address and password, date of birth, gender, telephone number, and location.

It’s a short list of data categories, but it covers practically everything you’ve ever done online. Unless you’ve been living in a cave for the last couple of decades, Google knows a lot about you and uses this information to provide you with a tailored online experience.

Why Search Engines Accumulate Personal Information

The more you know about users, the easier it is to reach them. Search engines know this very well, so they collect personal information to enhance their services. First of all, they do it to improve website ranking.

According to SEO specialists at aussiewritings.com, Google analyzes user behavior and learns how people react to online content, which helps the company upgrade its search algorithms. As a result, only the best and most popular websites make it to the first page of search results.

Secondly, Google can serve you personalized ads because it knows what you do, feel, and like. It can put things into perspective and display the right advertisement at just the right time. That way, Google drastically improves the effectiveness of digital advertising.

How Does It Jeopardize Privacy?

With so much information hovering around the Internet, it is reasonable to assume that security breaches will happen from time to time. Identity theft is one of the biggest concerns because it’s getting easier to find someone’s personal information online and use it to steal their money.

Most websites ask you to leave your name, email, and birthday. Although it seems like nothing more than harmless basic information, hackers can easily exploit it to access your bank account or any other digital property for that matter.

At the same time, continuous data accumulation also means humans are being treated primarily as consumers. You can’t hide from search engines – they will always find you and serve you customized ads.

If you are a 30-year-old mother, they will offer you baby clothing. If you are a high school boy, they will suggest you buy video games. In each case, there is no way to hide from search engines and that’s something that scares us all.

Final Thoughts

Search engines damage privacy, and it is becoming a real issue because there is no way to protect yourself completely. Google and other platforms use personal information to improve user experience and customize advertising, but it comes at a cost.

Published in Search Engine

Source: This article was published on securityintelligence.com by Jasmine Henry - Contributed by Member: Deborah Tannen

The dark component of the deep web is the primary highway for exchange and commerce among cybercriminal groups. In fact, very few cybercriminals work alone. Eighty percent of cybercrime is linked to criminal collectives, and stolen data surfaces rapidly on darknet forums and marketplaces following cybersecurity incidents involving data loss.

Adapting to these trends is essential. Organizations with the ability to extract threat intelligence from data-mining these elusive online sources can achieve a significant security advantage.

Deep Web and Darknet: What’s the Difference?

The part of the web accessible through search engines and used for everyday activities is known among researchers as the surface web. Anything beyond that is defined as the deep web. While estimates vary, some researchers project there are 90 percent more deep websites than surface ones, according to TechCabal. The deep web consists of unindexed websites that are not accessible to everyday Internet users. Some restrict access; others are routed through many layers of anonymity to conceal their operators’ identities.

Darknet websites and technologies are a subset of the deep web, consisting of sites that are intentionally hidden and generally accessible only through technologies like The Onion Router (Tor), software that facilitates anonymous communication, or peer-to-peer (P2P) browsers. This hidden web is closely associated with anonymity and, in some cases, criminal activity supported by open exchange and collaboration between threat actors.

How to Draw Dark Threat Intelligence

“Dark web intelligence is critical to security decision-making at any level,” said Dave McMillen, senior analyst with X-Force IRIS at IBM X-Force Incident Response and Intelligence Services (IRIS). “It is possible to collect exploits, vulnerabilities and other indicators of compromise, as well as insight into the techniques, tactics, and procedures [TTPs] that criminals use for distinct knowledge about the tools and malware threat actors favor.”

When this real-time threat data is filtered through sufficient context and separated from false positives, it becomes actionable intelligence. McMillen believes there are several ways organizations can benefit from dark-sourced intelligence. These benefits include understanding emerging threat trends to develop mitigation techniques proactively. Dark-source intelligence could also help with identifying criminal motivations and collusion before attacks. It could even aid in attributing risks and attacks to specific criminal groups.

How to Identify Darknet Security Risks

For expert threat researchers like McMillen, patterns of deep web activity can reveal an attack in progress, planned attacks, threat trends or other types of risks. Signs of a threat can emerge quickly, as financially-driven hackers try to turn stolen data into profit within hours or minutes of gaining entry to an organization’s network.

The average time it takes to identify a cybersecurity incident is 197 days, according to the 2018 Cost of a Data Breach Study from the Ponemon Institute, sponsored by IBM. Companies that contain a breach within 30 days have an advantage over their less-responsive peers, saving an average of $1 million in containment costs.

“Employing dark web monitoring solutions that allow the use of focused filters to identify key phrases, such as your brand and product names, that may contain information that can negatively affect your organization is a good start in your effort to glean useful intelligence from the dark web,” McMillen said.

The collected data should then generate alerts and be routed through a human analysis process to produce actionable insights. Context-rich threat intelligence can reveal many different forms of risk.
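As a rough illustration of that filter-then-review workflow, the Python sketch below flags collected posts that mention watched key phrases so they can be routed to a human analyst. The post structure, watch list, and source names are hypothetical, and a production monitoring platform would be far more sophisticated.

```python
import re
from dataclasses import dataclass

@dataclass
class Post:
    source: str  # e.g., a forum or paste-site name
    text: str

# Hypothetical watch list: brand names, product names, domains.
WATCH_TERMS = ["AcmeCorp", "acmecorp.com", "Acme Payments API"]

def flag_posts(posts, terms=WATCH_TERMS):
    """Return (post, matched_terms) pairs for posts that mention any watched term."""
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in terms]
    flagged = []
    for post in posts:
        hits = [t for t, p in zip(terms, patterns) if p.search(post.text)]
        if hits:
            flagged.append((post, hits))
    return flagged

# Example: two collected posts, one of which mentions a watched domain.
sample = [
    Post("forum-a", "selling fresh dumps, escrow accepted"),
    Post("paste-b", "acmecorp.com vpn creds for sale, DM for price"),
]
for post, hits in flag_posts(sample):
    print(post.source, "->", hits)  # route these to an analyst queue
```

Anything the filter surfaces still needs the contextual, human analysis described above before it counts as intelligence.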

1. Organization or Industry Discussion

Among the key risk factors and threats are mentions of an organization’s name in forum posts, paste sites, channels or chatrooms. Contextual analysis can determine whether threat actors are planning an attack or actively possess stolen data. Other high-risk discussions can involve niche industries or verticals, or information on compromising highly specific technologies employed by an organization.

2. Personally Identifiable Information (PII) Exchange

When a breach has occurred, the sale of PII, personal health data, financial data or other sensitive information can be indicative of the aftermath of an attack. A single data record can sell for up to $20, according to Recorded Future. This data is generally stolen en masse from large organizations — such as credit agencies and banks — so a few thousand credit card numbers can turn a huge profit.

Unsurprisingly, 76 percent of breaches are financially motivated, according to the 2018 Data Breach Investigations Report from Verizon.

3. Credential Exchange

Lost or stolen credentials were the most common threat action employed in 2017, contributing to 22 percent of data breaches, according to the Verizon report. While the presence of usernames and passwords on paste sites or marketplaces can indicate a data breach, contextual analysis is required to determine whether this is a recent compromise or recycled data from a prior incident.

In May 2018, threat intelligence company 4iQ uncovered a massive floating database of identity information, including over 1.4 billion unencrypted credentials.

“The breach is almost two times larger than the previous largest credential exposure,” Julio Casal, founder of 4iQ, told Information Age.

4. Information Recon

Social engineering tactics are employed in 52 percent of attacks, according to a February 2018 report from security company F-Secure. Collusion around information recon can surface in both open and closed-forum exchanges between individual threat actors and collectives.

5. Phishing Attack Coordination

As phishing and whaling attacks become more sophisticated, deep web threat intelligence can reveal popular TTPs and risks. Coordination around information recon is common. Threat actors can now purchase increasingly complex phishing-as-a-service software kits and if defenders are familiar with them, they can better educate users and put the right controls in place.

6. Insider Recruitment

Although malicious insiders cause fewer breaches than simple human error, the darknet is an established hub where criminal collectives recruit employees with network credentials for sophisticated attacks. Dark Reading tracked nearly twice as many references to insider recruitment on darknet forums in 2016 as in 2015.

7. Trade Secrets and Sensitive Asset Discussions

Trade secrets and competitive intelligence are another lucrative aspect of threat actor commerce that can signal risks to researchers. In one recent incident reported by CNBC in July 2018, a likely Russian cybercriminal sold access to a law firm’s network and sensitive assets for $3,500. Having had that information ahead of time could have saved the victim time, money, and reputational damage.

What Are the Challenges to Deriving Value From Dark Sources?

While there is clear strategic and tactical value to darknet threat intelligence, significant challenges can arise on the road to deep web threat hunting and data-mining. For instance, it’s not ideal to equip security operations center (SOC) analysts with a Tor browser. The potential volume of false positives based on the sheer size of the hidden web necessitates a more effective approach.

“The dark web is fragmented and multi-layered,” McMillen said.

When researchers discover a credible source, it generally requires hours to vet intelligence and perform a complete analysis. Darknet commerce has also grown increasingly mercurial and decentralized as law enforcement tracks criminal TTPs as they emerge. Security leaders who can overcome these barriers have the potential to significantly improve security strategy in response to emerging threat trends and risk factors.

The 2018 Artificial Intelligence (AI) in Cyber-Security Study from the Ponemon Institute, sponsored by IBM Security, found that AI could provide deeper security and increased productivity at lower costs. Sixty-nine percent of respondents stated that the most significant benefit of AI was the ability to increase speed in analyzing threats.

As leaders consider how to deepen adoption of dark threat intelligence, it’s valuable to understand that not all intelligence sources can adequately capture the full scope of threat actor exchange on this vast, fast-morphing plane. Relying on stagnant, outdated or fully automated technologies may fail to mitigate important risks. The best mode of protection is one which combines the intelligence of skilled human researchers and AI to turn raw data into actionable intelligence effectively.

Published in Deep Web

As scientific datasets increase in both size and complexity, the ability to label, filter and search this deluge of information has become a laborious, time-consuming and sometimes impossible task without the help of automated tools.

With this in mind, a team of researchers from Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley is developing innovative machine learning tools to pull contextual information from scientific datasets and automatically generate metadata tags for each file. Scientists can then search these files via a web-based search engine for scientific data, called Science Search, that the Berkeley team is building.

As a proof-of-concept, the team is working with staff at the Department of Energy's (DOE) Molecular Foundry, located at Berkeley Lab, to demonstrate the concepts of Science Search on the images captured by the facility's instruments. A beta version of the platform has been made available to Foundry researchers.

"A tool like Science Search has the potential to revolutionize our research," says Colin Ophus, a Molecular Foundry research scientist within the National Center for Electron Microscopy (NCEM) and Science Search Collaborator. "We are a taxpayer-funded National User Facility, and we would like to make all of the data widely available, rather than the small number of images chosen for publication. However, today, most of the data that is collected here only really gets looked at by a handful of people—the data producers, including the PI (principal investigator), their postdocs or graduate students—because there is currently no easy way to sift through and share the data. By making this raw data easily searchable and shareable, via the Internet, Science Search could open this reservoir of 'dark data' to all scientists and maximize our facility's scientific impact."

The Challenges of Searching Science Data

Today, search engines are ubiquitously used to find information on the Internet, but searching science data presents a different set of challenges. For example, Google's algorithm relies on more than 200 clues to achieve an effective search. These clues can come in the form of keywords on a webpage, metadata in images or audience feedback from billions of people when they click on the information they are looking for. In contrast, scientific data comes in many forms that are radically different from an average web page, requires context that is specific to the science, and often lacks the metadata needed for effective searches.

At National User Facilities like the Molecular Foundry, researchers from all over the world apply for time and then travel to Berkeley to use extremely specialized instruments free of charge. Ophus notes that the current cameras on microscopes at the Foundry can collect up to a terabyte of data in under 10 minutes. Users then need to manually sift through this data to find quality images with "good resolution" and save that information on a secure shared file system, like Dropbox, or on an external hard drive that they eventually take home with them to analyze.

Oftentimes, the researchers that come to the Molecular Foundry only have a couple of days to collect their data. Because it is very tedious and time-consuming to manually add notes to terabytes of scientific data and there is no standard for doing it, most researchers just type shorthand descriptions in the filename. This might make sense to the person saving the file but often doesn't make much sense to anyone else.

"The lack of real metadata labels eventually causes problems when the scientist tries to find the data later or attempts to share it with others," says Lavanya Ramakrishnan, a staff scientist in Berkeley Lab's Computational Research Division (CRD) and co-principal investigator of the Science Search project. "But with machine-learning techniques, we can have computers help with what is laborious for the users, including adding tags to the data. Then we can use those tags to effectively search the data."

To address the metadata issue, the Berkeley Lab team uses machine-learning techniques to mine the "science ecosystem"—including instrument timestamps, facility user logs, scientific proposals, publications and file system structures—for contextual information. The collective information from these sources, including the timestamp of the experiment, notes about the resolution and filter used, and the user's request for time, provides critical context. The Berkeley Lab team has put together an innovative software stack that uses machine-learning techniques, including natural language processing, to pull contextual keywords about the scientific experiment and automatically create metadata tags for the data.
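As a loose illustration of that keyword-extraction step, the sketch below uses TF-IDF scoring (a simple stand-in, not necessarily one of the techniques the Berkeley team actually uses) to pull candidate tags from a few invented snippets of proposal, log, and note text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical snippets from the "science ecosystem": a proposal abstract,
# an instrument log line, and a user note attached to one data file.
documents = [
    "Proposal: atomic-resolution imaging of gold nanoparticles on the TEAM 1 microscope",
    "Log: TEAM 1, HAADF-STEM mode, 300 kV, drift corrected",
    "User note: good resolution, nanoparticle edge visible, use for figure 2",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)
terms = vectorizer.get_feature_names_out()

# Keep each document's highest-scoring terms as candidate metadata tags.
for doc, row in zip(documents, tfidf.toarray()):
    top = sorted(zip(terms, row), key=lambda pair: -pair[1])[:4]
    print(doc[:12], "->", [term for term, score in top if score > 0])
```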

For the proof-of-concept, Ophus shared with the Science Search team data from the Molecular Foundry's TEAM 1 electron microscope at NCEM that was recently collected by facility staff. He also volunteered to label a few thousand images to give the machine-learning tools some labels from which to start learning. While this is a good start, Science Search co-principal investigator Gunther Weber notes that most successful machine-learning applications typically require significantly more data and feedback to deliver better results. For example, in the case of search engines like Google, Weber notes that training datasets are created and machine-learning techniques are validated when billions of people around the world verify their identity by clicking on all the images with street signs or storefronts after typing in their passwords, or on Facebook when they're tagging their friends in an image.

This screen capture of the Science Search interface shows how users can easily validate metadata tags that have been generated via machine learning or add information that hasn't already been captured. Credit: Gonzalo Rodrigo, Berkeley Lab

"In the case of science data only a handful of domain experts can create training sets and validate machine-learning techniques, so one of the big ongoing problems we face is an extremely small number of training sets," says Weber, who is also a staff scientist in Berkeley Lab's CRD.

To overcome this challenge, the Berkeley Lab researchers used transfer learning to limit the degrees of freedom, or parameter counts, on their convolutional neural networks (CNNs). Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task, which allows the user to get more accurate results from a smaller training set. In the case of the TEAM 1 microscope, the data produced contains information about which operation mode the instrument was in at the time of collection. With that information, Weber was able to train the neural network on that classification so it could generate that mode-of-operation label automatically. He then froze that convolutional layer of the network, which meant he'd only have to retrain the densely connected layers. This approach effectively reduces the number of parameters on the CNN, allowing the team to get some meaningful results from their limited training data.
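The freeze-and-retrain pattern Weber describes looks roughly like the Keras sketch below. The pretrained ImageNet base, layer sizes, and label count are stand-ins for illustration, not the team's actual architecture.

```python
import tensorflow as tf

NUM_TAGS = 5  # hypothetical number of metadata labels to predict

# A pretrained convolutional base stands in for the already-trained network;
# freezing it means only the dense head's parameters are learned.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_TAGS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_images: (N, 224, 224, 3) float array; train_labels: (N,) integer labels
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)
```

With the convolutional layers frozen, only the small dense head has to be fit to the limited labeled images, which is the point of the approach.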

Machine Learning to Mine the Scientific Ecosystem

In addition to generating metadata tags through training datasets, the Berkeley Lab team also developed tools that use machine-learning techniques to mine the science ecosystem for data context. For example, the data ingest module can look at a multitude of information sources from the scientific ecosystem—including instrument timestamps, user logs, proposals, and publications—and identify commonalities. Tools developed at Berkeley Lab that use natural language processing methods can then identify and rank words that give context to the data and facilitate meaningful results for users later on. The user will see something similar to the results page of an Internet search, where content with the most text matching the user's search words will appear higher on the page. The system also learns from user queries and the search results they click on.
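A toy version of that match-and-click-feedback ranking might look like the following; the file names, tag text, and click counts are invented for illustration, and real scoring would be considerably richer.

```python
from collections import Counter

# Hypothetical index: each file's searchable text is the concatenation of its
# machine-generated tags and matched proposal/publication keywords.
index = {
    "file_001.dm4": "team 1 haadf stem gold nanoparticle 300kv",
    "file_002.dm4": "diffraction calibration silicon standard",
    "file_003.dm4": "gold nanoparticle bright field low dose",
}
click_counts = Counter({"file_003.dm4": 4})  # past user clicks (hypothetical)

def search(query, index=index):
    """Rank files by how many query words their text matches, then by past clicks."""
    words = query.lower().split()
    scored = []
    for name, text in index.items():
        matches = sum(1 for w in words if w in text)
        if matches:
            scored.append((matches, click_counts[name], name))
    return [name for _, _, name in sorted(scored, reverse=True)]

print(search("gold nanoparticle"))  # file_003 outranks file_001 due to past clicks
```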

Because scientific instruments are generating an ever-growing body of data, all aspects of the Berkeley team's science search engine needed to be scalable to keep pace with the rate and scale of the data volumes being produced. The team achieved this by setting up their system in a Spin instance on the Cori supercomputer at the National Energy Research Scientific Computing Center (NERSC). Spin is a Docker-based edge-services technology developed at NERSC that can access the facility's high-performance computing systems and storage on the back end.

"One of the reasons it is possible for us to build a tool like Science Search is our access to resources at NERSC," says Gonzalo Rodrigo, a Berkeley Lab postdoctoral researcher who is working on the natural language processing and infrastructure challenges in Science Search. "We have to store, analyze and retrieve really large datasets, and it is useful to have access to a supercomputing facility to do the heavy lifting for these tasks. NERSC's Spin is a great platform to run our search engine that is a user-facing application that requires access to large datasets and analytical data that can only be stored on large supercomputing storage systems."

An Interface for Validating and Searching Data

When the Berkeley Lab team developed the interface for users to interact with their system, they knew that it would have to accomplish a couple of objectives, including effective search and allowing human input to the machine learning models. Because the system relies on domain experts to help generate the training data and validate the machine-learning model output, the interface needed to facilitate that.

"The tagging interface that we developed displays the original data and metadata available, as well as any machine-generated tags we have so far. Expert users then can browse the data and create new tags and review any machine-generated tags for accuracy," says Matt Henderson, who is a Computer Systems Engineer in CRD and leads the user interface development effort.

To facilitate an effective search for users based on available information, the team's search interface provides a query mechanism for available files, proposals and papers that the Berkeley-developed machine-learning tools have parsed and extracted tags from. Each listed search result item represents a summary of that data, with a more detailed secondary view available, including information on tags that matched this item. The team is currently exploring how to best incorporate user feedback to improve the models and tags.

"Having the ability to explore datasets is important for scientific breakthroughs, and this is the first time that anything like Science Search has been attempted," says Ramakrishnan. "Our ultimate vision is to build the foundation that will eventually support a 'Google' for scientific data, where researchers can even  distributed datasets. Our current work provides the foundation needed to get to that ambitious vision."

"Berkeley Lab is really an ideal place to build a tool like Science Search because we have a number of user facilities, like the Molecular Foundry, that has decades worth of data that would provide even more value to the scientific community if the data could be searched and shared," adds Katie Antypas, who is the principal investigator of Science Search and head of NERSC's Data Department. "Plus we have great access to machine-learning expertise in the Berkeley Lab Computing Sciences Area as well as HPC resources at NERSC in order to build these capabilities."

Source: This article was published on phys.org

Published in Online Research

One tool shows how a site stacks up against the competition on mobile. The other aims to drive home the impact mobile speed can have on the bottom line.

Google has focused on getting marketers and site owners to improve mobile site experiences for many years now. On Monday at Mobile World Congress in Barcelona, the search giant announced the release of two new mobile benchmarking resources to help in this effort: a new Mobile Scorecard and a conversion Impact Calculator.

Both tools aim to give marketers clear visuals to help them get buy-in from stakeholders for investments in mobile site speed.

The Mobile Scorecard taps Chrome User Experience Report data to compare the speed of multiple sites on mobile. That’s the same database of latency data from Chrome users that Google started using in its PageSpeed Insights Tool in January. Google says the Mobile Scorecard can report on thousands of sites from 12 countries.

As a guideline, Google recommends that a site loads and becomes usable within five seconds on mid-range mobile devices with 3G connections and within three seconds on 4G connections.

To put the Mobile Scorecard data into monetary perspective for stakeholders, the new Impact Calculator is designed to show just how much conversion revenue a site is missing out on because of its slow loading speed.

The conversion Impact Calculator is based on data from The State of Online Retail Performance report from April 2017 that showed each second of delay in page load on a retail site can hurt conversions by up to 20 percent.

The calculator shows how a change in page load can drive revenue up or down after marketers put in their average monthly visitors, average order value and conversion rate. Google created a similar tool for publishers called DoubleClick Publisher Revenue Calculator.
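To see roughly how such a calculator behaves, the Python sketch below multiplies out visitors, conversion rate and order value, applying the report's up-to-20-percent-per-second figure as a simple multiplicative assumption. Google has not published the tool's exact formula, so treat this as an approximation rather than a reimplementation.

```python
def revenue_impact(monthly_visitors, conversion_rate, avg_order_value,
                   seconds_saved, lift_per_second=0.20):
    """Rough estimate of extra monthly revenue if page load improves.

    The 20% per second is the upper-bound figure cited from The State of Online
    Retail Performance report; the compounding model here is an assumption.
    """
    baseline = monthly_visitors * conversion_rate * avg_order_value
    improved_rate = conversion_rate * (1 + lift_per_second) ** seconds_saved
    improved = monthly_visitors * improved_rate * avg_order_value
    return improved - baseline

# Example: 100,000 visitors/month, 2% conversion, $80 average order,
# shaving one second off page load time.
print(f"${revenue_impact(100_000, 0.02, 80.0, seconds_saved=1):,.0f} additional monthly revenue")
```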

Source: This article was published on searchengineland.com by Ginny Marvin

Published in Search Engine

Google has announced that a beta version of the new Search Console, released a few months ago to select users, will now be available to everyone.

The new Search Console will be rolled out gradually, and webmasters will be individually notified when they receive access.

Still in beta, the new Search Console will live side by side with the old version. Users can toggle between them in the navigation menu.

As it was the most consistently requested new feature, site owners should be happy to know the public beta has the same 16 months of data that was available in the private beta.

In addition to more data within the Search Performance report (previously the Search Analytics report), the new Search Console has been completely rebuilt. It has been designed with a renewed focus on helping site owners identify and fix pending issues.

With the updated Index Coverage, AMP status, and Job postings reports, site owners will be guided through a simplified process of optimizing their website’s presence in search results.

Index Coverage Report

Google has added “issue tracking functionality” to the Index Coverage report, which alerts site owners when new issues are detected. Search Console will then provide information on fixing a specific issue, as well as verify when it has been fixed correctly.

Recognizing that fixing webpage issues can often involve a team of individuals, Google has added share buttons within the Index Coverage report. Now a direct link to a specific issue can be shared with whomever it concerns.

AMP and Job Postings

Issues can also arise when creating AMP versions of web pages, or implementing Job Postings markup. The new search console will identify issues related to these two types of “search enhancements,” with more to be added in the future.

In addition to providing information about how to fix an issue, the AMP and Job Postings reports have two unique features. When validating a fix, Search Console will run several instantaneous reports to provide site owners with more immediate feedback.

If you’re testing multiple URLs, then at the end of the process Search Console will provide a validation log. This document will detail which URLs have been identified as fixed, as well as the ones that failed.

As Google works to improve on the beta release of the new Search Console it will be continuously listening to user feedback. The new version does not have all the functionality of the classic version, which is why the two will live side-by-side until the beta is complete.

Source: This article was published on searchenginejournal.com by Matt Southern

Published in Search Engine

Google’s omnipresence in the lives of people has led to a wealth of information on what people think, want and desire. Such data is being used by researchers to understand human behaviours and psyche.

Aneree Parekh is a research assistant at the Department of Psychology, Monk Prayogshala, Mumbai.

The Internet started as a haven, a place where people could connect across nations, where they could get answers at the click of a button. And in 2017, the Internet has become life itself and Google the preferred navigator for most people in the world. Nearly two decades ago, Google was created to organise the world’s data, and its success is well documented, not least by the fact that “google” has now become a common verb in many languages. Looking for a place to eat nearby? Google it. What’s the weather going to be like today? Google it. Show timings for a new movie? Google it. Is that weird mole on your back cancerous? Oh my god, google it!

The ubiquitous use of Google to find answers to questions big and small has provided unique and unprecedented knowledge about human behaviour patterns and psyche. Like a trail of breadcrumbs, the trails of internet searches we leave behind reveal our deepest fears, desires and secrets, and researchers are beginning to follow them.

Most people in the developed world, and an increasing number of people in the developing world, turn to Google for information on consumer products like cars and mobile phones, health, politics, entertainment and even love. This creates a wealth of information about what people want. The availability of such large datasets is almost unheard of in research circles, and Google provides an array of tools for researchers to analyse and make sense of thousands and thousands of search queries.

The most widely used application for Google searches is in market research. The search tool Google Trends is often used to understand brand health and monitor changes in consumer interest across metrics such as seasonality and competition. Derived from search queries, Trends is a numeric/historic representation of the relative volume of searches made on Google. This data can be mined for actionable insights in a way that is not possible with consumer surveys.

The data allows you to plan and prioritise awareness-based media campaigns for your product, understand global reach and interest, and provides consumer interest data going all the way back to 2004. For example, when comparing Patanjali and Maggi noodles, it is clear that consumer internet search interest in the former is lacking and is largely restricted to India, as opposed to the latter. So Patanjali might want to think about its global outreach. While flexibility and state-wise local-consumer insight are limited in India, it is easy to see why the tool is a goldmine for market research.

Red line indicates Google searches for Maggi over time; the blue line indicates Google searches for Patanjali noodles over time. Source: Author provided

Comparison of Google searches for Maggi and Patanjali noodles across regions. Source: Author provided
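To pull comparable series programmatically, researchers often use pytrends, an unofficial third-party Python wrapper around the Google Trends endpoints (not an official Google API, and subject to change). A minimal sketch follows, with the keyword list mirroring the comparison above.

```python
# pip install pytrends  (unofficial library; interface may change)
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US")
pytrends.build_payload(kw_list=["Maggi", "Patanjali noodles"], timeframe="all")

# Relative search interest over time (0-100, normalised within the query set)
interest = pytrends.interest_over_time()
print(interest.tail())

# Relative interest by region, useful for the global-reach comparison above
by_region = pytrends.interest_by_region()
print(by_region.sort_values("Maggi", ascending=False).head())
```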

While the use of Google search queries for market research purposes is well explored, a new field in which it is gaining popularity is the social sciences. Big Data application in social sciences research is an emerging trend, and the potential of Google data to help understand the human psyche has barely been tapped. Most research done in the social sciences relies on survey data or self-reported behaviour, both prone to social desirability biases. In order to look good or answer in a socially acceptable way, people exaggerate, leave out aspects of their behaviours, or just lie about them. For sensitive topics such as racial animus or sexual orientation, there is a substantial amount of misreporting even when the surveys are conducted online and anonymously.

The power of Google data is that people ask this white search box things they would not reveal to anyone, anywhere, indicating genuine interest. Anonymous Google search queries provide a rare glimpse into the behaviours, motivations, fears and desires of people – honest and unfiltered. This has led data scientist Seth Stephens-Davidowitz to label internet search data the “digital truth serum.” Using Google tools like Trends, AdWords and Correlates, Stephens-Davidowitz revealed darker truths about human behaviour in his book. For instance: America has a higher number of closeted gay men than traditional survey data would suggest – found by looking at same-sex pornography searches by men (nearly 5%), and predictive searches wherein the word ‘gay’ is 10% more likely to complete searches that begin with “is my husband…” than the second-place word “cheating.”

For another: in India, search data revealed that a high number of porn-related searches were about how to breastfeed husbands – a behaviour not revealed in any survey on sexual health. Mining through the data also reveals widespread racial animus against African-Americans in the US and increased Islamophobia following terrorist events, especially after pleas for tolerance.

The potential of Google data is also being explored in the medical sciences. New fields such as information epidemiology, or infodemiology, have been proposed to understand the determinants and distribution of health information, which is said to be helpful for health professionals and patients seeking higher quality healthcare on the Internet. Using Trends, researchers have detected seasonal influenza outbreaks in regions of the US with a lag of only a day. Similarly, in a study led by Google investigators, anonymised and aggregated search volumes for terms related to “dengue” were found to fit well with the actual number of dengue cases reported in Bolivia, Brazil, India, Indonesia and Singapore.
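A minimal sketch of how such a fit might be checked is shown below, using an ordinary Pearson correlation on invented weekly numbers; the published studies rely on more careful statistical models, so this is only illustrative.

```python
import pandas as pd

# Hypothetical weekly series: relative search interest for dengue-related terms
# and officially reported case counts for the same region (illustrative numbers).
df = pd.DataFrame({
    "search_interest": [12, 15, 22, 34, 51, 63, 58, 40, 27, 18],
    "reported_cases":  [30, 41, 55, 90, 140, 170, 160, 110, 70, 45],
})

# A high correlation suggests the search series tracks the outbreak curve.
print(df["search_interest"].corr(df["reported_cases"]))
```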

The availability of such real-time search queries means effective and immediate delivery of health services and information to places and individuals who require it. Seasonal trends in mental illnesses have also been revealed using Google search data. Search terms implied that people are 24% less likely to consider suicide in the summer, and queries about mental health dropped by 14% in the US and Australia from winter to summer. Such seasonal fluctuations are useful in studying and understanding the epidemiology of illnesses that are otherwise difficult to track.

Aside from the availability, the easy navigation and visualisation of large sets of data is also what attracts researchers to Google data. The company’s Public Data Explorer makes available large, public-interest datasets from varied international organisations such as the World Bank, OECD and governments such as the US, the UK, Iceland, etc. In order to prioritise which datasets to include and which to exclude, anonymous search logs were analysed to find patterns in the kinds of searches people were doing. The tool allows even novices to navigate complex datasets at the click of a button, compare data across countries or variables, and use animated charts and graphs, allowing users to see trends over time.

Analysing data from the UN Human Development Programme Report (2015), the number of teenage mothers (aged 15-19 years) has dropped dramatically in India since 1990 and – after a peak in 1995 – has been declining in the US as well. The availability and easy visualisation of this kind of information enables policymakers and interventionists to design effective programmes to tackle public health issues.

Births to teenage mothers (aged 15-19 years) in India and the US, 1980-2014. Source: Author provided

While Google data and tools provide an exciting new avenue for social sciences research, there are still strong limitations to using search query data. Indeed, one of the biggest disadvantages is the limited generalisability of the results thus derived. The samples are not randomly selected as they are in traditional research, making the results relevant only to Google-using netizens. Similarly, datasets and complex analytical tools are as yet available mostly for American and European samples. Additionally, the interpretations of results from these analyses are critically dependent on whether or not the search term parameters used are appropriate for the posed research questions. For example, one would not be able to definitively understand the extent to which sexism played a role in Hillary Clinton’s 2016 defeat because derogatory terms used to describe women are also key search terms for pornography.

The extent to which secondary data such as search queries can supplement – or even replace – burdensome, traditional data collection methods is as yet unknown. The potential for mining localised data from developing countries is limited, and the research methodology and the ethical and policy implications are still being debated. However, initial indications suggest that search data may revolutionise research and our understanding of human beings.

Source: This article was published on thewire.in by Aneree Parekh

Published in Search Engine

In this introduction to the basic steps of market research, the reader can find help with framing the research question, figuring out which approach to data collection to use, how best to analyze the data, and how to structure the market research findings and share them with clients.

The market research process consists of six discrete stages or steps.

The third step of market research - Collect the Data or Information - entails several important decisions. One of the first things to consider at this stage is how the research participants are going to be contacted. There was a time when survey questionnaires were sent to prospective respondents via the postal system. As you might imagine, the response rate was quite low for mailed surveys, and the initiative was costly.

Telephone surveys were also once very common, but people today let their answering machines take calls or they have caller ID, which enables them to ignore calls they don't want to receive. Surprisingly, the Pew Foundation conducts an amazingly large number of surveys, many of which are part of longitudinal or long-term research studies.

Large-scale telephone studies are commonly conducted by the Pew researchers and the caliber of their research is top-notch.

Some companies have issued pre-paid phone cards to consumers who are asked to take a quick survey before they use the free time on the calling card. If they participate in the brief survey, the number of free minutes on their calling card is increased.

Some of the companies that have used this method of telephone surveying include Coca-Cola, NBC, and Amoco.

Methods of Interviewing

In-depth interviews are one of the most flexible ways to gather data from research participants. Another advantage of interviewing research participants in person is that their non-verbal language can be observed, as well as other attributes about them that might contribute to a consumer profile. Interviews can take two basic forms: Arranged interviews and intercept interviews.

Arranged interviews are time-consuming, require logistical considerations of planning and scheduling, and tend to be quite expensive to conduct. Exacting sampling procedures can be used in arranged interviews that can contribute to the usefulness of the interview data set. In addition, the face-to-face aspect of in-depth interviewing can result in exposure to interviewer bias, so training of interviewers necessarily becomes a component of an in-depth interviewing project.

Intercept interviews take place in shopping malls, on street corners, and even at the threshold of people's homes. With intercept interviews, the sampling is non-probabilistic. For obvious reasons, intercept interviews must be brief, to the point, and not ask questions that are off-putting.

Otherwise, the interviewer risks seeing the interviewee walk away. One version of an intercept interview occurs when people respond to a survey that is related to a purchase that they just made. Instructions for participating in the survey are printed on their store receipt and, generally, the reward for participating is a free item or a chance to be entered in a sweepstakes.

Online data collection is rapidly replacing other methods of accessing consumer information. Brief surveys and polls are everywhere on the Web. Forums and chat rooms may be sponsored by companies that wish to learn more from consumers who volunteer their participation. Cookies and clickstream data send information about consumer choices right to the computers of market researchers. Focus groups can be held online and in anonymous blackboard settings.

Market research has become embedded in advertising on digital platforms.

There are still many people who do not regularly have access to the Internet. Providing internet access for people who do not have connections at home or are intimidated by computing or networking can be fruitful. Often, the novelty of encountering an online market research survey or poll that looks like and acts like a game is incentive enough to convert reticent Internet users.

Characteristics of Data Collection

Data collection strategies are closely tied to the type of research that is being conducted as the traditions are quite strong and have resilient philosophical foundations. In the rapidly changing field of market research, these traditions are being eroded as technology makes new methods available. The shift to more electronic means of surveying consumers is beneficial in a number of ways. Once the infrastructure is in place, digital data collection is rapid, relatively error-free, and often fun for consumers. Where data collection is still centralized, market researchers can eliminate the headache of coding data by inputting responses into computers or touch screens. The coding is instantaneous and the data analysis is rapid.

Regardless of how data is collected, the human element is always important. It may be that the expert knowledge of market researchers shifts to different places in the market research stream. For example, the expert knowledge of a market researcher is critically important in the sophisticated realm of Bayesian network simulation and structural equation modeling -- two techniques that are conducted through computer modeling. Intelligently designed market research requires planning regardless of the platform. The old adage still holds true: Garbage in, garbage out.

Now you are ready to take a look at Step 4 of the market research process: Analyze the Data.

Sources

Kotler, P. (2003). Marketing Management (11th ed.). Upper Saddle River, NJ: Pearson Education, Inc., Prentice Hall.

Lehmann, D. R., Gupta, S., and Steckel, J. (1997). Marketing Research. Reading, MA: Addison-Wesley.

Published in Market Research

The CIA is developing AI to advance data collection and analysis capabilities. These technologies are, and will continue to be, used for social media data.

INFORMATION IS KEY

The United States Central Intelligence Agency (CIA) requires large quantities of data, collected from a variety of sources, in order to complete investigations. Since the agency's creation in 1947, intel has typically been gathered by hand. The advent of computers has improved the process, but even more modern methods can still be painstakingly slow. Ultimately, these methods retrieve only minuscule amounts of data compared to what artificial intelligence (AI) can gather.

According to information revealed by Dawn Meyerriecks, the deputy director for technology development with the CIA, the agency currently has 137 different AI projects underway. A large portion of these ventures are collaborative efforts between researchers at the agency and developers in Silicon Valley. But emerging and developing capabilities in AI aren’t just allowing the CIA more access to data and a greater ability to sift through it. These AI programs have taken to social media, combing through countless public records (i.e. what you post online). In fact, a massive percentage of the data collected and used by the agency comes from social media. 

As you might know or have guessed, the CIA is no stranger to collecting data from social media, but with AI things are a little bit different. “What is new is the volume and velocity of collecting social media data,” said Joseph Gartin, head of the CIA’s Kent School. And, according to Chris Hurst, the chief operating officer of Stabilitas, speaking at the Intelligence Summit, “Human behavior is data and AI is a data model.”

AUTOMATION

According to Robert Cardillo, director of the National Geospatial-Intelligence Agency, in a June speech, “If we were to attempt to manually exploit the commercial satellite imagery we expect to have over the next 20 years, we would need eight million imagery analysts.” He went on to state that the agency aims to use AI to automate about 75% of the current workload for analysts. And, if they use self-improving AIs as they hope to, this process will only become more efficient.

While countries like Russia are still far behind the U.S. in terms of AI development, especially as it pertains to intelligence, there seems to be a global push — if not a race — forward. Knowledge is power, and creating technology capable of extracting, sorting, and analyzing data faster than any human or other AI system certainly sounds like a fast track to the top. As Vladimir Putin recently stated on the subject of AI, “Whoever becomes the leader in this sphere will become the ruler of the world.”

Source: This article was published on futurism.com by Chelsea Gohd

Published in Science & Tech
HIGHLIGHTS
  • Fireball steals sensitive user data and manipulates regular surfing data
  • CERT-In has issued its latest advisory to Internet users
  • It said the virus can be detected by the majority of anti-virus solutions

Cyber-security sleuths have alerted Internet users against the destructive activity of a browser-attacking virus, 'Fireball', which steals sensitive user data and manipulates regular surfing activity.

The malware has been spreading across the globe, possesses over two dozen aliases, and spreads by bundling, "without the user's consent".

"It has been reported that a malware named as 'Fireball' targeting browsers is spreading worldwide.

"It has the ability to collect user information, manipulate web-traffic to generate ad-revenue, malware dropping and executing malicious code on the infected machines," the Computer Emergency Response Team of India (CERT-In) said in its latest advisory to Internet users.

The CERT-In is the nodal agency to combat hacking, phishing and to fortify security-related defences of the Indian Internet domain.

The agency said the malware or the virus can be "detected by majority of the anti-virus solutions" and it has advised Internet users to install updated anti-virus solutions to protect their computers from this infection.

It said the virus, 'Fireball', "currently installs plug-ins and additional configurations to boost its advertisements but it could be used as distributor for any additional malware in future."

"It is reported that the malware 'Fireball' is used by one of the largest marketing agency to manipulate the victims' browsers and changes their default search engines and home pages into fake search engines.

"It also re-directs the queries to either yahoo.com or Google.com. The fake search engines also collects the users' private information," the advisory said.

'Fireball', it said, is capable of acting as a browser hijacker, manipulating web traffic to generate ad revenue, downloading further malware, executing malicious code on the victim machine, and collecting user information and stealing credentials from the victim machine.

The CERT-In has also suggested some counter-measures: "Do not click on banners or pop-up or ads notifications, do not visit untrusted websites and do not download or open attachment in emails received from untrusted sources or unexpectedly received from trusted users."

It said that, in order to exercise caution after logging into the system, a user should check the default settings of web browsers, such as the homepage, search engine, and installed browser extensions and plug-ins, and delete anything found to be unknown.

Source: This article was published on gadgets.ndtv

Published in Internet Privacy