One of the most ambitious endeavors in quantum physics right now is to build a large-scale quantum network that could one day span the entire globe. In a new study, physicists have shown that describing quantum networks in a new way—as mathematical graphs—can help increase the distance that quantum information can be transmitted. Compared to classical networks, quantum networks have potential advantages such as better security and, in certain circumstances, greater speed.

"A worldwide quantum network may appear quite similar to the internet—a huge number of devices connected in a way that allows the exchange of information between any of them," coauthor Michael Epping, a physicist at the University of Waterloo in Canada, told Phys.org. "But the crucial difference is that the laws of quantum theory will be dominant for the description of that information.

"For example, the state of the fundamental information carrier can be a superposition of the basis states 0 and 1. By now, several advantages in comparison to classical information are known, such as prime number factorization and secret communication. However, the biggest benefit of quantum networks might well be discovered by future research in the rapidly developing field of quantum information theory."

Quantum networks involve sending entangled particles across long distances, which is challenging because particle loss and decoherence tend to scale exponentially with the distance.

In their study published in the New Journal of Physics, Epping and coauthors Hermann Kampermann and Dagmar Bruß at the Heinrich Heine University of Düsseldorf in Germany have shown that describing physical quantum networks as abstract mathematical graphs offers a way to optimize the architecture of quantum networks and achieve entanglement across the longest possible distances.

"A network is a physical system," Epping explained. "Examples of a network are the internet and labs at different buildings connected by optical fibers. These networks may be described by mathematical graphs at an abstract level, where the network structure—which consists of nodes that exchange quantum information via links—is represented graphically by vertices connected by edges. An important task for quantum networks is to distribute entangled states amongst the nodes, which are used as a resource for various information protocols afterwards. In our approach, the graph description of the network, which might come to your mind quite naturally, is related to the distributed quantum state."

In the language of graphs, this distributed quantum state becomes a quantum graph state. The main advantage of the graph state description is that it allows researchers to compare different quantum networks that produce the same quantum state, and to see which network is better at distributing entanglement across large distances.

Quantum networks differ mainly in how they use quantum repeaters—devices that offer a way to distribute entanglement across large distances by subdividing the long-distance transmission channels into shorter channels.

Here, the researchers produced an entangled graph state for a quantum network by initially assigning vertices to both network nodes and quantum repeater stations. Then they described how measurements at the repeater stations modify this graph state. Through these measurements, the vertices associated with quantum repeaters are removed, so that only the network nodes serve as vertices in the final quantum state, while the connecting quantum repeater lines become edges.
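As a toy illustration of this bookkeeping (not the paper's actual formalism), one can track the network as an adjacency map and apply the simplified rule that measuring out a degree-2 repeater vertex leaves a direct edge between its two neighbors—which is the effect a Pauli-Y measurement has on a degree-2 vertex of a graph state. The data layout and function name below are illustrative assumptions:

```python
# Toy graph-state bookkeeping for a repeater line (illustrative only).
# Network nodes and repeater stations are vertices; entangled links are edges.

def measure_out_repeater(adj, r):
    """Remove a degree-2 repeater vertex r, directly linking its neighbors.

    For a degree-2 vertex of a graph state, a Pauli-Y measurement
    (local complementation followed by vertex deletion) has exactly
    this effect: the repeater disappears and its neighbors become linked.
    """
    u, v = adj[r]                      # toy rule covers degree-2 repeaters only
    for n in (u, v):
        adj[n].discard(r)
    del adj[r]
    adj[u].add(v)
    adj[v].add(u)

# A chain: node A -- repeater r1 -- repeater r2 -- node B
adj = {"A": {"r1"}, "r1": {"A", "r2"}, "r2": {"r1", "B"}, "B": {"r2"}}
for rep in ("r1", "r2"):
    measure_out_repeater(adj, rep)
# Only the network nodes remain, now sharing a direct (long-distance) edge.
```

After both repeater vertices are measured out, the surviving graph contains only A and B, joined by a single edge, mirroring how the final quantum state involves only the network nodes.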

In the final graph state, the weights of the edges correspond to the number of quantum repeaters and how far apart they are. Consequently, by changing the weights of the edges, the new approach can optimize a given performance metric, such as security or speed. In other words, the method can determine the best way to use quantum repeaters to achieve long-distance entanglement for large-scale quantum networks.

In the future, the researchers plan to investigate the demands for practical implementation. They also want to extend these results to a newer research field called "quantum network coding" by generalizing the quantum repeater concept to quantum routers, which can make quantum networks more secure against macroscopic errors. 

Source:  http://phys.org/news/2016-06-worldwide-quantum-web-graphs.html


Categorized in Online Research

Ladies and gentlemen, it’s time to go negative. (And, no, I’m not referring to this year’s presidential race). I’m talking about negative keywords: those words and phrases that are essential to ensuring your pay-per-click (PPC) ads are displayed to the right audience. 

Going negative: How to eliminate junk PPC queries

Here’s what I mean: You run a small business selling hand-blown glassware; you’ve just launched a new line of wine glasses. You bid on “glasses” as a search term.

Your searcher then Googles a keyword phrase that includes “glasses.” Your ad pops up; the searcher clicks on it. Great news, right? Think again: if that searcher is looking for the nearest “glasses repair shop”—eyeglasses, not wine glasses—you’ve just paid for an accidental click from someone who has no intention of ever becoming a customer.

While Google is pretty smart when responding to search queries and integrating user intent into the results, its system isn't perfect. PPC success is predicated on the “Golden Rule of Paid Search”: Give users what they are looking for. As the SEO team at Ranked One has succinctly pointed out, “Paid search is a pull and not a push marketing initiative. Thus, it is vital that we only present searchers with that which is most relevant to their query.”

Here’s another example from the Ranked One team. Say you want to target searchers looking for “pet-friendly hotels in Albuquerque.” Following standard PPC best practices, you create a PPC ad that includes the search phrase in question (“pet-friendly hotels in Albuquerque”) and a landing page that echoes this message.

But that’s not enough. You also need to eliminate so-called “junk queries.” In this example, you would exclude queries from searchers who have no intention of booking a pet-friendly hotel room -- someone searching for hotel jobs in Albuquerque, for example.

True, a few erroneous clicks won’t sink your PPC budget. But, over time, the lack of a strong negative keyword list means your ads will be shown to the wrong target audience.

How to use negative keywords

Campaign level vs. ad-group level. There are two ways you can address negative keywords: add them at the campaign level or the ad-group level. When you add a negative keyword at the campaign level, this tells Google never to show your ad for this keyword. Use this approach for keywords that will never be associated with your product, like “hotel jobs” for your pet-friendly hotel or “eyeglass repair” for your wine glasses.

When you add negative keywords at the ad-group level, you tell Google not to show ads from that particular ad group for those queries. Ad-group-level negative keywords can be used to gain greater control over your AdWords campaigns.
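A rough sketch of the filtering logic (simplified substring matching only; real AdWords negative keywords distinguish broad, phrase and exact match, and the function below is illustrative, not Google's API):

```python
def is_blocked(query, campaign_negatives, adgroup_negatives=()):
    """Return True if any negative keyword appears in the search query.

    Campaign-level negatives apply to every ad group; ad-group-level
    negatives apply only to the group they are attached to.
    """
    q = query.lower()
    return any(neg.lower() in q
               for neg in list(campaign_negatives) + list(adgroup_negatives))

campaign_negatives = ["hotel jobs", "hotel careers"]

print(is_blocked("pet-friendly hotels in albuquerque", campaign_negatives))  # False
print(is_blocked("hotel jobs in albuquerque", campaign_negatives))           # True
```

The same function handles the ad-group case by passing a second list, which only that ad group consults.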

Traditional vs. protective use.

All of our examples thus far have featured the traditional use of negative keywords -- eliminating extraneous queries that are irrelevant to your product or service. Protective use is a bit different. In a nutshell, you’re restricting the use of a highly specific keyword phrase from general ads, even if this phrase is relevant.


Kissmetrics offers a great example for PPC shoe ads. In its example, you sell red Puma suede sneakers and create a PPC ad with copy targeted at this particular type of shoe (Ad #1). You also have another, broader catch-all ad for general shoe sales (Ad #2).

In this example, you want to be sure that only people searching for “red Puma suede sneakers” see Ad #1. You don’t want any broad matches for “Puma” or “red sneakers” or “suede sneakers.” So, you add those phrases to your negative keyword list for Ad #1. This ad will then be displayed only to searchers with an exact match for “red Puma suede sneakers,” effectively beating out all the broad match advertisers.

Building your negative list. When you’re selling a product or service, it’s easy to get stuck in the mindset of what you’re offering. You may be surprised by how ambiguous some of your search terms can be! Not sure how to get started building your negative list? Check out this handy keyword list from Tech Wise that includes a broad range of the most common negative keywords for eliminating erroneous queries, ranging from employment to research.

Next, dive into your queries. Ranked One recommends crawling through search query reports (SQRs), which you can pull right in the Google interface. What phrases pop up again and again that are irrelevant to your product or service? What queries don’t match the user intent you’re targeting? Start your research there.

Bottom line. Bidding on the best keywords is only half the battle. Negative keywords are just as important for an effective PPC strategy. When used correctly, they help you save your budget for the highest-quality searches.

Source:  https://www.entrepreneur.com/article/276961 

Categorized in Online Research

When I think about the behavior of many business people today, I imagine a breadline. These employees are the data-poor, waiting around at the end of the day on the data breadline. The overtaxed data analyst team prioritizes work for the company executives, and everyone else must be served later. An employee might have a hundred different questions about his job. How satisfied are my customers? How efficient is our sales process? How is my marketing campaign faring?

These data breadlines cause three problems present in most teams and businesses today. First, employees must wait quite a while to receive the data they need to decide how to move forward, slowing the progress of the company. Second, these protracted wait times abrade the patience of teams and encourage teams to decide without data. Third, data breadlines inhibit the data team from achieving its full potential.

Once an employee has been patient enough to reach the front of the data breadline, he gets to ask the data analyst team to help him answer his question. Companies maintain thousands of databases, each with hundreds of tables and billions of individual data points. In addition to producing data, the already overloaded data teams must translate the panoply of figures into something more digestible for the rest of the company, because with data, nuances matter.

The conversation bears more than a passing resemblance to one between a third-grade student and a librarian. Even expert data analysts lose their bearings sometimes, which results in slow response times and inaccurate responses to queries. Both serve to erode the company’s confidence in its data.

Overly delayed by the strapped data team and unable to access the data they need from the data supply chain, enterprising individual teams create their own rogue databases. These shadow data analysts pull data from all over the company and surreptitiously stuff it into database servers under their desks. The problem with the segmented data assembly line is that errors can be introduced at any single step.

A file could be truncated when the operations team passes the data to the analyst team. The data analyst team might use an old definition of customer lifetime value. And an overly ambitious product manager might alter the data just slightly to make it look a bit more positive than it actually is. With this kind of siloed pipeline, there is no way to track how errors happen, when they happen or who committed them. In fact, the error may never be noticed. 

Data fragmentation has another insidious consequence. It incites data brawls, where people shout, yell and labor over figures that just don’t seem to align and that point to diametrically different conclusions.

Imagine two well-meaning teams, a sales team and a marketing team, both planning next year’s budget. They share an objective: to exceed the company’s bookings plan. Each team independently develops a plan, using metrics like customer lifetime value, cost of customer acquisition, payback period, sales cycle length and average contract value.

When there’s no consistency in the data among teams, no one can trust each other’s point of view. So meetings like this devolve into brawls, with people arguing about data accuracy, the definition of shared metrics and the underlying sources of their two conflicting conclusions.

Imagine a world where data is put into the hands of the people who need it, when they need it, not just for Uber drivers, but for every team in every company. This is data democratization, the beautiful vision of supplying employees with self-service access to the insights they need to maximize their effectiveness. This is the world of the most innovative companies today: technology companies like Uber, Google, Facebook and many others who have re-architected their data supply chains to empower their people to move quickly and intelligently. 

Source:  http://techcrunch.com/2016/06/12/data-breadlines-and-data-brawls/

Categorized in Online Research

Security researchers have found that some of the wealthiest and most developed nations are at the greatest risk of hacks and cyberattacks -- in part because they have more unsecured systems connected to the internet.

Security firm Rapid7 said in its latest research, published Monday, that many Western nations are putting competitiveness and business ahead of security, and that will have "dire consequences" for some of the world's largest economies, the report said.

The researchers pointed to a correlation between a nation's gross domestic product (GDP) and its internet "presence," with the exposure of insecure, plaintext services, which almost anyone can easily intercept.

Some of the most exposed countries on the internet today include Australia (ranked fourth), China (ranked fifth), France (13th), the US (14th), Russia (19th) and the UK (23rd).

Belgium led the rankings as the most exposed country on the internet, with almost one-third of all systems and devices exposed to the internet.

"Every service we searched for, it came back in the millions," said Tod Beardsley, senior security research manager at Rapid7, who co-authored the report and spoke on the phone last week.

"Everything came back from two million to 20 million systems," he said.


As for the biggest culprits, there were over 11 million systems with direct access to relational databases, about 4.7 million networked systems exposing one of the most commonly attacked ports, and 4.5 million apparent printer services.

But there was one that floated above them all -- a networking relic from the Cold War era.

Dissecting the example, Beardsley said the ongoing widespread use of a decades-old, outdated and unsecured networking protocol would prove his point. He said, citing the research, that scans showed that there are over 14 million devices still using outdated, insecure, plaintext Telnet for remotely accessing files and servers.
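Checking whether a single host still answers on Telnet's port 23 takes only a few lines; the probe below is a minimal illustration (Rapid7's internet-wide Sonar scans are far more sophisticated), and any address you point it at is your own assumption:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (only against hosts you are authorized to probe):
#   port_open("203.0.113.10", 23)  # Telnet still listening?
#   port_open("203.0.113.10", 22)  # SSH available instead?
```

A successful connection on port 23 means the host will likely carry remote logins in plaintext, which is exactly the exposure the report counts in the millions.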

Beardsley said it was "encouraging" to see Secure Shell (SSH), its modern replacement, prevail over Telnet -- not least because, given the choice, SSH is far simpler to work with, which makes the switch an easy one.

But he said it was frustrating to see millions nevertheless leave their systems wide open to hackers and nation-state attackers.

He echoed similar sentiments from the report, saying that the high exposure rates are a "failure" of modern internet engineering.

"Despite calls from... virtually every security company and security advocacy organization on Earth, compulsory encryption is not a default, standard feature in internet protocol design. Cleartext protocols 'just work,' and security concerns are doggedly secondary," said the paper.

Beardsley said the research is a good starting point for determining whether other factors drive the link between GDP and exposure rates, but he stressed that more work needs to be done and that the research is just a foundation for further study.

"There are a million questions I have -- I could talk for an hour," he said.

Source:  http://www.zdnet.com/article/researchers-say-theyve-found-the-most-exposed-countries-on-the-internet/

Categorized in Online Research

A long, long time ago I was talking to Mike Grehan about search engine rankings. He used the term “the rich get richer”, to explain why sites that live at the top of Google are likely to stay there.

One of the reasons is the ease of findability.

A writer who is researching a subject on Google is highly likely to click the top result first. If that web page answers the right questions then it may be enough to earn a citation in an article, and that link will help fortify the search position.

The rich get richer.

I mention this because yesterday I read a brilliant, in-depth post from Glen Allsopp (aka @Viperchill), which illustrates that the rich do indeed get richer, at least in terms of search positions.

In this case, the rich are major publishing groups.

The way they are getting richer is by cross-linking to existing and new websites, from footers and body copy, which are “constantly changing”.

There’s nothing hideously wrong with this approach, but it’s a bit of a risk to tweak the links quite so often, especially when the anchor text is something other than the site’s brand name.

As Glen says:

“As anyone who has been involved in search engine optimisation for a period of time might wonder, surely getting so many sitewide links in a short timeframe should raise a bit of a red flag?”
It’s interesting to see that Google not only tolerates it, but actively rewards this kind of behaviour, at least in the examples highlighted in Glen’s post.

The short story is that Hearst was found to be linking to a newly launched site, BestProducts, from its portfolio of authority websites, which includes the likes of Cosmopolitan, Elle, Marie Claire and Bazaar.

This helped to put the new site on the map in a rather dramatic way.

Party hard in footerland

Here are a couple of screenshots. The first is from March, when the anchor text was ‘Style Reviews’.

[Screenshot: Cosmopolitan footer, March]

The second appeared later, with the link text changing to ‘Beauty Reviews’. Note that the link placement changed too.

[Screenshot: Cosmopolitan footer, with updated link text]

I’m going to assume that these links are dofollow, which is a potentially risky tactic, and one that has attracted the dreaded manual penalty for some site owners.

Furthermore, this is clearly something that has been done with intent. Design, not accident.

Glen says:

“It’s now obvious that the people working for Woman’s Day, Marie Claire, Popular Mechanics and Esquire had some conversation that went along the lines of, ‘Don’t forget, today’s the day we have to put those links to Best Products in the footer.’”
But did it work?

The results

Glen estimates that BestProducts attracted at least 600,000 referrals from Google (organic) in April 2016, so yep, it has worked incredibly well.

Here are some of the positions that the site has bagged in little over half a year, from a standing start:

[Screenshot: BestProducts’ Google rankings]

Pretty amazing, right? Those are some big, broad terms.

Glen reckons that the following 16 companies – and the brands they own – dominate Google results.


I suspect that if you look at other industries, such as car hire, where a few brands own hundreds of sub-brands, that you’ll see similar tactics and results.

We are family?

The standout question for me isn’t whether Hearst and its peers are systematically outsmarting Google with a straightforward sitewide link strategy, nor whether that strategy will hold up. It is more about whether Google truly understands related entities.

Does it know that these sites are linked to one another by having the same parent company? And does that discount the link tactics in play here?

Certainly someone on the webspam team would be able to spot that one site was related to another, were it flagged for a manual action. So is Google turning a blind eye?

Here’s what Matt Cutts said about related sites, back in 2014:

“If you have 50 different sites, I wouldn’t link to all 50 sites down in the footer of your website, because that can start to look pretty spammy to users. Instead you might just link to no more than three or four or five down in the footer, that sort of thing, or have a link to a global page, and the global page can talk about all the different versions and country versions of your website.”

“If you’ve got stuff that is all on one area, like .com, and you’ve got 50 or 100 different websites, that is something where I’d be really a lot more careful about linking them together.”

“And that’s the sort of thing where I wouldn’t be surprised if we don’t want to necessarily treat the links between those different websites exactly the same as we would treat them as editorial votes from some other website.”

Note that Matt talks about links to other sites, as opposed to “links with descriptive and ever-changing anchor text”. Somewhat different.

Screw hub pages, launch hub sites

Internal linking works best when there is a clear strategy in place. That normally means figuring out a taxonomy and common vocabulary in advance. It also means understanding the paths you want to create for visitors, to help pull them towards other pages, or in this case, other sites. These should mirror key business goals.

With all that in mind, I think it’s pretty smart, I really do, but let’s see how it plays out. And obviously it takes a rich portfolio of authority websites to play this hand, so yeah… the rich get richer.

Assuming this strategy works out in the long run we can expect to see lots more niche sites being launched by the big publishing groups, underpinned by this kind of cross-site linking.

Ok, so this fluid footer-linking approach certainly sails a bit close to the wind, and we may not have heard the last of this story, but it once again proves the absolute power of links in putting a site on the map. Take any statements about links not mattering so much in 2016 with a large bucket of salt.

Source:  https://searchenginewatch.com/2016/06/07/are-related-sitewide-footer-links-the-key-to-dominating-google/

Categorized in Search Engine

It takes less than a minute to opt-out of Facebook's new ads system.

Facebook member or not, the social networking giant will soon follow you across the web -- thanks to its new advertising strategy.

From today, the billion-plus-member social network will serve its ads to account holders and non-users alike -- making one giant push in the footsteps of advertising giants like Google, which has historically dominated the space.

In case you didn't know, Facebook stores a lot of data on you. Not just what you say or who you talk to (no wonder it's a tempting trove of data for government surveillance) but also what you like and don't like. And that's a lot of things, from goods to services, news sites and political views -- not just from things you look at and selectively "like" but also sites you visit and places you go. You can see all of these "ad preferences" by clicking this link.

Facebook now has the power to harness that information to target ads at you both on and off its site.

In fairness, it's not the end of the world -- nor is it unique to Facebook. A lot of ads firms do this. Ads keep the web free, and Facebook said that its aim is to show "relevant, high quality ads to people who visit their websites and apps."

Though the company hasn't overridden any settings, many users will have this setting on by default, meaning you'll see ads that Facebook thinks you might find more relevant based on what it knows about you.

The good news is that you can turn it off, and it takes a matter of seconds.


Head to this link (and sign in if you have to), then make sure the "Ads on apps and websites off of the Facebook Companies" option is set to "no."

And that's it. The caveat is that you may see ads relating to your age, gender, or location, Facebook says.

You can also make other ad-based adjustments to the page -- to Facebook's credit, they're fairly easy to understand. The best bet (at the time of publication) is to switch all options to "no" or "no-one."


Given that this also affects those who aren't on Facebook, there are different ways to opt-out.

iPhones and iPads can limit ad-tracking through a built-in setting, located in the Settings app.

Android phones also have a similar setting -- you can find out how to do it here.

As for desktops, notebooks, and some tablets, your best option might be an ad-blocker.

But if you want to be thorough, you can opt-out en masse from the Digital Advertising Alliance. The website looks archaic, and yes, you have to enable cookies first (which seems to defeat the point, but it does make sense, given these options are cookie-based), yet it takes just a couple of minutes to opt-out.

Source:  http://www.zdnet.com/article/to-stop-facebook-tracking-you-across-the-web-change-these-settings/

Categorized in Internet Privacy


Most major PC makers are shipping their desktops and notebooks with pre-installed software, which researchers say is riddled with security vulnerabilities.


A highly critical report by Duo Security, released Tuesday, said Acer, Asus, Dell, HP and Lenovo all ship with software that contains at least one vulnerability, which could allow an attacker to run malware at the system level -- in other words, completely compromising an out-of-the-box PC.



The group of PC makers accounted for upwards of 38 million PCs shipped in the first quarter of the year, according to estimates garnered from IDC's latest count.


The vast majority of those will be sold to consumers, and most of those will come with some level of system tool used to monitor the computer's health or processes. This so-called bloatware -- also known as junkware or crapware -- is preinstalled software that lands on new PCs and laptops, and some Android devices. Often created by the PC maker, it's usually deeply embedded in the system and difficult to remove.


PC makers install the software largely to generate money on low-margin products, despite it putting system security at risk.


"We broke all of them," said Duo researchers in a blog post. "Some worse than others."

Every PC maker that was examined had at least one flaw that could have let an attacker grab personal data or inject malware on a system through a man-in-the-middle attack.



One of the biggest gripes was the lack of TLS encryption used by the PC makers, which creates a secure tunnel for files and updates to flow over. Updating over HTTPS makes it difficult, if not impossible, to carry out man-in-the-middle attacks.


Of the flaws, Acer and Asus scored the worst, sending unsigned manifest and update files over unencrypted connections, potentially allowing an attacker to inject malware code as it's being downloaded. Without code-signing checks, an attacker can trivially modify or replace files and manifests in transit, said the corresponding report.
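The missing safeguard is simple to state: verify integrity before installing. Below is a minimal sketch of a digest check, assuming the vendor publishes a known-good SHA-256 value over a trusted channel (a real fix would combine TLS transport with proper code signing; the names and payload here are illustrative):

```python
import hashlib

def verify_update(payload: bytes, expected_sha256: str) -> bool:
    """Refuse the update unless its SHA-256 digest matches the published one."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256

update = b"fake update payload v1.2"
published = hashlib.sha256(update).hexdigest()   # shipped out of band

print(verify_update(update, published))                  # True
print(verify_update(update + b" tampered", published))   # False
```

Even this basic check would defeat the in-transit tampering described above, because a modified file no longer matches the published digest.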


The flaws are such easy targets that, as the researchers put it, the "average potted plant" could exploit them.

Duo's researchers found a total of 12 separate vulnerabilities, with half of those rated "high," indicating a high probability of exploitation.


Most of the higher-priority flaws were fixed, but Asus and Acer have yet to offer updates.


The researchers said users should wipe and reinstall "a clean and bloatware-free copy of Windows" before the system is used; otherwise, "reducing the attack surface should be the first step in any system-hardening process."


A Dell spokesperson said Wednesday that, "customer security is a top priority" for the company. "We fared comparatively well in their testing and continue to test our software to identify and fix outstanding vulnerabilities as we examine their findings more closely."


Lenovo said in a statement: "Upon learning of the vulnerability, Lenovo worked swiftly and closely with Duo Security to mitigate the issue and publish a security advisory." The spokesperson also said a System Update removal utility "will soon be available."


Acer, Asus, and HP did not respond to a request for comment.


Source:  http://www.zdnet.com/article/hp-dell-acer-asus-bloatware-security-flaws-vulnerabilities/




Categorized in Internet Privacy

A prospective client had something to hide when she claimed no previous involvement in an industry rife with fraud. The claim, made alongside the submission of a well-informed business plan, rang false. Other clues about her integrity worried the lawyer, and he soon suspected she was being dishonest. After the meeting, he consulted another partner, who in turn delivered the puzzle to my e-mail inbox. My mission was to fit the mismatched pieces of information together, either substantiating or disproving the lawyer's skepticism.

Internet Archive to the Rescue

Wanting to emphasize the importance of retaining knowledge of history, George Santayana wrote the words made famous by the film, Rise and Fall of the Third Reich--"Those who cannot remember the past are condemned to repeat it." Of course, at the time the Internet Archive didn't exist; nor did the Information Age. If it had, perhaps he would have edited his philosophy to state, "Those who cannot discover the past are condemned to repeat it."

Certainly in times when new information amounts to five exabytes, or the equivalent of "information contained in half a million new libraries the size of the Library of Congress print collections" (How Much Information 2003?), it is perhaps fortunate that librarians possess a knack for discovering information. It is also in our favor that Brewster Kahle and Alexa Internet foresaw a need for an archive of Web sites.

Internet Archive and the Wayback Machine

Founded in 1996, the Internet Archive contains about 30 billion archived Web pages. While always open to researchers, the collection did not become readily accessible until the introduction of the Wayback Machine in 2001. The Wayback Machine enables finding archived pages by their Web address. Enter a URL to retrieve a dated listing of archived versions. You can then display the archived document as well as any archived pages linked from it.
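The same URL-based lookup is available programmatically through the Archive's availability endpoint (https://archive.org/wayback/available). Below is a sketch of building the request URL and reading the JSON it returns; the response is canned so the example runs offline, and the field layout follows the endpoint's documented shape:

```python
import json
import urllib.parse

API = "https://archive.org/wayback/available"

def availability_url(url, timestamp=None):
    """Build a Wayback availability query; timestamp format is YYYYMMDDhhmmss."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urllib.parse.urlencode(params)

def closest_snapshot(response):
    """Return (archived_url, timestamp) of the closest snapshot, or None."""
    snap = response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"], snap["timestamp"]
    return None

# Canned response in the endpoint's documented shape:
sample = json.loads('{"archived_snapshots": {"closest": {"available": true, '
                    '"status": "200", "timestamp": "20010424210551", '
                    '"url": "http://web.archive.org/web/20010424210551/http://example.com/"}}}')

print(closest_snapshot(sample)[1])  # 20010424210551
```

Fetching `availability_url(...)` over HTTP and feeding the parsed JSON to `closest_snapshot` gives the nearest archived copy of a page, just as entering the URL in the Wayback Machine does.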

The Internet Archive helped me successfully respond to the concerns the lawyers had about the prospective client. It contained evidence of a business relationship with a company clearly in the suspect industry. Broadening the investigation to include the newly discovered company led to information about an active criminal investigation.

Suddenly, the pieces of the puzzle came together and spelled L-I-A-R.

Using the Internet Archive should be a consideration for any research project that involves due diligence, or the careful investigation of someone or something to satisfy an obligation. In addition to people and company investigations, it can assist in patent research for evidence of prior art, or in copyright or trademark research for evidence of infringement. It can also come in handy when researching events in history, looking for copies of older documents like superseded statutes or regulations, or seeking the ideals of a former political administration.

(Note, 25 October 2004: A special keyword search engine, called Recall Search, facilitates some of these queries. Unfortunately, it was removed from the site in mid-September. Messages posted in the Internet Archive forum indicate plans to bring it back.)

(Note, 15 June 2007: I think it's safe to assume that Recall Search is not coming back. However, check out the site for developments in searching archived audio (music), video (movies) and text (books).)

Recall Search at the Internet Archive

But while the Internet Archive contains information useful in investigative research, finding what you want within the massive collection presents a challenge. If you know the exact URL of the document, or if you want to examine the contents of a specific Web site--as was the case in the scenario involving the prospective client--then the Wayback Machine will suffice. But searching the Internet Archive by keyword was not an option until recently. (Note: See the note in the previous paragraph.)

During September 2003, the project introduced Recall Search, a beta version of a keyword search feature. Recall makes about one-third of the archived collection, or 11 billion Web pages, accessible by keyword. While it further facilitates finding information in the Internet Archive, it does not replace the Wayback Machine. Because of the limited size of the keyword-indexed collection and the problems inherent in keyword searching, due diligence researchers should use both finding tools.

Recall does not support Boolean operators. Instead, enter one or more keywords (fewer is probably better) and, if desired, limit the results by date.

Results appear with a graph that illustrates the frequency of the search terms over time. It also provides clues about their context. For example, a search for my name limited to Web pages collected between January 2002 and May 2003 finds ties to the concepts, "school of law," "government resources," "research site," "research librarian," "legal professionals" and "legal research." The resulting graph further shows peaks at the beginning of 2002 and in the spring of 2003.

Applying content-based relevancy ranking, Recall also generates topics and categories. Little information exists about how this feature works, and I have experienced mixed results. But the idea is to limit results by selecting a topic or category relevant to the issue.

Suppose you enter the keyword, Microsoft. The right side of the search results page suggests concepts for narrowing the query. For example, it asks if instead you mean Microsoft Windows, Microsoft Internet Explorer, Microsoft Word, and so on. Likewise, a search for turkey suggests wild turkey, the country of Turkey, turkey hunting, roast turkey and other interpretations.

While content-based relevancy ranking can be a useful algorithm, it is far from perfect. Some topics and categories generated might not seem to make sense. If the queries you run do not produce satisfactory results, consider another approach.

Pinpoint the specific sites you want to investigate by first conducting the research on the Web. In the prospective client example, an old issue of the newsletter of the company under criminal investigation (Company A) mentioned the prospective client's company (Company B). This clue led us to Company A's Web site where we found no further mention of Company B. However, with the Web site address in hand, we reviewed almost every archived page at the Internet Archive and found solid evidence of a past relationship. Additional research, during which we tracked down court records and spoke to one of the investigators, provided the verification we needed to confront the prospective client.

Advanced Search Techniques

You can display all versions of a specific page or Web site during a certain time period by modifying the URL. Greg Notess first illustrated this strategy in his On The Net column (See "The Wayback Machine: The Web's Archive," Online, March/April 2002).
A request for all archived versions of a page looks like this:

http://web.archive.org/web/*/www.domain.com

The asterisk is a wildcard that you can modify. For example, to find all versions from the year 2002, you would enter:

http://web.archive.org/web/2002*/www.domain.com

Or to find all versions from September 2002, you would enter:

http://web.archive.org/web/200209*/www.domain.com

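If you need to build these listing addresses for many sites or time periods, a short script saves retyping. This is a minimal Python sketch; the helper name and the www.domain.com example are illustrative, and the address pattern assumed is the standard Wayback Machine wildcard format.

```python
def wayback_listing_url(site, period=""):
    """Build a Wayback Machine listing URL for a site.

    period is an optional timestamp prefix: "" lists all archived
    versions, "2002" limits to the year 2002, and "200209" limits
    to September 2002. The trailing asterisk is the Wayback
    Machine's wildcard.
    """
    return "http://web.archive.org/web/%s*/%s" % (period, site)

# All archived versions of the site:
print(wayback_listing_url("www.domain.com"))
# -> http://web.archive.org/web/*/www.domain.com

# Only versions captured in September 2002:
print(wayback_listing_url("www.domain.com", "200209"))
# -> http://web.archive.org/web/200209*/www.domain.com
```

Paste the resulting address into your browser to retrieve the dated listing, just as if you had used the Wayback Machine's search box.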
Sometimes you encounter problems when you browse pages in the archive. For example, I often receive a "failed connection" error message. This may be the result of busy Web servers or a problem with the page. It may also occur if the live Web site prohibits crawlers.

To find out if the latter issue is the problem, check the site's robot exclusion file. A standard honored by most search engines, the robot exclusion file resides in the root-level directory. To find it, enter the main URL in your browser address line followed by robots.txt, like this: http://www.domain.com/robots.txt
If the site blocks the Internet Archive's crawler, the file will contain two lines of text similar to the following:
User-agent: ia_archiver
Disallow: /
If it forbids all crawlers, the commands should look like this:
User-agent: *
Disallow: /
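You can also run this check programmatically. The sketch below uses Python's standard library robots.txt parser to test whether rules like the ones shown above block the Internet Archive's crawler (the ia_archiver user agent); the rules and the www.domain.com address are just the examples from this article.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt rules blocking the Internet Archive's crawler,
# as they might appear at http://www.domain.com/robots.txt
rules = [
    "User-agent: ia_archiver",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# ia_archiver is blocked from the entire site...
print(parser.can_fetch("ia_archiver", "http://www.domain.com/"))  # False
# ...but this particular rule leaves other crawlers unaffected.
print(parser.can_fetch("Googlebot", "http://www.domain.com/"))    # True
```

In practice you would first download the live site's robots.txt (for example with urllib.request) and feed its lines to parse() in the same way.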

It's common for Web sites to block crawlers, including the Internet Archive, from indexing their copyrighted images and other non-text files. If the Internet Archive blots out images with gray boxes, then the Web site probably prevents it from making the graphics available.

If the site does not appear to block the Internet Archive, don't give up when you encounter a "failed connection" message. Return to the Wayback Machine and enter the Web page address. This strategy generates a list of archived versions of the page whereas Recall presents specific matches to a query. One of the other dated copies of the page may load without problems.


While the Internet Archive does not contain a complete archive of the Web, it offers a significant collection that due diligence researchers should not overlook. Tools like the Wayback Machine and Recall Search provide points of access. However, these utilities only handle simple queries. You can search by Web page address or keyword. You cannot conduct Boolean searching or limit a query by key information. Moreover, Recall Search limits keyword access to one-third of the collection. Consequently, conduct what research you can elsewhere first using public Web search engines and commercial sources. Then use the information you discover to scour relevant sites in the Internet Archive.

Source:  http://virtualchase.justia.com/content/internet-archive-and-search-integrity

Categorized in Science & Tech

Have you ever tracked all the ways you use data in a single day? How many of your calories, activities, tasks, messages, projects, correspondences, records and more are saved and accessed through data storage every day? I bet you won’t be able to stop once you start counting.

Many of us never pause to consider what that means, but data is growing exponentially — with no end in sight. There are already more than a billion cellphones in the world, emitting 18 exabytes of data every month (an exabyte is a billion gigabytes). As more devices continue to connect to the Internet of Things, sensors on everything from automobiles to appliances increase the data output even more.

By 2020, IDC predicts that the amount of data will increase by a thousandfold, reaching a staggering 44 zettabytes of data. The only logical response to this data deluge is to create more ways to store and maximize all this information.

Artificial intelligence and machine learning have become major areas of research and development in recent years as a response to this data flood, as algorithms work to find patterns that can help manage the data. While this is a step in the right direction in terms of learning from data, it still doesn’t solve the storage problem. And while interesting advances are being made in data storage on DNA molecules, for now, realistic data storage options are still a little less sci-fi sounding. Here are four viable solutions to our storage capacity woes.

The hybrid cloud

We all understand the concept of the cloud. Hybrid cloud storage is a little different though, in that it uses both storage in the cloud as well as on-site storage or hardware. This creates more value through a “mash-up” that accesses either kind of storage, depending on the security and the need for accessibility.

A hybrid data storage solution addresses common fears about security, compliance and latency that straight cloud storage raises. Data can be housed either onsite or in the cloud, depending on risk classification, latency and bandwidth needs. Enterprises that choose hybrid cloud storage are drawn to it because of its scalability and cost-effectiveness, combined with the option of keeping sensitive data out of the public cloud.

All flash, all the time

Flash data storage is the most common form widely used in consumer tech, including cell phones. Unlike traditional storage, which records information on spinning disks, flash stores and accesses information directly on semiconductor chips. With flash prices continuing to fall as the technology packs more information into the same amount of space, flash makes sense for a lot of medium-sized enterprises.

Recent breakthroughs by data storage company Pure Storage aim to scale flash to the next level, making it a real contender for large enterprises in the big data storage war. Pure took its all-flash approach to storage with FlashBlade, a box designed to store petabytes of unstructured data at an unprecedented scale. The refrigerator-sized box can store up to 16 petabytes of data, and co-founder John Hayes believes that amount can be doubled by 2017. Sixteen petabytes is already five times as much data as comparable traditional storage devices hold, so clearly Pure’s scalable blade approach is a step in the right direction.


Intelligent software designed storage

Intelligent Software Designed Storage (I-SDS) removes the need for cumbersome proprietary hardware stacks that are generally associated with data storage, and replaces them with storage infrastructure that is managed and automated by intelligent software rather than hardware. I-SDS is also more cost efficient, with faster response times, than storing data on hardware.

I-SDS moves toward a storage design that mimics how the human brain stores vast amounts of data with the unique ability to call it up at a moment’s notice. Essentially, I-SDS allows big data streams to be clustered. Approximate search and the stream extraction of data combine to allow the processing of huge amounts of data, while simultaneously extracting the most frequent and appropriate outputs from the search. These techniques give I-SDS a huge advantage over obsolete storage models because they team up to improve speed while still achieving high levels of accuracy.

Cold storage archiving

Cold storage is economical, though less often used. By keeping data that doesn’t need to be readily available on slower, less expensive disks, space is freed up on faster disks for information that does need to be readily available. This option makes sense for large enterprises with backlogged information that doesn’t need to be accessed regularly.

Such enterprises can store their data based on its “temperature,” keeping hotter data on flash, where it can be more quickly accessed, and archiving colder information in cost-effective cold storage. However, the deluge of big data means that enterprises are gleaning so much data at once that it isn’t always clear what is valuable and what can be put on the back burner.
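The "temperature" idea can be sketched in a few lines of code. This minimal Python example is illustrative only: the 90-day threshold, the tier names and the record names are assumptions for the sketch, not any vendor's actual policy.

```python
# Tier records by "temperature": recently accessed data stays on fast
# flash, rarely touched data moves to a cheap cold-storage archive.
# The 90-day cutoff is an arbitrary illustrative threshold.
HOT_THRESHOLD_DAYS = 90

def assign_tier(days_since_last_access):
    """Return the storage tier for a record based on access recency."""
    if days_since_last_access <= HOT_THRESHOLD_DAYS:
        return "flash"
    return "cold-archive"

# Hypothetical records mapped to days since their last access.
records = {
    "q2-sales-dashboard": 3,     # accessed this week -> hot
    "2009-audit-logs": 2400,     # untouched for years -> cold
    "customer-master": 45,
}

tiers = {name: assign_tier(age) for name, age in records.items()}
print(tiers)
# {'q2-sales-dashboard': 'flash', '2009-audit-logs': 'cold-archive', 'customer-master': 'flash'}
```

Real tiering policies weigh more than recency (access frequency, compliance rules, retrieval cost), which is exactly the classification problem the paragraph above says big data makes hard.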

Bigger data, smarter storage

While the sheer volume of data continues to grow exponentially, so too does its perceived value to companies eager to glean information about their consumers and their products. Data storage needs to be fast, intuitive, effective, safe and cost-effective — a tall order in a world where data now far outpaces the population. It will be interesting to see which method can best address all these needs simultaneously.

Source: http://techcrunch.com/2016/05/22/how-storage-is-changing-in-the-age-of-big-data/

Categorized in Science & Tech

Is everyone’s website illegal?

Your website consists of visible text and graphics, geared to the sighted reader. Its terms and conditions include legal disclaimers and limitations of liability, which, it explains, apply unless they are specifically prohibited by law. As a service to the public, you have posted scores of videos providing useful information for consumers in your industry.
Are these common practices illegal?

Some class action lawyers say so. They’ve been making claims against standard websites that, they allege, violate the federal Americans with Disabilities Act or a New Jersey consumer protection statute.
Class actions targeting website practices aren’t unusual. In the early days of the commercial Internet, many companies were sloppy with their website terms and privacy policies. Most notably, high-flying dot-com companies that promised never to sell their customer data were caught flat-footed when the bubble burst. In liquidation, their customer lists were their most saleable assets, which they then usually sold, in violation of their prior promises.
Cases from that era showed the legal vulnerability of disconnects between website promises and actual business practices.

Similarly, when web technologies ran ahead of website disclosures, as allegedly occurred in some cases with behavioral advertising, customer tracking, and information sharing practices, the class action lawyers pounced then too. On multiple occasions in 2010 and 2011, the Wall Street Journal’s “What They Know” series would run articles about customer tracking on the Internet, and, the very next day, class action suits were filed keyed to the practices revealed by the Journal.

The ADA and New Jersey suits appear to be the newest wave of Internet class actions —ones that have the potential to reach thousands if not millions of website operators.

Is the Internet expanding privacy expectations?

Is the Internet invading privacy, or expanding privacy? The conventional wisdom is that the Internet is eviscerating privacy. But in some ways a heightened focus on privacy in the digital era may be creating new and greater privacy expectations.

Consider the simple matter of lists of addressees and cc’s on emails.
In the ancient days of postal mail, it was never a big deal if a sender revealed, on a letter, the other persons to whom he or she was sending the same letter, or to whom he or she was sending copies of it. Lots of letters show multiple addressees or multiple persons copied. That indeed explains the origin of the “cc” field, as a visible list of the persons to whom a “carbon copy” (another ancient term) was being sent.

But an expectation has developed recently that one should never send out a mass email that reveals the email addresses of all of the recipients, unless they previously knew one another. That is why the Federal Trade Commission was so red-faced recently when, in the course of preparation for its first PrivacyCon workshop on privacy research, it sent an email out to all attendees revealing – horrors! – all of their email addresses. The agency “sincerely apologized” for this terrible mistake.

The presumed confidentiality of one’s email address is seen in other laws. College professors, for example, are instructed never to communicate with an entire class of students by placing all of their email addresses in the “to” field; rather, they must use the “bcc” field, so that no student receives his or her fellow students’ email addresses, which some of them may have designated as confidential personal information under the Family Educational Rights and Privacy Act.

Though the prohibition against letting strangers see others’ email addresses in group emails now seems to be settled, the presumed harm to be avoided — use of those email addresses for bulk commercial emails — is fairly speculative, and such a misuse, if it occurred, would seem to cause more of an inconvenience than a true privacy invasion. The same goes for concerns about reply-all “catastrophes,” such as the one that hit Thomson Reuters employees in August 2015. The event inconvenienced employees, but its true lasting impact appeared to be a flood of humorous Twitter traffic. (Another reply-all incident struck Time Inc. just this week.)

The best explanation for this new expectation, rather, seems to be an expanding understanding of privacy, at least in certain areas. Contrary to the conventional wisdom, our expectations of privacy are not steadily and uniformly shrinking. In some cases, they are expanding.

Can you be sued for posting your opinions on the Internet?

A restaurant tells customers it may sue them if they post unfavorable reviews on the Internet. A flooring company sues a customer who complained on social media that he had an “absolutely horrible experience” with the company.
KlearGear, a gadget company, included in its Internet terms a provision that “your acceptance of this sales contract prohibits you from taking any action that negatively impacts KlearGear.com, its reputation, products, services, management or expenses.” The terms also set damages for a violation: $3,500.

If something seems wrong to you about these cases, you are not alone. While libel law has struggled for years with the dividing line between expressions of actionable fact and constitutionally protected opinion, most laypeople, and judges, believe that statements of opinion should be protected, and broadly construed.
That may be why Grill 225, a restaurant in Charleston, met with such opposition when its scheme for suppressing unfavorable reviews was recently publicized. The restaurant required persons booking online reservations to agree to terms and conditions in which, among other things, the customer agreed “that they may be held legally liable for generating any potential negative, verbal, or written defamation against Grill 225.”

Most efforts to prevent or penalize Internet comments and criticism are crushed in the court of public opinion even before they reach the courthouse. Grill 225, for example, is really only stating the obvious when it says that it could sue a customer. It wisely hasn’t done so in the two years that it has posted its terms. The flooring company, in Colorado, did sue its customer, but the case provoked a state legislator to propose stronger protection against suits aimed at chilling free speech.

Indeed, last year California passed a so-called “Yelp Bill” that prohibited businesses from including in their contracts “a provision waiving the consumer’s right to make any statement regarding the seller or lessor or its employees or agents, or concerning the goods or services.” A similar bill, the proposed Consumer Review Freedom Act, has been introduced in Congress.

When cases do get to court, even under existing law, statements of opinion are generally protected. As one example, consider a case involving presidential candidate Donald Trump, back in the early 1980s, when he announced an audacious plan to build the tallest building in the world, a 150-story skyscraper, on landfill just south of downtown Manhattan.

Trump’s plan met opposition in Chicago, then home of the world’s tallest building, the 108-story Sears Tower (now Willis Tower). Specifically, the Chicago Tribune’s architecture critic, Paul Gapp, analyzed Trump’s proposal in a review and deemed it, among other things, “one of the silliest things anyone could inflict on New York or any other city." Gapp’s review was accompanied by a Tribune artist’s rendering of southern Manhattan with a giant new building, a Sears Tower lookalike on steroids, sticking out like a sore thumb below and east of Battery Park.

Trump, no more shy then than he is now, immediately sued the Tribune, seeking damages to the tune of $500 million. I worked for the Tribune’s law firm and had the task of writing the motion to dismiss Trump’s case. There were plenty of good legal authorities on the right of critics to express their opinions, but I decided to prepare our brief a bit differently.

Experts make privacy regulation a serious threat

Now is the time to get smart about privacy and technology, because your government regulators are smart and savvy in those areas.

No, that’s not a misprint. Though government regulators are often far behind on the technology curve, real experts have taken over at several important agencies that regulate conduct on the Internet.
Take Ashkan Soltani, who took over in late 2014 as Chief Technologist for the Federal Trade Commission. Just by hiring a chief technologist, the FTC showed awareness of the need for deep computer expertise to effectively regulate privacy and commercial practices on the Internet. And by hiring Soltani, one of the sharpest computer privacy experts in the country, the FTC showed it was serious.

Soltani was one of a handful of computer experts who have been at the forefront of studying privacy on the Internet. Along with his former colleagues at Berkeley, and like-minded researchers, especially at Stanford and Carnegie Mellon universities, Soltani has identified and publicized many previously unknown ways in which the Internet allows personal information to be collected, used and commercialized.
Soltani and his colleagues haven’t just quietly studied Internet privacy. They’ve been active and savvy in getting the word out on their studies.

To take one example, a few years back, most website operators thought they had satisfied their disclosure obligations if they told their users that they honored users’ instructions with respect to HTTP “cookies” (datasets that identify previous browsing activity). But in an important research report in 2009, Soltani and colleagues reported that even when users deleted HTTP cookies in an attempt to shield knowledge of their previous browsing activity, some websites, by activating Flash cookies (often tied to web video files), would automatically regenerate those HTTP cookies – a generally unintended result, but one that cast doubt on those sites’ privacy promises. Soltani followed this up with reports on other pervasive tracking technologies.

Soltani and his colleagues and co-authors, many of whom, like him, are motivated by their need for more privacy protection, focused their research on exposing technologies (like Flash cookies) that collected or revealed information that consumers thought was private. Many of their research projects became the foundation of class action lawsuits against companies that made privacy promises in ignorance of these technologies.

And it wasn’t a coincidence that Soltani’s research was used in class action cases. He served as technology adviser to the Wall Street Journal for its widely read “What They Know” series that has brought many Internet privacy issues to widespread attention, beginning in 2010. In several instances, a flurry of class action suits followed within days of the Journal’s Soltani-supported articles.

Soltani isn’t the only technology whiz to join the government from the Berkeley-Stanford-Carnegie Mellon research triad. The Federal Communications Commission recently announced that it was hiring Jonathan Mayer, another member of the group, to act as its Chief Technologist. Like Soltani, his research has focused specifically on web-tracking technologies. And as with Soltani, Mayer’s research has led to major privacy cases, including an FTC consent decree against Google, concerning its use of tracking code on the Safari browser. While privacy isn’t an FCC focus, Mayer’s work on net neutrality could significantly affect many businesses.

Some business people may think that they don’t have to worry much about the FTC, a slimly staffed agency that has the impossible mission of policing “unfair or deceptive acts or practices” all over our huge country. But the FTC has been very active in the Internet privacy area, and its results, usually in the form of consent decrees, are reshaping how business is done on the Internet.

As two privacy experts have pointed out in a law review article titled “The FTC and the New Common Law of Privacy,” the FTC has become the primary regulator of privacy on the Internet, and its large and growing body of consent decrees has an effect far beyond the companies that are directly bound (which, moreover, include such Internet giants as Google, Microsoft, Facebook, and LinkedIn). The authors assert that the general belief that the United States has weak privacy regulation compared to Europe is “becoming outdated as FTC privacy jurisdiction develops and thickens.”

Source:  http://www.thompsoncoburn.com/news-and-information/internet-law-twists-and-turns.aspx

Categorized in Internet Privacy

Association of Internet Research Specialists is the world's leading community for the Internet Research Specialist and provide a Unified Platform that delivers, Education, Training and Certification for Online Research.
