Google has just updated how it determines your business’s local ranking. Another week, another step towards Google becoming a damn sight more transparent than we’re used to. If it wasn’t enough that Google revealed its top three ranking signals for organic search last Friday, this Friday it revealed a new ranking signal for local SEO… prominence.

As noticed by Mike Blumenthal, and reported by SEJ over the weekend, Google has updated its Google My Business help page.

This resource details how you can improve your local rankings with practical guidance on keeping your business information complete and accurate (physical address, phone number, category), verifying your location(s), keeping your opening hours accurate and managing customer reviews. It also lists the ways in which Google determines your local ranking…

How Google ranks your business for local search

Relevance

How well does your local listing match what someone is searching for? This is why your business information should always be fully detailed and accurate.

Distance

How close are you to the person searching for a particular term? Bear in mind that relevance will be the stronger signal. If a business is further away from a searcher’s location, but is more likely to have what they’re looking for than a business that’s closer, Google will rank it higher in local results.

Additionally, if a user doesn’t specify a location, Google will calculate distance based on what’s known about their location. And this weekend Google added another…

Prominence

Basically… How well known is your business?

Here’s the exact wording on ‘prominence’ from Google…

Some places are more prominent in the offline world, and search results try to reflect this in local ranking. For example, famous museums, landmark hotels, or well-known store brands that are familiar to many people are also likely to be prominent in local search results.

Prominence is also based on information that Google has about a business from across the web (like links, articles, and directories). Google review count and score are factored into local search ranking: more reviews and positive ratings will probably improve a business’s local ranking.

Your position in web results is also a factor, so SEO best practices apply to local search optimization as well. In other words, your business’s overall organic search presence counts towards local ranking: all of your regular, everyday SEO work, whether on-page or off, applies to local too.

Customer reviews

It’s also interesting to note that Google has confirmed that customer reviews and ratings are factored into local search ranking. (Although be warned that there was a ‘probably’ in the original text above.)

Experts always figured this was true anyway. Moz previously found that review signals make up 8.4% of the overall ‘local ranking pie’.

[Image: Moz local search ranking factors breakdown]

But again, it’s just nice to get confirmation on these things.

Source:   https://searchenginewatch.com/2016/04/04/how-does-google-determine-my-local-ranking/

Categorized in Search Engine

Google is your portal to everything out there on the World Wide Web...but also your portal to more and more of your personal stuff, from the location of your phone to the location of your Amazon delivery. If you’re signed into the Google search page, and you use other Google services, here are nine search tricks worth knowing.

It probably goes without saying but just in case: only you can see these results. Nobody else can Google your next hotel trip. How well they work is going to depend on how plugged in you are to other tools like Gmail, but they’re useful shortcuts from the Google homepage or the Chrome address bar.

“I’ve lost my phone”

The newest one on our list is essentially an easier way to get to Android Device Manager. Google “I’ve lost my phone” to see the last known location of all the phones linked to your Google account. You can call and lock your phone as well as locate it, and it works with both Android and iOS devices.

“Contact <name>”

Get at your Google Contacts straight from the Google search page with this trick, simply adding the name of one of your friends or family members after the “contact” keyword. If there’s more than one match found, you’ll see a list of options—click on any of the results to initiate an audio call over Hangouts.

“My deliveries”

Next, a series of personal searches that tap into the information Google has from your Gmail account. Use “my deliveries” (or “packages” or “purchases”) to see recent orders stashed in your inbox—click on any of the entries shown on screen and you can see prices together with any available tracking details.

“My flights”

For a while now Gmail has done a very good job of spotting travel plans hidden among your email messages (it’s basically what Inbox is built on) and if you Google “my flights” you can see past and future trips through the air. Expand any entry in the list to see flight numbers, times, and other salient details.

“My hotels”

The “my hotels” search works just like the flights one, with Google tapping into your inbox to bring up all the hotel reservations you’ve made. Again, click on any entry in the list to see the details—you can jump straight to the relevant email in Gmail, get directions to the hotel, and see older reservations too.

“My shows”

Run a search for “my shows” and you see all of your upcoming plays, gigs and other events that you might have a confirmation for somewhere in your Gmail account. Google does a decent job of pulling out the right details for you. Use “my reservations” to see both hotels and shows in the same list together.

“My bills”

You probably don’t want to be reminded about upcoming bills you’ve got to pay or about any money going out of your bank account, but just in case... “my bills” will find it for you, provided that there’s some kind of record in your Gmail account. If you want any financial assistance, that’s a separate Google search.

“My events”

A quick way of seeing everything that’s coming up in your Google Calendar. You can also run queries like “when’s my next appointment?” or “what am I doing next week?” to get personal answers from Calendar. Click on an entry to see dates, times, descriptions and a list of the guests signed up to attend.

“My photos”

Hello, Google Photos! Google can bring up a response to “my photos” and “my videos” provided you’re using its in-house photo storage service. You can even get creative: try “my photos of me” or “my photos of cats” and see what kind of results you get. It’s all very slick and straightforward.

Source:  http://fieldguide.gizmodo.com/9-secret-google-search-tricks-1781341511

Categorized in Search Engine

Performing an SEO audit? Contributor Max Prin demonstrates how to find all of a website's indexed subdomains using a simple (and free) Chrome plugin.

An SEO audit is rarely limited to the www (or non-www) version of a website. When looking for potential duplicate content, it’s often important to know how many subdomains exist and, more importantly, how many of them are indexed by Google.

The good old search operators

An easy way to find indexed subdomains is to use search operators.

1. Start with “site:” and the root domain.

[Image: results of the “site:” search]

2. One by one, remove each subdomain (including “www”) from the results with the “-inurl:” operator.

[Image: the query with “-inurl:” operators excluding known subdomains]

3. When there are no more results for Google to return, your query with the search operators should include all indexed subdomains.
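If you are repeating this process for several domains, scripting the query construction helps. Here is a minimal Python sketch; the subdomain names are hypothetical examples:

```python
# Minimal sketch: build the "site:" query, excluding each subdomain
# already found with an "-inurl:" term. The subdomains listed here are
# hypothetical; in practice you append one per subdomain you spot in
# the results, then re-run the search.
def build_query(root_domain, found_subdomains):
    parts = [f"site:{root_domain}"]
    parts += [f"-inurl:{sub}" for sub in found_subdomains]
    return " ".join(parts)

print(build_query("domain.com", ["www", "blog", "shop"]))
# -> site:domain.com -inurl:www -inurl:blog -inurl:shop
```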

However, this technique has its limits. It’s unlikely that the site you’re auditing has as many subdomains as wordpress.com, but you may come across a site with several dozen subdomains. This can potentially cause the following issues:

  • The process can be long, especially if it needs to be done for several domains.
  • You might get Google “captchas” along the way.
  • The size of queries is limited (around 30 keywords), so if your query is too long (too many -inurl: operators), you will get a 400 error page from Google.
  • Once you’re done, you still need some editing to create a clean list of subdomains to work with.

The solution: a simple Chrome extension, by Google

This extension, Personal Blocklist (by Google), will make your life easier. It allows you to “block” domains from appearing in your search results. The key here is that the extension operates at the subdomain level and stores the domains in a list.

1. Once added to Chrome, start with the same “site:domain.com” search command.

2. Under each result now appears a “Block subdomain.domain.com” link.

3. Click on each link until your result page is empty.

[Image: empty results page after blocking each subdomain]

4. You’re almost done! Simply click on the extension icon, then “export,” then copy/paste into Excel.
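If you would rather skip the manual cleanup, a few lines of Python can tidy the export. This assumes the export is plain text with one blocked (sub)domain per line; adjust if the extension’s format differs:

```python
# Deduplicate and sort the exported blocklist, then write a one-column
# CSV that pastes cleanly into Excel. Filenames are arbitrary examples.
with open("blocklist.txt") as f:
    subdomains = sorted({line.strip() for line in f if line.strip()})

with open("subdomains.csv", "w") as out:
    out.write("\n".join(subdomains))

print(f"Wrote {len(subdomains)} subdomains to subdomains.csv")
```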

Enjoy!

Source:  http://searchengineland.com/quickly-find-export-subdomains-indexed-google-251370

Categorized in Search Engine

I moved to America from Thailand when I was 9 years old – old enough to have learned how to read, speak, and write basic Thai words but not advanced enough to write a professional cover letter or blog post.

It’s somewhat of a difficult language – Thai has 44 consonants, 22 vowels, diphthongs, and triphthongs, and five tones. The Sanskrit-based language can appear intimidating to those unfamiliar with the alphabet, and even some local Thais have difficulty spelling some words due to all the characters involved.

So even though I’m generally able to read Thai, it’s been extremely difficult to type it. After all, most keyboards don’t support all the characters involved, and when you don’t use the language all the time, it’s hard to remember where each character sits on the keyboard.

[Image: Thai keyboard layout]

And now that we live in a messaging-based world, it’s become that much more difficult for me to communicate with my mother, who does not read English well. My go-to trick has been to type out individual words I want in Google Translate then copy and paste them until they form a complete sentence, but at that rate I might as well just call her on the phone. Which, of course, can be inconvenient in its own right as we both have full time jobs and busy schedules.

*Long sentences translated into Thai on Google Translate generally make for very awkward translations.

Here’s where “Karaoke Thai” comes along.

[Image: Karaoke Thai example]

In Thai culture, most music videos have English phonetic spellings of each sentence underneath the Sanskrit, which is incredibly helpful for new language learners to follow along. I haven’t been able to use Karaoke Thai to communicate with my mom because she still has difficulty parsing the sounds of some English words, but the latest update to Google Input Tools may just change all that.

Now, when you type in Karaoke Thai, Google may suggest the words you meant to say and instantly convert them into Thai script. To get it, you’ll need to download the free extension for your Chrome browser. You can even type full sentences in Karaoke Thai and it will automatically convert them.

For many people reading this post, this tool might not mean a whole lot. But for many Thais, it means easily communicating with their peers, family, and new expat friends. It may be my favorite thing Google’s ever released.

I write about new technology every day, and while new Google AIs and gadgets may change the way we’ll live in the future, sometimes it’s the smallest updates that change how my mom, my grandmother, and I can finally communicate with each other today, even when we’re far apart.

Source:  http://thenextweb.com/google/2016/06/15/googles-new-input-tool-will-finally-help-digitally-communicate-mother/#gref

Categorized in Search Engine

Expanded Text Ads are coming to Google AdWords. Are you excited? But more importantly, are you ready?

 

Expanded Text Ads were one of several huge AdWords changes Google announced recently – if not the biggest. I still can’t believe that Google will soon actually increase its ad text limits by 2x!

 

So what exactly is changing? Here are 10 things advertisers need to know about Expanded Text Ads.

 

1. What are Expanded Text Ads?

 

Expanded Text Ads are 2x bigger than current text ads. The new ads are designed to maximize your presence and performance on mobile search results with a bigger headline and an extra long description. (And with a mobile-first mindset, whatever works on mobile is going to get applied to desktop too.)

 

Expanded Text Ads will show across all devices – desktop and mobile – and will wrap automatically based on device size.

Google began testing Expanded Text Ads in Q2 of 2016.

 

2. Why is Google making this change?

 

Google is calling this the biggest change to text ads since AdWords launched 15 years ago.

 

Several months ago, Google began thinking about what an AdWords ad would look like if they created AdWords in today’s mobile-first world, where more than half of the trillions of searches conducted on Google per year are done via a mobile device.

 

Google’s first move toward creating a unified experience across devices came in February when it killed off right side ads on desktop. Now with the constraints of desktop right-side ads gone, this change seems like a natural progression from the super-sized headlines introduced in 2011.

 

3. How much bigger are these expanded ads?

 

Expanded Text Ads are 2x bigger (math nerd alert: technically 47 percent bigger) than today’s AdWords text ads.

 

You now have a total of 140 characters of ad copy space to use, marking the end of the current 25-35-35 limits. No comment from Twitter as yet about their thoughts on Google adopting a 140-character limit. Here’s a little more info on the changes from the AdWords blog:

 

[Image: Expanded Text Ads details from the AdWords blog]

 

 

 

So make all those extra characters count. Create eye-catching and emotional ads that searchers can’t resist clicking on.

 

4. What do the new Expanded Text Ads look like?

 

Here’s a before and after of what the ads will look like on mobile and desktop:

 

[Image: before-and-after Expanded Text Ad on mobile and desktop]

 

 

And here’s what Expanded Text Ads will look like in the AdWords interface:

 

[Image: Expanded Text Ads in the AdWords interface]

 

 

 

5. How much are headlines expanding?

 

Advertisers will have two 30-character headlines when Expanded Text Ads become available later this year.

Advertisers currently are limited to a 25-character headline.

That means our headlines will soon increase by 140 percent!

 

6. How much will descriptions expand?

 

Advertisers will have one 80-character description line.

Advertisers currently are limited to two 35-character description lines.

That means descriptions will increase by 14 percent.
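For the math nerds keeping score, a quick Python check confirms the percentages quoted in the last few sections:

```python
# Arithmetic behind the quoted percentages.
old_total = 25 + 35 + 35   # old headline + two 35-char description lines
new_total = 30 + 30 + 80   # two 30-char headlines + one 80-char description
print((new_total - old_total) / old_total)  # ~0.47 -> "47 percent bigger"

print((60 - 25) / 25)   # headlines: 1.4 -> a 140 percent increase
print((80 - 70) / 70)   # descriptions: ~0.14 -> a 14 percent increase
```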

 

7. What’s changing with display URLs?

 

AdWords will automatically extract the domain from the final URL.

Advertisers can then add up to two paths to enhance the display URL (using up to 15 characters).
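Putting the new limits together, here is a small, hypothetical Python helper (not an official AdWords tool) for sanity-checking draft ad copy against the figures quoted in this article: two 30-character headlines, one 80-character description, and two 15-character paths.

```python
# Hypothetical sanity-checker for draft Expanded Text Ads; the limits
# below mirror the figures quoted in this article, not an official API.
LIMITS = {"headline_1": 30, "headline_2": 30, "description": 80,
          "path_1": 15, "path_2": 15}

def check_ad(ad):
    """Return the fields that exceed their character limits."""
    return {field: len(text) for field, text in ad.items()
            if len(text) > LIMITS[field]}

draft = {"headline_1": "Buy Running Shoes Online",
         "headline_2": "Free Two-Day Shipping",
         "description": "Shop our huge selection of running shoes "
                        "from all the top brands. Order today!",
         "path_1": "shoes", "path_2": "running"}
print(check_ad(draft) or "All fields within the new limits")
```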

 

8. Will this improve CTR?

 

Yes! More text means greater visibility. Early reports indicate that Expanded Text Ads are seeing CTR increase by as much as 20 percent.

 

At WordStream, we’ve observed CTR increase by around 12 percent after adding ad or call extensions to mobile text ads – so we expect that increasing the character counts of headlines and descriptions should result in more clicks.

 

Regardless, you can bet we’ll be closely tracking the performance of this new ad format as it becomes more widely available.

 

9. When will the new ads roll out?

 

Google hasn’t officially revealed when all advertisers will have access to Expanded Text Ads. Keep an eye out and we’ll update when we know more details.

 

10. What should you do to prepare for Expanded Text Ads?

 

Raise your Quality Scores now! Quality Score is already the most important metric in your AdWords account, but it’s about to become even more important.

 

Businesses that occupy the top spots will take up the most valuable SERP real estate – especially for commercial queries. It could make anything below position 2 or 3 on mobile devices irrelevant!

 

Additionally, you’ve got a lot of ad text optimization ahead of you. You’ll need to make sure your text ads are all rewritten to take advantage of this new format. Google is giving you 2x more space with Expanded Text Ads – so be ready to use it to your advantage!

 

Source:  https://searchenginewatch.com/2016/06/06/google-expanded-text-ads-10-things-you-need-to-know/

Categorized in Search Engine

The European Union's digital chief wants search engines such as Alphabet Inc's Google and Microsoft's Bing to be more transparent about advertising in web search results but ruled out a separate law for web platforms.

European Commission Vice-President Andrus Ansip, who is overseeing a wide-ranging inquiry into how web platforms conduct their business, said on Friday the EU executive would not take a horizontal approach to regulating online services.

 

"We will take a problem-driven approach," Ansip said. "It's practically impossible to regulate all the platforms with one really good single solution."

That will come as a relief to the web industry, dominated mainly by big U.S. tech firms such as Facebook, Google and Amazon, who lobbied hard against new rules for online platforms and what they saw as an anti-American protectionist backlash.

 

 

"We praise the Commission for understanding that a horizontal measure for all platforms is practically impossible," said Jakob Kucharczyk, director of the Computer & Communications Industry Association which represents the likes of Facebook, Google and Amazon.

"While a lot of online platforms enable economic growth, their business models differ widely."

 

However Ansip said he was worried about how transparent some search engines are when displaying ads in search results.

The Commission is also looking into the transparency of paid-for reviews as well as the conditions of use of services such as Google Maps, Apple Inc's iOS mobile operating system and Google's Android.

"Maybe it's not too much to ask for more transparency talking about search engines," Ansip said.

 

The former Estonian prime minister also poured cold water on the idea that the Commission would make search engines pay to display snippets of news articles, dubbed the "Google tax", as part of its EU copyright law reform due later this year.

The EU executive is looking into making rules on taking down illegal content clearer and more effective without making hosting websites such as YouTube directly liable.

"Now musicians ask, please, take it down and keep it down," Ansip said. "We want to make those rules more clear."

 

But the Commission will not change a provision where websites such as Amazon, eBay and Google's YouTube are not held liable for illegal content that is uploaded on to their systems. They do, however, have a responsibility to take it down once they are notified of it.

 

The Commission will publish a communication detailing its plans on web platforms in June.

Source: http://www.reuters.com/article/us-eu-tech-idUSKCN0XC2AH

 

Categorized in Online Research

Google has released an update to its personal loan advertising policy that will impact many advertisers across financial verticals. According to Google, ads and websites that promote dangerous payday loan offers will be restricted from advertising with AdWords:

“We’re banning ads for payday loans and some related products from our ads systems. We will no longer allow ads for loans where repayment is due within 60 days of the date of issue. In the U.S., we are also banning ads for loans with an APR of 36% or higher.”

 

 

 

 

Advertiser Restriction #1 – Short Payback Periods


Google will begin restricting ads from websites that offer payback periods of less than 60 days. Short repayment periods, combined with high interest rates, can cause borrowers to sink into unmanageable debt. The update to Google’s policies will benefit websites with better repayment options while eliminating offers from websites that can harm the end user.

 

 

“This change is designed to protect our users from deceptive or harmful financial products and will not affect companies offering loans such as Mortgages, Car Loans, Student Loans, Commercial loans, Revolving Lines of Credit (e.g. Credit Cards).”

By restricting the types of personal loans websites can promote, Google follows Facebook and other online advertising platforms. Facebook currently prohibits any paid advertising that promotes payday lending, regardless of the repayment periods.

 

 

Advertiser Restriction #2 – High APRs


Google will begin restricting advertisers that promote personal loans with APRs over 36%. This means a website cannot offer any personal loans over the 36% cap through site content or SEM ads.

Payday and short-term personal loans are notorious for high annual percentage rates (APRs). Interest rates can reach 400% or more, depending on the amount borrowed. Consumers who borrow using these types of loans can face overwhelming debt as the high interest rates saddle them with even greater financial burdens.
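To see how a short-term loan translates into a triple-digit APR, consider this illustrative calculation; the loan figures are hypothetical but typical of the products described:

```python
# Hypothetical payday loan: borrow $100, pay a $15 fee, repay in 14 days.
principal, fee, term_days = 100, 15, 14
apr = (fee / principal) * (365 / term_days) * 100
print(f"APR: {apr:.0f}%")   # ~391%, roughly ten times Google's 36% cap
```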

 

 

“When reviewing our policies, research has shown that these loans can result in unaffordable payment and high default rates for users, so we will be updating our policies globally to reflect that.”

This move is designed to protect consumers from lending that could harm them financially. In doing so, Google based its 36% APR cap on both federal and state legislation created to protect borrowers.

Google’s goal is clear – weeding out and preventing the types of loans typically associated with predatory lending. The change will not take effect immediately, however. Personal loan issuers have the next 60 days to make their site content and offers compliant with the new policy.

 

Source:  https://www.searchenginejournal.com/google-bans-payday-loan-advertising/163500/

 

 

 

Categorized in Online Research

Here are the top 15 most popular search engines, as derived from our eBizMBA Rank, which is a continually updated average of each website's Alexa global traffic rank and U.S. traffic rank from both Compete and Quantcast. “*#*” denotes an estimate for sites with limited data.

1. Google
1 - eBizMBA Rank | 1,600,000,000 - Estimated Unique Monthly Visitors | 1 - Compete Rank | 1 - Quantcast Rank | 1 - Alexa Rank | Last Updated: September 1, 2016.

2. Bing
15 - eBizMBA Rank | 400,000,000 - Estimated Unique Monthly Visitors | 5 - Compete Rank | 19 - Quantcast Rank | 22 - Alexa Rank | Last Updated: September 1, 2016.

3. Yahoo! Search
18 - eBizMBA Rank | 300,000,000 - Estimated Unique Monthly Visitors | *8* - Compete Rank | *28* - Quantcast Rank | NA - Alexa Rank | Last Updated: September 1, 2016.


4. Ask
25 - eBizMBA Rank | 245,000,000 - Estimated Unique Monthly Visitors | 14 - Compete Rank | 31 - Quantcast Rank | 31 - Alexa Rank | Last Updated: September 1, 2016.

5. AOL Search
245 - eBizMBA Rank | 125,000,000 - Estimated Unique Monthly Visitors | *250* - Compete Rank | *240* - Quantcast Rank | NA - Alexa Rank | Last Updated: September 1, 2016.

6. Wow
271 - eBizMBA Rank | 100,000,000 - Estimated Unique Monthly Visitors | 20 - Compete Rank | *26* - Quantcast Rank | 767 - Alexa Rank | Last Updated: September 1, 2016.

7. WebCrawler
511 - eBizMBA Rank | 65,000,000 - Estimated Unique Monthly Visitors | 100 - Compete Rank | 759 - Quantcast Rank | 674 - Alexa Rank | Last Updated: September 1, 2016.

8. MyWebSearch
545 - eBizMBA Rank | 60,000,000 - Estimated Unique Monthly Visitors | *105* - Compete Rank | 1,124 - Quantcast Rank | 405 - Alexa Rank | Last Updated: September 1, 2016.

9. Infospace
892 - eBizMBA Rank | 24,000,000 - Estimated Unique Monthly Visitors | *66* - Compete Rank | *500* - Quantcast Rank | 2,110 - Alexa Rank | Last Updated: September 1, 2016.


10. Info
1,064 - eBizMBA Rank | 13,500,000 - Estimated Unique Monthly Visitors | 378 - Compete Rank | 877 - Quantcast Rank | 1,938 - Alexa Rank | Last Updated: September 1, 2016.

11. DuckDuckGo
1,605 - eBizMBA Rank | 13,000,000 - Estimated Unique Monthly Visitors | 1,898 - Compete Rank | 2,290 - Quantcast Rank | 629 - Alexa Rank | Last Updated: September 1, 2016.

12. Contenko
2,402 - eBizMBA Rank | 11,000,000 - Estimated Unique Monthly Visitors | *200* - Compete Rank | *2,500* - Quantcast Rank | 4,505 - Alexa Rank | Last Updated: September 1, 2016.

13. Dogpile
2,421 - eBizMBA Rank | 10,500,000 - Estimated Unique Monthly Visitors | 2,734 - Compete Rank | 1,446 - Quantcast Rank | 3,084 - Alexa Rank | Last Updated: September 1, 2016.

14. Alhea
4,300 - eBizMBA Rank | 7,500,000 - Estimated Unique Monthly Visitors | 451 - Compete Rank | *1,225* - Quantcast Rank | 11,225 - Alexa Rank | Last Updated: September 1, 2016.

15. Ixquick
8,954 - eBizMBA Rank | 4,000,000 - Estimated Unique Monthly Visitors | 12,512 - Compete Rank | 4,468 - Quantcast Rank | 9,857 - Alexa Rank | Last Updated: September 1, 2016.

Source: ebizmba.com

Categorized in Search Engine

Introduction to How Internet Search Engines Work

The good news about the Internet and its most visible component, the World Wide Web, is that there are hundreds of millions of pages available, waiting to present information on an amazing variety of topics. The bad news about the Internet is that there are hundreds of millions of pages available, most of them titled according to the whim of their author, almost all of them sitting on servers with cryptic names. When you need to know about a particular subject, how do you know which pages to read? If you're like most people, you visit an Internet search engine.

Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:
  • They search the Internet -- or select pieces of the Internet -- based on important words.
  • They keep an index of the words they find, and where they find them.
  • They allow users to look for words or combinations of words found in that index.

Early search engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand inquiries each day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day. In this article, we'll tell you how these major tasks are performed, and how Internet search engines put the pieces together in order to let you find the information you need on the Web.


Web Crawling

When most people talk about Internet search engines, they really mean World Wide Web search engines. Before the Web became the most visible part of the Internet, there were already search engines in place to help people find information on the Net. Programs with names like "gopher" and "Archie" kept indexes of files stored on servers connected to the Internet, and dramatically reduced the amount of time required to find programs and documents. In the late 1980s, getting serious value from the Internet meant knowing how to use gopher, Archie, Veronica and the rest.

Today, most Internet users limit their searches to the Web, so we'll limit this article to search engines that focus on the contents of Web pages.

Before a search engine can tell you where a file or document is, it must be found. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. (There are some disadvantages to calling part of the Internet the World Wide Web -- a large set of arachnid-centric names for tools is one of them.) In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.

How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
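As a rough illustration of that crawl loop, here is a minimal Python sketch using only the standard library; a production spider would add politeness delays, robots.txt handling, error recovery, and de-duplication at a vastly larger scale:

```python
# A minimal, illustrative crawler loop: fetch a page, record the words
# on it, and queue the links it contains.
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class PageParser(HTMLParser):
    """Collect href links and visible text from an HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)

def crawl(seed_url, max_pages=5):
    queue, seen, word_lists = deque([seed_url]), set(), {}
    while queue and len(word_lists) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(urlopen(url).read().decode("utf-8", errors="ignore"))
        word_lists[url] = re.findall(r"[a-z']+", " ".join(parser.text).lower())
        queue.extend(link for link in
                     (urljoin(url, href) for href in parser.links)
                     if link.startswith("http"))
    return word_lists
```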

Google began as an academic search engine. In the paper that describes how the system was built, Sergey Brin and Lawrence Page give an example of how quickly their spiders can work. They built their initial system to use multiple spiders, usually three at one time. Each spider could keep about 300 connections to Web pages open at a time. At its peak performance, using four spiders, their system could crawl over 100 pages per second, generating around 600 kilobytes of data each second.

Keeping everything running quickly meant building a system to feed necessary information to the spiders. The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.

When the Google spider looked at an HTML page, it took note of two things:
  • The words within the page
  • Where the words were found

Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the." Other spiders take different approaches.

These different approaches usually attempt to make the spider operate faster, allow users to search more efficiently, or both. For example, some spiders will keep track of the words in the title, sub-headings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to use this approach to spidering the Web.

Other systems, such as AltaVista, go in the other direction, indexing every single word on a page, including "a," "an," "the" and other "insignificant" words. The push to completeness in this approach is matched by other systems in the attention given to the unseen portion of the Web page, the meta tags.

Meta Tags

Meta tags allow the owner of a page to specify key words and concepts under which the page will be indexed. This can be helpful, especially in cases in which the words on the page might have double or triple meanings -- the meta tags can guide the search engine in choosing which of the several possible meanings for these words is correct. There is, however, a danger in over-reliance on meta tags, because a careless or unscrupulous page owner might add meta tags that fit very popular topics but have nothing to do with the actual contents of the page. To protect against this, spiders will correlate meta tags with page content, rejecting the meta tags that don't match the words on the page.

All of this assumes that the owner of a page actually wants it to be included in the results of a search engine's activities. Many times, the page's owner doesn't want it showing up on a major search engine, or doesn't want the activity of a spider accessing the page. Consider, for example, a game that builds new, active pages each time sections of the page are displayed or new links are followed. If a Web spider accesses one of these pages, and begins following all of the links for new pages, the game could mistake the activity for a high-speed human player and spin out of control. To avoid situations like this, the robot exclusion protocol was developed. This protocol, implemented in the meta-tag section at the beginning of a Web page, tells a spider to leave the page alone -- to neither index the words on the page nor try to follow its links.
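As an illustration of that protocol, here is a small Python sketch of how a polite spider might read the robots meta tag before deciding whether to index a page or follow its links; the HTML document is a made-up example:

```python
# Illustrative check of the robots exclusion meta tag described above.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.directives |= {d.strip().lower() for d in content.split(",")}

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)

may_index = "noindex" not in parser.directives    # False for this page
may_follow = "nofollow" not in parser.directives  # False for this page
print(may_index, may_follow)
```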

Building the Index

Once the spiders have completed the task of finding information on Web pages (and we should note that this is a task that is never actually completed -- the constantly changing nature of the Web means that the spiders are always crawling), the search engine must store the information in a way that makes it useful. There are two key components involved in making the gathered data accessible to users:

The information stored with the data

The method by which the information is indexed

In the simplest case, a search engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page, whether the word was used once or many times or whether the page contained links to other pages containing the word. In other words, there would be no way of building the ranking list that tries to present the most useful pages at the top of the list of search results.

To make for more useful results, most search engines store more than just the word and URL. An engine might store the number of times that the word appears on a page. The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page. Each commercial search engine has a different formula for assigning weight to the words in its index. This is one of the reasons that a search for the same word on different search engines will produce different lists, with the pages presented in different orders.

Regardless of the precise combination of additional pieces of information stored by a search engine, the data will be encoded to save storage space. For example, the original Google paper describes using 2 bytes, of 8 bits each, to store information on weighting -- whether the word was capitalized, its font size, position, and other information to help in ranking the hit. Each factor might take up 2 or 3 bits within the 2-byte grouping (8 bits = 1 byte). As a result, a great deal of information can be stored in a very compact form. After the information is compacted, it's ready for indexing.
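To make the bit-packing idea concrete, here is an illustrative Python sketch; the field widths are invented for the example and do not reproduce the paper's exact layout:

```python
# Pack capitalization, font size and position into a single 16-bit
# (2-byte) entry: 1 bit + 3 bits + 12 bits. Field widths are made up.
def pack(capitalized, font_size, position):
    return (capitalized & 0x1) << 15 | (font_size & 0x7) << 12 | (position & 0xFFF)

def unpack(entry):
    return entry >> 15, (entry >> 12) & 0x7, entry & 0xFFF

entry = pack(1, 5, 1034)
print(f"{entry:#06x}", unpack(entry))   # fits in 2 bytes: 0xd40a (1, 5, 1034)
```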

An index has a single purpose: It allows information to be found as quickly as possible. There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table. In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to evenly distribute the entries across a predetermined number of divisions. This numerical distribution is different from the distribution of words across the alphabet, and that is the key to a hash table's effectiveness.

In English, there are some letters that begin many words, while others begin fewer. You'll find, for example, that the "M" section of the dictionary is much thicker than the "X" section. This inequity means that finding a word beginning with a very "popular" letter could take much longer than finding a word that begins with a less popular one. Hashing evens out the difference, and reduces the average time it takes to find an entry. It also separates the index from the actual entry. The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently. The combination of efficient indexing and effective storage makes it possible to get results quickly, even when the user creates a complicated search.
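Here is a minimal illustration in Python, whose built-in dict is itself a hash table: each word hashes directly to the list of pages and positions where it appears, regardless of how "popular" its first letter is (the URLs and words are made up):

```python
# Minimal inverted-index sketch backed by a hash table.
from collections import defaultdict

index = defaultdict(list)   # word -> list of (url, position) entries

def add_page(url, words):
    for position, word in enumerate(words):
        index[word].append((url, position))

add_page("http://example.com/a", ["search", "engines", "index", "words"])
add_page("http://example.com/b", ["spiders", "index", "the", "web"])

print(index["index"])   # [('http://example.com/a', 2), ('http://example.com/b', 1)]
```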

Building a Search

Searching through an index involves a user building a query and submitting it through the search engine. The query can be quite simple, a single word at minimum. Building a more complex query requires the use of Boolean operators that allow you to refine and extend the terms of the search.
The Boolean operators most often seen are:
AND - All the terms joined by "AND" must appear in the pages or documents. Some search engines substitute the operator "+" for the word AND.
OR - At least one of the terms joined by "OR" must appear in the pages or documents.
NOT - The term or terms following "NOT" must not appear in the pages or documents. Some search engines substitute the operator "-" for the word NOT.
FOLLOWED BY - One of the terms must be directly followed by the other.
NEAR - One of the terms must be within a specified number of words of the other.
Quotation Marks - The words between the quotation marks are treated as a phrase, and that phrase must be found within the document or file.
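As a small worked example, the set operations below mimic AND, OR and NOT over a tiny hand-built index; the words and URLs are invented:

```python
# Boolean logic as set operations over a word -> pages index.
index = {
    "bed":     {"a.com", "b.com", "c.com"},
    "flowers": {"b.com"},
    "truck":   {"c.com"},
}

print(index["bed"] & index["flowers"])   # AND -> {'b.com'}
print(index["flowers"] | index["truck"]) # OR  -> {'b.com', 'c.com'}
print(index["bed"] - index["truck"])     # NOT truck -> {'a.com', 'b.com'}
```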

Future Search

The searches defined by Boolean operators are literal searches -- the engine looks for the words or phrases exactly as they are entered. This can be a problem when the entered words have multiple meanings. "Bed," for example, can be a place to sleep, a place where flowers are planted, the storage space of a truck or a place where fish lay their eggs. If you're interested in only one of these meanings, you might not want to see pages featuring all of the others. You can build a literal search that tries to eliminate unwanted meanings, but it's nice if the search engine itself can help out.

One of the areas of search engine research is concept-based searching. Some of this research involves using statistical analysis on pages containing the words or phrases you search for, in order to find other pages you might be interested in. Obviously, the information stored about each page is greater for a concept-based search engine, and far more processing is required for each search. Still, many groups are working to improve both results and performance of this type of search engine. Others have moved on to another area of research, called natural-language queries.

The idea behind natural-language queries is that you can type a question in the same way you would ask it to a human sitting beside you -- no need to keep track of Boolean operators or complex query structures. The most popular natural-language query site today is AskJeeves.com, which parses the query for keywords that it then applies to the index of sites it has built. It only works with simple queries, but competition is heavy to develop a natural-language query engine that can accept a query of great complexity.

Written By: Curt Franklin

Source:
http://computer.howstuffworks.com/internet/basics/search-engine.htm/printable 

Categorized in Online Research

Wikipedia has begun naming links to its online encyclopaedia that have been removed from EU search results under "right to be forgotten" rules.

The deleted links include pages about European criminals, a musician and an amateur chess player.

The Wikimedia Foundation, which operates the site, said the internet was being "riddled with memory holes" as a result of such takedowns.

The action follows a European Court of Justice ruling in May.

The judges involved decided that citizens had the right to have links to "irrelevant" and outdated data erased from search engine results.

Wikipedia is publishing copies of the removal notices it has received.

A fortnight ago Google briefed data regulators that it had subsequently received more than 91,000 requests covering a total of 328,000 links that applicants wanted taken down, and had approved more than 50% of those processed.

The search engine is critical of the court's decision, but has set up a page that people can use to request removals.

At a press conference in London, the Wikimedia Foundation revealed that Google had notified it of five requests involving Wikipedia that it had acted on, affecting more than 50 links to its site.

A dedicated page on Wikipedia states that they include:

  • An English-language page about Gerry Hutch, a Dublin-born businessman nicknamed "the Monk" who was jailed in the 1980s
  • A photograph of a musician, Tom Carstairs, holding a guitar
  • An Italian-language page about Banda della Comasina, the name the media gave to a group of criminals active in the 1970s
  • An Italian-language page about Renato Vallanzasca, an Italian who was jailed after involvement in kidnappings and bank robberies
  • Dozens of Dutch-language pages that mention Guido den Broeder, a chess player from the Netherlands

"We only know about these removals because the involved search engine company chose to send notices to the Wikimedia Foundation," the organisation's lawyers wrote in a blog.

"Search engines have no legal obligation to send such notices. Indeed, their ability to continue to do so may be in jeopardy.

"Since search engines are not required to provide affected sites with notice, other search engines may have removed additional links from their results without our knowledge. This lack of transparent policies and procedures is only one of the many flaws in the European decision."

EU regulators have expressed concern that Google is notifying website administrators of the links it removes, suggesting this undermines the point of the law.

While the links do not appear on Google.co.uk and other versions of the search engine created for specific EU countries, they do still appear on Google.com, which can be accessed in Europe.

 

Data requests

The Wikimedia Foundation has also published its first transparency report - following a similar practice by Google, Twitter and others.

It reveals that the organisation received 304 general content removal requests between July 2012 and June 2014, none of which it complied with.

They included a takedown request from a photographer who had claimed he owned the copyright to a series of selfies taken by a monkey.

Gloucestershire-based David Slater had rotated and cropped the images featured on the site.

But the foundation rejected his claim on the grounds that the monkey had taken the photo, and was therefore the real copyright owner.

The foundation also revealed it had received 56 requests for data about its users.

It said it had complied with eight of these requests, affecting 11 accounts. All of these resulted in information being passed to US-based bodies.

"If we must produce information due to a legally valid request, we will notify the affected user before we disclose, if we are legally permitted and have the means to do so," the foundation said.

“In certain cases, we may help find assistance for users to fight an invalid request.”

Source: http://www.bbc.com

Categorized in Online Research
