
Digital Security Reports: Battle The (google) Bots

by Nick Dalton.

Before there was Google, the big new search engine was AltaVista. To show off its power, the AltaVista team at Digital decided to crawl and index the entire web, which was a new concept at the time. Many site owners did not like the idea of a "robot" program accessing every page on their web sites, because it would put more load on their web servers and increase their bandwidth costs. The Robots Exclusion Standard, first proposed in 1994 and written up as an Internet draft in 1996, gives site owners a way to address these concerns.

Using a simple text file called robots.txt you can instruct search engines to stay out of certain directories. Here is a very simple robots.txt which disallows all search engines (User-agents) access to the /images directory.

User-agent: *
Disallow: /images

By disallowing /images you are also implicitly disallowing all subdirectories under /images, such as /images/logos and any files beginning with /images such as /images.html.
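The prefix behaviour described above can be checked with a short sketch. It uses Python's standard urllib.robotparser module, which is only one parser's reading of the rules, so a real crawler may behave differently. The helper name allowed and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The simple robots.txt from the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images
"""


def allowed(path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths: one outside /images, one in a subdirectory,
    # and one file whose name merely begins with /images.
    for path in ("/index.html", "/images/logos/acme.png", "/images.html"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Both /images/logos/acme.png and /images.html are blocked because the rule is matched as a plain prefix of the path.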

The first draft of the standard did not include an "Allow" directive. It was added later, but there is no guarantee that all search engines support it. Anything that was not specifically disallowed was considered fair game for web crawlers.

If you choose to disallow access to your entire web site, you can use a robots.txt like this:

User-agent: *
Disallow: /

When the User-agent is *, the lines that follow apply to every search robot. By naming a specific web crawler's signature as the User-agent, you can give instructions to that robot alone.

User-agent: Googlebot
Disallow: /google-secrets
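As a rough check that a rule like this affects only the named crawler, here is a sketch using Python's standard urllib.robotparser. The helper can_crawl and the file name plan.html are hypothetical, and the result reflects how this particular parser interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Rules that target Googlebot only; there is no "User-agent: *" group.
rules = RobotFileParser()
rules.parse([
    "User-agent: Googlebot",
    "Disallow: /google-secrets",
])


def can_crawl(agent: str, path: str) -> bool:
    """Ask the parsed rules whether `agent` may fetch `path`."""
    return rules.can_fetch(agent, path)


if __name__ == "__main__":
    print(can_crawl("Googlebot", "/google-secrets/plan.html"))  # Googlebot is blocked here
    print(can_crawl("Googlebot", "/index.html"))                # but not elsewhere
    print(can_crawl("Slurp", "/google-secrets/plan.html"))      # other robots are unaffected
```

Because the file has no group for other robots, everything stays open to them, including /google-secrets.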

Since the initial specification was issued, some search engines have extended the protocol. One example is support for wildcards.

User-agent: Slurp
Disallow: /*.gif$

This tells Yahoo!'s web crawler (named Slurp) not to index any file on your site whose name ends in .gif. Because not every search engine supports wildcard matches, place these lines under the User-agent line of a crawler that does.
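To make the wildcard syntax concrete, here is a minimal sketch of how such a pattern could be matched. It assumes the common convention that * matches any run of characters and a trailing $ anchors the end of the path. The function names rule_to_regex and rule_matches are invented for this example, and this is not any search engine's actual matcher.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule with * and a trailing $ into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(rule: str, path: str) -> bool:
    """Return True if `path` falls under `rule`."""
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    # Hypothetical paths checked against the wildcard rule from the article.
    for path in ("/images/logo.gif", "/logo.gif.html", "/index.html"):
        print(path, rule_matches("/*.gif$", path))
```

Without the trailing $, the same function falls back to plain prefix matching, so the rule /images still covers /images.html.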

You can combine several of these techniques in a single robots.txt file. Here is an example.

User-agent: *
Disallow: /bar

User-agent: Googlebot
Allow: /foo
Disallow: /bar
Disallow: /*.gif$
Disallow: /

Computer programs are pretty good at following instructions like these. But for a human brain it can quickly get overwhelming, so I highly encourage you to keep it simple.
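To see how one program reads the combined file above, here is a sketch using Python's standard urllib.robotparser. Two caveats apply: this parser uses the first rule that matches, and it does not interpret wildcards, so the /*.gif$ line has no special effect here. Real crawlers may resolve the rules differently. ExampleBot is a made-up crawler name standing in for "any other robot".

```python
from urllib.robotparser import RobotFileParser

# The combined robots.txt from the article, one directive per line.
COMBINED = """\
User-agent: *
Disallow: /bar

User-agent: Googlebot
Allow: /foo
Disallow: /bar
Disallow: /*.gif$
Disallow: /
"""

parser = RobotFileParser()
parser.parse(COMBINED.splitlines())


def verdict(agent: str, path: str) -> str:
    """Describe whether `agent` may fetch `path` under COMBINED."""
    return "allowed" if parser.can_fetch(agent, path) else "blocked"


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleBot"):
        for path in ("/foo/page.html", "/bar/page.html", "/about.html"):
            print(f"{agent:10} {path:16} {verdict(agent, path)}")
```

Under this parser, Googlebot follows only its own group and ends up limited to /foo, while every other robot is kept out of /bar and nothing else.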

Google's webmaster tools include a robots.txt analysis tool, which I highly recommend. For more information on the Robots Exclusion Standard, visit www.robotstxt.org.

Today when companies are spending a lot of money to be included in search engine listings, the idea of excluding your content may seem quaint. But from a security perspective there are many valid reasons for limiting what a search engine indexes on your site. See my Digital Security Report for more information.

Nick Dalton's blog is TipsTricksToolsTechniques.com, where he regularly shares tips on Internet security. Also worth checking out is his latest report, The Digital Security Report, which has essential advice for Internet business owners selling products online.

Published November 8th, 2007

Filed in Ecommerce, Search Engine, Web Design
