Monday, July 4, 2011

Using the Robots Meta Tag

Using the robots meta tag, you can exclude any HTML-based content from a web site on a page-by-page basis. It is frequently the easier method when eliminating duplicate content from a preexisting site for which the source code is available, or when a site contains many complex dynamic URLs.

To exclude a page with meta-exclusion, simply place the following code in the <head> section of the HTML document you want to exclude:

<meta name="robots" content="noindex, nofollow" />
This indicates that the page should not be indexed (noindex) and that none of the links on the page should be followed (nofollow). It is relatively easy to apply some simple programming logic to decide whether or not to include such a meta tag on the pages of your site, as in the sketch below. This approach is always applicable, so long as you have access to the source code of the application, whereas robots.txt exclusion may be difficult or even impossible to apply in certain cases.
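As a minimal sketch of that logic in Python: the EXCLUDED_PATHS set and the robots_meta_tag() helper are illustrative names, not from any particular framework, and the paths are hypothetical.

# Pages to keep out of search engine indexes, e.g.
# print-friendly duplicates of canonical articles.
EXCLUDED_PATHS = {"/print/article-1", "/print/article-2"}

def robots_meta_tag(path):
    """Return a robots meta tag for excluded pages, or an empty string."""
    if path in EXCLUDED_PATHS:
        return '<meta name="robots" content="noindex, nofollow" />'
    return ""  # no tag: the page may be indexed and its links followed

# Example: emit the result into the <head> of each generated page.
print(robots_meta_tag("/print/article-1"))

Whatever the check is in practice, the point is that the tag is generated per page at render time, which is why access to the application's source code is all you need.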

To exclude a specific spider, change "robots" to the name of the spider, for example googlebot, msnbot, or slurp. To exclude multiple spiders, you can use multiple meta tags. For example, to exclude googlebot and msnbot:

<meta name="googlebot" content="noindex, nofollow" />
<meta name="msnbot" content="noindex, nofollow" />

The only downside is that the spider must fetch the page before it can discover that the page should not be indexed in the first place. This is likely to slow down indexing.
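By contrast, a robots.txt rule declares the exclusion up front, so compliant spiders never request the page at all. A hypothetical example, assuming the duplicate pages all live under a /print/ path:

User-agent: *
Disallow: /print/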
