How To Get More Traffic with Robots and Meta Tags


A robots.txt file is the gatekeeper for your web site or blog: it tells search engine spiders which parts of the site they may or may not crawl. If you don’t have a robots.txt file, spiders and bots assume they may access everything. Rather than rely on that assumption, we want to send clear instructions about what spiders and bots should do.

Assuming you want maximum traffic to your web site, you’ll want to allow all spiders and bots to crawl and index everything on it. The names of spiders and bots sometimes change, so it’s a good idea to allow them all with a single rule. Some guides give a long robots.txt file listing every spider and bot they know of, but we want to cover all of them without having to keep modifying the file in the future.

Part 1


Create a text file called robots.txt and paste the following lines into the file:

User-agent: *
Disallow:
Sitemap: http://www.your-domain-name.com/sitemap.xml

That’s it if you want maximum traffic!

The first line "User-agent: *" applies to any and all spiders and bots.

To target a certain spider instead, you would name it on the User-agent line, and the Disallow lines that follow it would apply only to that spider:

User-agent: badbot

The second line "Disallow: " means no directories are off limits. If your web site or blog already has certain directories password protected, spiders and bots can’t get into them anyway, so you don’t need to worry about limiting them here.

If you did want to limit access to certain directories you would not leave "Disallow: " blank, but would instead add one new line for every directory you want to block access to like this:

Disallow: /cgi-bin/
Disallow: /protected/
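If you maintain the list of blocked directories somewhere else, the "one line per directory" rule is easy to script. The following is a minimal sketch in Python, not part of any standard tool: the function name `build_robots_txt` and its parameters are made up for illustration, and the domain is the same placeholder used above.

```python
def build_robots_txt(blocked_dirs, sitemap_url=None):
    """Return robots.txt text for all bots (hypothetical helper)."""
    lines = ["User-agent: *"]
    if blocked_dirs:
        # One Disallow line for every directory to block.
        lines += [f"Disallow: {path}" for path in blocked_dirs]
    else:
        # An empty Disallow means no directories are off limits.
        lines.append("Disallow:")
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    ["/cgi-bin/", "/protected/"],
    "http://www.your-domain-name.com/sitemap.xml",
)
print(robots_txt)
```

Calling `build_robots_txt([])` falls back to the blank "Disallow:" line, which matches the allow-everything file from the start of Part 1.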

The last line is "Sitemap." This is a newer addition to the robots.txt file, supported by Google, Yahoo, Microsoft, and others, that lets them easily find your sitemap file. The value must be the full URL of the sitemap. The file doesn’t have to be named sitemap.xml, but it does have to be in the XML sitemap format.
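One way to sanity-check the three-line file from the start of Part 1 is to feed it to Python’s standard `urllib.robotparser` module. Keep in mind this is only one parser among many and real crawlers may behave differently. The sketch assumes Python 3.8 or later (for `site_maps()`); the bot name `anybot` and the page URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# The allow-everything file from Part 1 (placeholder domain).
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.your-domain-name.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "anybot" is a hypothetical crawler name; the empty Disallow blocks nothing.
home_ok = parser.can_fetch("anybot", "http://www.your-domain-name.com/")
deep_ok = parser.can_fetch(
    "anybot", "http://www.your-domain-name.com/cgi-bin/form.cgi"
)

# List of Sitemap URLs found in the file, or None if there were none.
sitemaps = parser.site_maps()

print(home_ok, deep_ok, sitemaps)
```

Both checks should come back `True`, and the sitemap URL should be listed exactly as written in the file.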

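When blocking bots, you want those listed first. Some parsers read the robots.txt file from top to bottom and stop at the first group that matches, while others pick the most specific match regardless of order, so putting named bots ahead of the catch-all "*" group works in both cases. For example, you might restrict one bot while allowing all others full access like this: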

User-agent: badbot
Disallow: /cgi-bin/
Disallow: /protected/

User-agent: *
Disallow:
Sitemap: http://www.your-domain-name.com/sitemap.xml
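To see how the two groups interact, here is a small check using Python’s standard `urllib.robotparser`. It is a sketch of how this one parser reads the file, not a guarantee about any particular crawler. The bot name `goodbot` and the file paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: badbot
Disallow: /cgi-bin/
Disallow: /protected/

User-agent: *
Disallow:
Sitemap: http://www.your-domain-name.com/sitemap.xml
"""

SITE = "http://www.your-domain-name.com"  # placeholder domain

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot, path):
    """Ask the parser whether `bot` may fetch `path` on the site."""
    return parser.can_fetch(bot, SITE + path)


# badbot matches its own group, so the two directories are off limits to it.
badbot_cgi = allowed("badbot", "/cgi-bin/form.cgi")
badbot_home = allowed("badbot", "/index.html")

# goodbot (hypothetical) falls through to the "*" group, which blocks nothing.
goodbot_cgi = allowed("goodbot", "/cgi-bin/form.cgi")

print(badbot_cgi, badbot_home, goodbot_cgi)  # False True True
```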

Google’s webmaster documentation mentions using:

Allow: /

Not all bots and spiders understand this line, so it is not backward compatible. "Allow: /" would let Google’s bot read all files and directories starting at the root, but an empty "Disallow: " already says the same thing, and Google’s bot understands it too. A spider that doesn’t recognize "Allow: /" may ignore it or handle it unpredictably, so we’ll just leave it out.

Part 2


Adding a meta tag is just for good measure. Bad bots can simply ignore the meta tag, but good bots will follow the directions you give. To ensure your site is completely indexed for maximum traffic, add these two lines to the head section of each page:

<meta name="googlebot" content="index, follow" />
<meta name="robots" content="index, follow" />

If you leave these lines out, spiders and bots assume the page is set to "index, follow."

You can use the following combinations on individual pages to tell bots not to index the page, not to follow its links, or both. Again, bad bots will ignore the meta tag:

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

The meta tags must be placed inside the head section of the page, as shown in the following example:

<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
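If you want to confirm which directives a page actually carries, you can read them back with Python’s standard `html.parser` module. This is a rough sketch: the class name `RobotsMetaParser` is made up for illustration, and it only looks at the two meta names used in this article ("robots" and "googlebot").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from robots/googlebot meta tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> list of lowercase directives

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            parts = attr.get("content", "").split(",")
            self.directives[name] = [p.strip().lower() for p in parts if p.strip()]


PAGE = """<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
</html>"""

meta = RobotsMetaParser()
meta.feed(PAGE)
print(meta.directives)  # {'robots': ['noindex', 'nofollow']}
```

Because the directives are lowercased, the uppercase and lowercase forms of the tag shown in this article are treated the same way.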

Be sure to set up the robots.txt file above, and add the meta lines, if you want your site fully indexed for maximum traffic.




Posted on September 8, 2009 at 3:59 pm(PST)



© Copyright Nerd Grind 2009 - 2010. All rights reserved.