Good article! Here's an example of a typical robots.txt file I may use:
This robots.txt file tells Google and MSN not to crawl certain files (e.g. images) and limits how often the spiders Slurp and msnbot hit the site (otherwise they can eat up bandwidth).

User-agent: Mediapartners-Google*
Disallow:
User-agent: Googlebot
Disallow: /*.doc$
Disallow: /*.PDF$
Disallow: /*.jpeg$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.exe$
Disallow: /*.mp3$
Disallow: /*.mid$
Disallow: /*.wav$
User-Agent: msnbot
Disallow: /*.doc$
Disallow: /*.PDF$
Disallow: /*.jpeg$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
Disallow: /*.exe$
Disallow: /*.mp3$
Disallow: /*.mid$
Disallow: /*.wav$
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /guardian/
Disallow: /axs/
Disallow: /admin/
User-agent: Slurp
Crawl-delay: 60
User-Agent: msnbot
Crawl-delay: 60
This is just an example. Each website is different.
Here's one robots.txt file validator you can use: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
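
If you'd rather check a few rules locally, here's a rough sketch using Python's standard urllib.robotparser module. It only covers an excerpt of the file above (the plain directory rules and the Slurp Crawl-delay), because that module does simple prefix matching and does not understand the * and $ wildcards used in the file-type rules. The bot name "SomeBot", the host www.example.com and the helper load_rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the example file: directory rules and one Crawl-delay group.
# The wildcard file-type rules are left out on purpose (see note above).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /admin/

User-agent: Slurp
Crawl-delay: 60
"""


def load_rules(text):
    """Parse robots.txt text (hypothetical helper, not part of any site)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "SomeBot" and www.example.com are placeholders, not real crawlers or sites.
print(rules.can_fetch("SomeBot", "http://www.example.com/admin/login"))
print(rules.can_fetch("SomeBot", "http://www.example.com/index.html"))
print(rules.crawl_delay("Slurp"))
```

How a real spider reads the file can differ from this parser, so treat the result as a sanity check rather than a guarantee.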






