Hi, I was wondering if there is a way to stop spiders from picking up URLs such as:
www.mysite.com/index.php?url=fdsf.sdfer,sader
http://www.mysite.com/index.php?url=jjj.tbbtyr.pbz
(There are billions of combinations of /index.php?url=*code*, so blocking each one individually isn't a realistic option.)
Yet I still want the spiders to crawl www.mysite.com/index.php.
Does anyone have any suggestions or ideas that would work?
thanks,
steve
I'll give 1,000 DNF$ to anyone who can solve this problem.
http://www.goodridgeelec.com
Electrical Contractors, West Midlands, UK
In your Robots.txt just disallow:
http://www.mysite.com/index.php?url=*
Thanks for the help!
Np, thanks for the payment.
I'm not too sure your suggested robots syntax is correct. I did a test robots.txt file as per your suggestion and ran it through an online validator and this is what I got:
Disallow: http://www.mysite.com/index.php?url=*
The "*" wildchar in file names is not supported by (all) the user-agents addressed by this block of code. You should use the wildchar "*" in a block of code exclusively addressed to spiders that support the wildchar (Eg. Googlebot).
You can't use an absolute URL. Please remove the "http://" and the domain name and insert just a file/directory full path, starting from the root directory (Example: /pagename.html).
So perhaps your robots.txt file should rather just read:
User-agent: *
Disallow: /index.php?
References:
The Disallow field has an inherent wildcard nature. The standard dictates that /bob would disallow /bob.html and /bob/index.html (both the file bob and files in the bob directory will not be indexed). Another example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
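If it helps to see why the shorter rule works, here is a rough Python sketch of the plain prefix matching that the quoted reference text describes. It is only an illustration, not how any particular spider is implemented: the helper names is_disallowed and report are made up for this post, and the sketch ignores Allow lines, percent-encoding, per-user-agent blocks and the "*" extension that some spiders (Eg. Googlebot) support.

```python
def is_disallowed(path, disallow_rules):
    """Return True if `path` starts with any non-empty Disallow value.

    Simplified, hypothetical helper: plain prefix matching only,
    with no wildcard or Allow handling.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


def report(rule, paths):
    """Map each path to whether the single Disallow `rule` blocks it."""
    return {path: is_disallowed(path, [rule]) for path in paths}


# The /help examples from the reference text.
help_paths = ["/help.html", "/help/index.html"]
print(report("/help", help_paths))   # both paths are blocked
print(report("/help/", help_paths))  # only /help/index.html is blocked

# The suggested rule: the trailing "?" keeps /index.php itself crawlable.
site_paths = ["/index.php", "/index.php?url=jjj.tbbtyr.pbz"]
print(report("/index.php?", site_paths))

# Under plain prefix matching, an absolute URL never matches a request path,
# because request paths start with "/".
print(report("http://www.mysite.com/index.php?url=*", site_paths))
```

So with Disallow: /index.php? the bare /index.php stays crawlable, while anything that continues with a query string is blocked, which is what steve asked for. A spider that does support the "*" extension may treat rules differently, so it is still worth running the final file through a validator.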