If you put an entry into your robots.txt file asking the various robots not to crawl certain files and directories, do they actually take note of your request, considering that it's a purely voluntary standard?
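For illustration only, an entry of this kind might look something like the fragment below. The directory name /map/ is an assumption made for the example; it is not copied from the live robots.txt file.

```
# Hypothetical example - the path is assumed, not taken from the real file
User-agent: *
Disallow: /map/
```

The "User-agent: *" line addresses the rule to every robot, and each "Disallow:" line names a URL path prefix that well-behaved robots are asked to stay away from.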
Three or four days back, I excluded some old map pages which were being heavily crawled and I've just visited my log files for the last fortnight:
-bash-3.2$ egrep -c 'net/+map' ac_200902*
ac_20090201:8779
ac_20090202:7884
ac_20090203:15697
ac_20090204:9284
ac_20090205:4944
ac_20090206:9640
ac_20090207:10299
ac_20090208:7015
ac_20090209:5534
ac_20090210:4188
ac_20090211:6808
ac_20090212:853
ac_20090213:1669
ac_20090214:74
ac_20090215:76
Yes! - it has worked. Accesses to these pages - which were predominantly from crawlers - have dropped from some 8,000 to 10,000 per day to fewer than a hundred - and I suspect that most of those are genuine hits!
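One way to test the suspicion that the remaining hits are genuine visitors would be to tally them by user agent. The following is a minimal sketch, not the method used for the figures above: it assumes the logs are in Apache "combined" format, where the user agent is the last double-quoted field on each line, and the function name agents_for is made up for this example.

```shell
# Tally user agents for requests whose log line matches a pattern.
# Assumption: Apache "combined" log format, in which the user agent
# is the last double-quoted field on each line.
agents_for() {
    pattern="$1"
    shift
    # -h drops file name prefixes so that only the log lines are passed on
    grep -hE -- "$pattern" "$@" |
        awk -F'"' '{ print $(NF-1) }' |   # field before the final quote mark
        sort | uniq -c | sort -rn         # count each agent, busiest first
}
```

Running something like agents_for 'net/+map' ac_20090214 ac_20090215 would then list each user agent with its hit count, so any crawlers still ignoring the exclusion should stand out near the top.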
You'll find more about robots.txt here.
(written 2009-02-16, updated 2009-02-17)
Associated topics are indexed under:
P608 - Perl - Robots, Crawlers and Spiders
[2402] Automated Browsing in Perl - (2009-09-11)
[2229] Do not re-invent the wheel - use a Perl module - (2009-06-11)
[1031] robots.txt - a clue to hidden pages? - (2007-01-13)
G911 - Well House Consultants - Search Engine Optimisation
[2748] Monitoring the success and traffic of your web site - (2010-05-01)
[2686] Freedom of Information - consideration for web site designers - (2010-03-20)
[2562] Tuning the web site for sailing on through this year - (2010-01-03)
[2552] Web site traffic - real users, or just noise? - (2009-12-26)
[2428] Diluting History - (2009-09-27)
[2330] Update - Automatic feeds to Twitter - (2009-08-09)
[2324] What search terms FAIL to bring visitors to our site, when they should? - (2009-08-05)
[2137] Reaching the right people with your web site - (2009-04-23)
[2107] How to tweet automatically from a blog - (2009-03-28)
[2106] Learning to Twitter / what is Twitter? - (2009-03-28)
[2065] Static mirroring through HTTrack, wget and others - (2009-03-03)
[2019] Baby Caleb and Fortune City in your web logs? - (2009-01-31)
[2000] 2000th article - Remember the background and basics - (2009-01-18)
[1984] Site24x7 prowls uninvited - (2009-01-10)
[1982] Cooking bodies and URLs - (2009-01-08)
[1971] Telling Google which country your business trades in - (2009-01-02)
[1969] Search Engines. Getting the right pages seen. - (2009-01-01)
[1793] Which country does a search engine think you are located in? - (2008-09-11)
[1344] Catching up on indexing our resources - (2007-09-10)
[1029] Our search engine placement is dropping. - (2007-01-11)
[1015] Search engine placement - long term strategy and success - (2006-12-30)
[427] The Melksham train - a button is pushed - (2005-08-28)
[165] Implementing an effective site search engine - (2005-01-01)