 
20.9.2014 - We have just updated our course layouts and descriptions and added our 2015 schedule.


Well House Consultants
You are on the site of Well House Consultants who provide Open Source Training Courses and business hotel accommodation. You are welcome to browse and use our resources subject to our copyright statement and to add in links from your pages to ours.
Perl module P608
Robots, Crawlers and Spiders
Exercises, examples and other material relating to training module P608. This topic is presented on the public courses Using Perl on the Web and Perl Extra.

Although most users of the web will be seated at a browser and will call up pages one by one, there's also a requirement for automated browsing tools. For example, a search engine such as Google will methodically visit a site page by page, indexing the entire content for its customers, and a web site validation program will visit each page in turn in order to find any broken links before live users do. Perl is an excellent language for writing automated browsing tools such as these, and in this module we study the techniques you may wish to use, and also the etiquette involved in writing socially acceptable automata.
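As an illustration of the site validation idea described above, here is a minimal sketch of a single-page link checker. It is not taken from the course notes: the agent name ExampleLinkChecker/0.1 and the helper extract_links are made up for this example, and it assumes the CPAN modules LWP::UserAgent, HTML::LinkExtor and URI are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Hypothetical helper: return the absolute URL of every <a href> in $html,
# resolved against $base.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
        },
        $base,    # with a base URL, links arrive as absolute URI objects
    );
    $parser->parse($html);
    $parser->eof;
    return map { "$_" } @links;    # stringify the URI objects
}

# Only touch the network when a start URL is given on the command line.
if (my $start = shift @ARGV) {
    # The agent string is an assumption; use one that identifies your robot.
    my $ua = LWP::UserAgent->new(agent => 'ExampleLinkChecker/0.1', timeout => 10);

    my $page = $ua->get($start);
    die "Cannot fetch $start: ", $page->status_line, "\n"
        unless $page->is_success;

    for my $link (extract_links($page->decoded_content, $page->base)) {
        next unless $link =~ /^https?:/;    # skip mailto:, javascript: etc.
        my $res = $ua->head($link);         # HEAD avoids downloading the body
        printf "%-7s %s\n", ($res->is_success ? 'ok' : 'BROKEN'), $link;
        sleep 1;                            # pause between requests
    }
}
```

Run it with a page address as the only argument. Some servers do not answer HEAD requests properly, so a link reported as BROKEN may deserve a second look with a normal GET before it is treated as dead.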


Articles and tips on this subject (with date updated)
2402 - Automated Browsing in Perl
I'm reminded on today's Perl course just how powerful some of the modules are, and how much you can do in so little code. LWP::UserAgent turns your Perl into an automated browser ... the following four lines read the robots.txt off my web site. use LWP::UserAgent; $connex = new LWP::UserAgent("agent" ...
2009-09-11
(short)
2229 - Do not re-invent the wheel - use a Perl module
"If you think 'surely someone has done this before', you're probably right ... and in Perl, you'll find the resource you need available as a module on your system, or if it's not quite so common, on the CPAN". I was reminded of this advice today, when I got involved with web site checking ... and rather ...
2009-06-12
 
2045 - Does robots.txt actually work?
If you put an entry into your robots.txt file to ask the various robots to stop crawling certain files and directories, do they actually take note of your request ... considering that it's a purely voluntary standard ... Three or four days back, I excluded some old map pages which were being ...
2009-02-17
 
1031 - robots.txt - a clue to hidden pages?
The robots.txt file is designed to provide spiders and crawlers with a list of places they should NOT go - it's described as the "robot exclusion standard" file and its intent is to allow the webmaster to segregate his site into indexable and non-indexable areas. But because it lists directories to be excluded, ...
2007-01-17
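Several of the articles above turn on how a well-behaved robot reads robots.txt. The following is a minimal sketch, assuming the CPAN module WWW::RobotRules is installed. The agent name ExampleSpider/0.1, the helper may_fetch, the host www.example.com and the disallowed paths are all invented for illustration, and the rules are supplied inline rather than fetched so that the example makes no network requests.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::RobotRules;

# The agent name is an assumption; a real robot should use its own name.
my $rules = WWW::RobotRules->new('ExampleSpider/0.1');

# Sample robots.txt content (hypothetical paths). A real robot would
# download this from the site's /robots.txt before crawling.
my $robots_txt = <<'END';
User-agent: *
Disallow: /oldmaps/
Disallow: /private/
END

# Tell the rules object which site this robots.txt belongs to.
$rules->parse('http://www.example.com/robots.txt', $robots_txt);

# Hypothetical helper: treat anything other than a positive answer
# as "do not fetch".
sub may_fetch {
    my ($url) = @_;
    return $rules->allowed($url) > 0;
}

for my $url (
    'http://www.example.com/index.html',
    'http://www.example.com/oldmaps/town.html',
    'http://www.example.com/private/notes.html',
) {
    printf "%-5s %s\n", (may_fetch($url) ? 'fetch' : 'skip'), $url;
}
```

Honouring the file is entirely up to the robot, which is why the check sits in the robot's own code. If you would rather not do this by hand, libwww-perl also provides LWP::RobotUA, a user agent that consults robots.txt and waits between requests for you.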
 
Background information
Some modules are available for download as a sample of our material or under an Open Training Notes License for free download from http://www.training-notes.co.uk.
Topics covered in this module
Definitions.
Cautions.
Checking a page, links and sites.
Checking a single page.
Checking links and included files.
Checking a site.
Things to do with a pet spider.
Being considerate.
The robots exclusion standard.
Bandwidth.
Complete learning
If you are looking for a complete course and not just information on a single subject, visit our Listing and schedule page.

Well House Consultants specialise in training courses in Python, Perl, PHP, and MySQL. We run Private Courses throughout the UK (and beyond for longer courses), and Public Courses at our training centre in Melksham, Wiltshire, England. It's surprisingly cost effective to come on our public courses - even if you live in a different country or continent to us.

We have a technical library of over 700 books on the subjects on which we teach. These books are available for reference at our training centre. Also available is the Opentalk Forum for discussion of technical questions.



© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/resources/P608.html • PAGE BUILT: Thu Sep 18 11:03:17 2014 • BUILD SYSTEM: WomanWithCat