For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but due to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive - rather, enjoying the times that we are retired but still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index [here].

Well House Consultants
You are on the site of Well House Consultants who provide Open Source Training Courses and business hotel accommodation. You are welcome to browse and use our resources subject to our copyright statement and to add in links from your pages to ours.
Perl module P608
Robots, Crawlers and Spiders
Exercises, examples and other material relating to training module P608. This topic is presented on public courses: Using Perl on the Web, Perl Extra.

Although most users of the web will be seated at a browser and will call up pages one by one, there's also a requirement for automated browsing tools. For example, a search engine such as Google will methodically visit a site page by page, indexing the entire content for its customers, and a web site validation program will visit each page in turn in order to find any broken links before live users do. Perl is an excellent language for writing automated browsing tools such as these, and in this module we study the techniques you may wish to use, and also the etiquette involved in writing socially acceptable automata.
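As a rough illustration of the site validation idea described above, here is a minimal sketch of a link checker. To keep it runnable without a network connection it walks a small in-memory "site"; the `%site` hash, its page contents, and the `extract_links` and `check_site` routines are all invented for this example. A real checker would fetch each page over HTTP (for example with LWP::UserAgent) and would be better served by a proper HTML parser than by the simple regular expression used here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory "site": path => HTML. A real checker would
# fetch each page over HTTP instead of looking it up in a hash.
my %site = (
    '/'             => '<a href="/about.html">About</a> <a href="/courses.html">Courses</a>',
    '/about.html'   => '<a href="/">Home</a> <a href="/missing.html">Old page</a>',
    '/courses.html' => '<a href="/">Home</a>',
);

# Naive link extraction with a regular expression. Good enough for a
# sketch, but real-world markup really calls for an HTML parser.
sub extract_links {
    my ($html) = @_;
    return $html =~ /<a\s[^>]*?href\s*=\s*"([^"]*)"/gi;
}

# Breadth-first walk from a start path, visiting each page once and
# recording every link whose target cannot be found.
sub check_site {
    my ($start, $pages) = @_;
    my @queue = ($start);
    my %seen  = ($start => 1);
    my @broken;
    while (defined(my $path = shift @queue)) {
        for my $link (extract_links($pages->{$path})) {
            if (!exists $pages->{$link}) {
                push @broken, "$path -> $link";
                next;
            }
            push @queue, $link unless $seen{$link}++;
        }
    }
    return @broken;
}

my @broken = check_site('/', \%site);
print "Broken link: $_\n" for @broken;
```

The `%seen` hash is what stops the walk from looping forever when pages link back to each other, which is the usual case on a real site.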


Articles and tips on this subject (with date updated)
2402 - Automated Browsing in Perl
I'm reminded on today's Perl course just how powerful some of the modules are, and how much you can do in so little code. LWP::UserAgent turns your Perl into an automated browser ... the following four lines read the robots.txt off my web site. use LWP::UserAgent; $connex = new LWP::UserAgent("agent" ...
2009-09-11
(short)
2229 - Do not re-invent the wheel - use a Perl module
"If you think 'surely someone has done this before', you're probably right ... and in Perl, you'll find the resource you need available as a module on your system, or if it's not quite so common, on the CPAN". I was reminded of this advice today, when I got involved with web site checking ... and rather ...
2009-06-12
 
2045 - Does robots.txt actually work?
If you put an entry into your robots.txt file to ask the various robots to disallow (cease crawling) certain files and directories, do they actually take note of your request ... considering that it's a purely voluntary standard ... Three or four days back, I excluded some old map pages which were being ...
2009-02-17
 
1031 - robots.txt - a clue to hidden pages?
The robots.txt file is designed to provide spiders and crawlers with a list of places they should NOT go - it's described as the "robot exclusion standard" file and its intent is to allow the webmaster to segregate his site into indexable and non-indexable areas. But because it lists directories to be excluded, ...
2007-01-17
 
Background information
Some modules are available for download as a sample of our material or under an Open Training Notes License for free download from [here].
Topics covered in this module
Definitions.
Cautions.
Checking a page, links and sites.
Checking a single page.
Checking links and included files.
Checking a site.
Things to do with a pet spider.
Being considerate.
The robots exclusion standard.
Bandwidth.
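To give a flavour of the "being considerate" topics, the following is a simplified sketch of how a robot might honour the robots exclusion standard before fetching a page. The sample robots.txt content, the agent name `MyChecker`, and the `disallowed_for` and `allowed` routines are assumptions made for this example. Only `User-agent` and `Disallow` lines are handled, and agent names are matched exactly, so for real use a tested CPAN module such as WWW::RobotRules is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample robots.txt content, invented for this sketch.
my $robots_txt = <<'END';
User-agent: *
Disallow: /oldmaps/
Disallow: /private/

User-agent: BadBot
Disallow: /
END

# Collect the Disallow prefixes that apply to a given agent name.
# Simplified: a record naming the agent wins over the "*" record;
# Allow lines, wildcards and Crawl-delay are ignored.
sub disallowed_for {
    my ($text, $agent) = @_;
    my (%rules, @current, $in_rules);
    for my $line (split /\r?\n/, $text) {
        $line =~ s/#.*//;                      # strip comments
        next unless $line =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/;
        my ($field, $value) = (lc($1), $2);
        if ($field eq 'user-agent') {
            @current  = () if $in_rules;       # a new record starts
            $in_rules = 0;
            push @current, lc $value;
        }
        elsif ($field eq 'disallow') {
            $in_rules = 1;
            for my $name (@current) {
                $rules{$name} ||= [];
                # An empty Disallow value means "nothing is excluded"
                push @{ $rules{$name} }, $value if $value ne '';
            }
        }
    }
    my $key = exists $rules{lc $agent} ? lc $agent : '*';
    return @{ $rules{$key} || [] };
}

# True if no Disallow prefix for this agent matches the start of the path.
sub allowed {
    my ($text, $agent, $path) = @_;
    for my $prefix (disallowed_for($text, $agent)) {
        return 0 if index($path, $prefix) == 0;
    }
    return 1;
}

for my $path ('/index.html', '/oldmaps/town.html') {
    printf "%-20s %s\n", $path,
        allowed($robots_txt, 'MyChecker', $path) ? 'may fetch' : 'keep out';
}
```

On the bandwidth side, a considerate robot would also pause between requests (a `sleep` of a second or more between fetches is a common starting point) so that it does not compete with live visitors for the server's attention.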
Complete learning
If you are looking for a complete course and not just information on a single subject, visit our Listing and schedule page.

Well House Consultants specialise in training courses in Ruby, Lua, Python, Perl, PHP, and MySQL. We run Private Courses throughout the UK (and beyond for longer courses), and Public Courses at our training centre in Melksham, Wiltshire, England. It's surprisingly cost effective to come on our public courses - even if you live in a different country or continent to us.

We have a technical library of over 700 books on the subjects on which we teach. These books are available for reference at our training centre.



© WELL HOUSE CONSULTANTS LTD., 2024: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/resources/P608.html • PAGE BUILT: Sun Oct 11 14:50:09 2020 • BUILD SYSTEM: JelliaJamb