Telling robots to bypass certain URLs
Introduction to Web Site Structure example from a Well House Consultants training course

This example is described in the following articles:
   • Search Engines. Getting the right pages seen.
   • Web Site Loading - experiences and some solutions shared
   • Automating access to a page obscured behind a holding page
   • Setting your user_agent in PHP - telling back servers who you are

This example references the following resources:
http://en.wikipedia.org/wiki/Robots.txt
http://www.robotstxt.org/robotstxt.html
http://tool.motoricerca.info/robots-checker.phtml

Source code: robots.txt Module: W501
#
# This sample file is written to the "robots exclusion protocol"
# or "robots exclusion standard". Well-behaved robots (that's
# all the important ones!) use this file to check where they are
# unwelcome ... and they should then only crawl / use your other
# pages.
#
# robots.txt file for www.wellho.net and www.wellho.co.uk
# See
# http://en.wikipedia.org/wiki/Robots.txt
# http://www.robotstxt.org/robotstxt.html
# and checker at
# http://tool.motoricerca.info/robots-checker.phtml
#
# Why do you want to exclude certain URLs when the whole point
# of having a web site is to give the public access to the
# information it contains? You'll see in my example that I've
# put a note beside each of the URLs listed.
#
# * I do NOT want search results within our site indexed, as they
# would just hide the real pages
#
# * There is no point in the search engines trying to index all
# possible accessibility combinations
#
# * CGI program outputs differ every time - no point in indexing them
#
# * The "happens" directory is our staff short cuts - not really a place
# for new visitors to land!
#
# * The unique.html file is automatically generated from all the other
# pages on our site and contains a list of possible spelling mistakes on
# other pages - NOT what we want to index under!

User-agent: *
Disallow: /cgi-bin/ # Disallow cgi programs
Disallow: /net/unique.html # Unique words
Disallow: /happens/ # Our Staff Short Cuts
Disallow: /resources/mywellho.html # Accessibility Options
Disallow: /net/search.php4 # Searches

# The following block excludes a robot that we do NOT want to crawl
# our site (it grabs lots of pages in a short time, and it is motivated
# not by helping us but purely by the gains it can make for others
# according to their FAQ!) ... but it DOES respect the robots.txt standard

User-agent: TurnitinBot
Disallow: /

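# Note that blank lines are NOT allowed within a block! Under the
# original standard a blank line ends the record, that is the
# User-agent line together with the Disallow lines that follow it.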
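
The rules above can be checked before they go live. What follows is a minimal sketch in Python 3, assuming the standard library's urllib.robotparser module is an acceptable stand-in for a real crawler (each search engine has its own parser, and they may differ in detail). The agent name "ExampleBot" and the paths /index.html and /cgi-bin/demo.pl are made up for illustration; they are not taken from the site.

```python
from urllib.robotparser import RobotFileParser

# The two rule blocks from the sample file above (header comments omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/ # Disallow cgi programs
Disallow: /net/unique.html # Unique words
Disallow: /happens/ # Our Staff Short Cuts
Disallow: /resources/mywellho.html # Accessibility Options
Disallow: /net/search.php4 # Searches

User-agent: TurnitinBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text held in a string, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, agent, path):
    """Return True if the named robot may fetch the given path."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# Hypothetical robots and paths, chosen only to exercise each block.
checks = [
    ("ExampleBot", "/index.html"),
    ("ExampleBot", "/cgi-bin/demo.pl"),
    ("ExampleBot", "/net/search.php4"),
    ("TurnitinBot", "/index.html"),
]

for agent, path in checks:
    verdict = "allowed" if is_allowed(parser, agent, path) else "blocked"
    print(f"{agent:12} {path:22} {verdict}")
```

A robot with no block of its own falls back to the "User-agent: *" block, so ExampleBot is only kept out of the listed paths, while TurnitinBot matches its own block and is kept out of everything. To test the live file rather than a string, RobotFileParser also offers set_url() and read().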

Purpose of this website
This is a sample program, class demonstration or answer from a training course. Its main purpose is to provide an after-course service to customers who have attended our public, private or on-site courses, but the examples are made generally available under the conditions described below.

Web site author
This web site is written and maintained by Well House Consultants.

Conditions of use
Past attendees on our training courses are welcome to use individual examples in the course of their programming, but must check the examples they use to ensure that they are suitable for their job. Remember that some of our examples show you how not to do things - check in your notes. Well House Consultants take no responsibility for the suitability of these example programs to customers' needs.

This program is copyright Well House Consultants Ltd. You are forbidden from using it for running your own training courses without our prior written permission. See our page on courseware provision for more details.

Our images within this code may NOT be reused on a public URL without our prior permission. For bona fide personal use, we will often grant permission provided that you link back to us. Commercial use on a website will incur a license fee for each image used - details on request.

© WELL HOUSE CONSULTANTS LTD., 2021: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/resources/ex.php4 • PAGE BUILT: Sun Oct 11 14:50:09 2020 • BUILD SYSTEM: JelliaJamb