 
For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but due to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive - rather, enjoying the times that we are retired but still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index here.
Checking robots.txt from Python

The robots.txt file - which well-behaved automata check to see whether they are welcome on a web site - has two directives in its base specification: User-Agent and Disallow. You will find some other directives used, and you will find some sites that have a robots.txt file with blank lines after the User-Agent line, even though (in the specification) the block for a user agent ends at a blank line. These rules, and webmasters' lack of knowledge of the detail, mean that some sites' robots exclusion files are not as effective as their owners would wish.
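To make the blank line rule concrete, here is a minimal sketch (an illustration only, assuming the strict reading of the specification described above, and not the example program itself). The names `parse_records` and `effective_records` are hypothetical. It splits robots.txt text into blocks at each blank line, and shows how a Disallow line that follows a stray blank line ends up with no user agent attached to it.

```python
def parse_records(text):
    """Split robots.txt text into records; a blank line ends the current record."""
    records = []
    current = {"agents": [], "disallow": []}
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped:
            # Blank line: close the record in progress, if it holds anything
            if current["agents"] or current["disallow"]:
                records.append(current)
                current = {"agents": [], "disallow": []}
            continue
        # Drop trailing comments; a comment-only line is skipped, not a boundary
        line = stripped.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()   # accept User-Agent, DisAllow, etc.
        value = value.strip()
        if field == "user-agent":
            current["agents"].append(value)
        elif field == "disallow":
            current["disallow"].append(value)
        # Other directives are ignored in this sketch
    if current["agents"] or current["disallow"]:
        records.append(current)
    return records


def effective_records(records):
    """Keep only records that name at least one user agent."""
    return [rec for rec in records if rec["agents"]]


GOOD = "User-agent: *\nDisallow: /private/\n"
BAD = "User-agent: *\n\nDisallow: /private/\n"   # stray blank line

print(effective_records(parse_records(GOOD)))
# [{'agents': ['*'], 'disallow': ['/private/']}]
print(effective_records(parse_records(BAD)))
# [{'agents': ['*'], 'disallow': []}]
```

In the second case the Disallow line has been cut off from its User-agent line, so under this strict reading the site is not excluding anything at all.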

I have written a very short Python example here which reads a robots.txt file over HTTP, and analyses it to report on the active User-Agent and Disallow lines - not only as a sample program on today's Python Course, but also to allow me to do a quick sanity check of robots.txt files.

Features of this Python example include ...
• Checking the number of command line parameters
• Connecting to a remote web resource and reading it as if it were a file
• Use of exceptions
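
The three features listed above can be sketched as follows. This is not the original 2009 program (which is not reproduced on this page) but a short illustration assuming Python 3 and its `urllib.request` module; the function names, and the choice of building the address as `http://hostname/robots.txt`, are assumptions made for the example.

```python
import sys
import urllib.request


def target_url(argv):
    """Check the parameter count; return the robots.txt URL, or None if wrong."""
    if len(argv) != 2:
        return None
    return "http://" + argv[1] + "/robots.txt"


def read_active_lines(url):
    """Open the URL and loop over it as if it were a file.

    Returns the User-agent and Disallow lines, in the order they appear.
    """
    active = []
    with urllib.request.urlopen(url, timeout=10) as handle:
        for raw in handle:                      # the response yields bytes lines
            line = raw.decode("utf-8", errors="replace").strip()
            if line.lower().startswith(("user-agent:", "disallow:")):
                active.append(line)
    return active


def main(argv):
    """Return 0 on success, 1 if the file could not be read, 2 on bad usage."""
    url = target_url(argv)
    if url is None:
        name = argv[0] if argv else "robocheck"   # hypothetical script name
        print("Usage: %s hostname" % name, file=sys.stderr)
        return 2
    try:
        lines = read_active_lines(url)
    except OSError as err:
        # urllib.error.URLError and HTTPError are both subclasses of OSError
        print("Could not read %s: %s" % (url, err), file=sys.stderr)
        return 1
    for line in lines:
        print(line)
    return 0
```

To use this from the command line, the file would end with a call such as `sys.exit(main(sys.argv))`, and be run with a single host name as its parameter.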
(written 2009-07-12)

 



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2024: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/2282_Che ... ython.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb