Checking robots.txt from Python

The robots.txt file - which well-behaved automata check to see whether they are welcome on a web site - has two directives in its base specification: User-agent and Disallow. You will find some other directives in use, and you will find some sites whose robots.txt file has blank lines after the User-agent line, even though (in the specification) the block for a user agent ends at a blank line. These rules, and webmasters' lack of knowledge of the detail, mean that some sites' robots exclusion files are not as effective as their owners would wish.
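To make the blank-line rule concrete, here is a minimal Python 3 sketch of a strict parser. It is an illustration only, not the example program described in this article: the function name parse_robots and the sample text are made up for the purpose, and it handles only User-agent and Disallow. A blank line closes the current block, so any Disallow line that follows a stray blank line is left belonging to no user agent.

```python
def parse_robots(text):
    """Split robots.txt text into records, reading blank lines strictly.

    Returns (records, orphans): records is a list of
    (user_agents, disallowed_paths) pairs; orphans holds Disallow values
    that appeared with no User-agent line in force.
    """
    records = []
    orphans = []
    agents, paths = [], []

    def close_record():
        # Store the record collected so far (if any) and start a new one.
        nonlocal agents, paths
        if agents:
            records.append((agents, paths))
        agents, paths = [], []

    for raw in text.splitlines():
        if not raw.strip():
            close_record()              # a truly blank line ends the block
            continue
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue                    # comment-only or unrecognised line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            if agents:
                paths.append(value)
            else:
                orphans.append(value)   # no user agent in force
    close_record()
    return records, orphans


if __name__ == "__main__":
    # Hypothetical file with a stray blank line after the first User-agent.
    sample = "User-agent: *\n\nDisallow: /private/\n\nUser-agent: badbot\nDisallow: /\n"
    print(parse_robots(sample))
```

In the sample, the rule meant for all robots ends up with an empty Disallow list, and /private/ is reported as an orphan - which is the kind of mistake a quick sanity check should flag.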

I have written a very short Python example here which reads a robots.txt file over HTTP, and analyses it to report on the active User-agent and Disallow lines - not only as a sample program on today's Python Course, but also to allow me to do a quick sanity check of robots.txt files.

Features of this Python example include ...
• Checking the number of command line parameters
• Connecting to a remote web resource and reading it as if it were a file
• Use of exceptions
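
The three features listed above might fit together along the following lines. This is a hedged Python 3 sketch rather than the original course example: the function names fetch_lines, active_lines and main are invented here, and it assumes the single command line parameter is a site's base URL (such as http://www.example.com), to which /robots.txt is appended.

```python
import sys
import urllib.request


def fetch_lines(url):
    """Open a URL and read it line by line, much as you would a file."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return [raw.decode("utf-8", errors="replace").rstrip("\r\n")
                for raw in response]


def active_lines(lines):
    """Keep only User-agent and Disallow lines, with comments stripped."""
    keep = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        field = line.split(":", 1)[0].strip().lower()
        if ":" in line and field in ("user-agent", "disallow"):
            keep.append(line)
    return keep


def main(argv):
    # Check the number of command line parameters.
    if len(argv) != 2:
        print(f"Usage: {argv[0]} http://www.example.com", file=sys.stderr)
        return 2
    url = argv[1].rstrip("/") + "/robots.txt"
    # Use an exception to cope with a missing file or unreachable site.
    # URLError and HTTPError are subclasses of OSError; a malformed URL
    # raises ValueError.
    try:
        lines = fetch_lines(url)
    except (OSError, ValueError) as err:
        print(f"Could not read {url}: {err}", file=sys.stderr)
        return 1
    for line in active_lines(lines):
        print(line)
    return 0
```

To run it as a script, finish the file with sys.exit(main(sys.argv)). Returning an exit status from main, rather than exiting inside it, keeps the function easy to call from a test or from the interactive prompt.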
(written 2009-07-12)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
W501 - Introduction to Web Site Structure
  [2552] Web site traffic - real users, or just noise? - (2009-12-26)
  [2214] Global Index to help you find resources - (2009-06-01)
  [2094] If you have a spelling mistake in your URL / page name - (2009-03-21)
  [1969] Search Engines. Getting the right pages seen. - (2009-01-01)
  [1686] FTP - how not to corrupt data (binary v ascii) - (2008-06-24)
  [1636] What to do if the Home Page is missing - (2008-05-08)
  [1431] Getting the community on line - some basics - (2007-11-13)
  [1198] From Web to Web 2 - (2007-05-21)
  [1176] A pu that got me into trouble - (2007-05-04)
  [1168] Moving out some of the web site bloat - (2007-04-29)
  [1031] robots.txt - a clue to hidden pages? - (2007-01-13)
  [1024] Web site - a refresh to improve navigation - (2007-01-07)
  [528] Getting favicon to work - avoiding common pitfalls - (2005-12-14)
  [332] Looking up IP addresses - (2005-06-01)

W603 - Web and Intranet - Server Side Technologies
  [4277] Sending a message to the server and changing text on a page when a button is pressed - (2014-05-23)
  [3915] How does PHP work? - (2012-11-07)
  [3705] Django Training Courses - UK - (2012-04-23)
  [2055] Effect on server when memory runs out and swapping starts - (2009-02-26)
  [1749] Using server side and client side programming together - (2008-08-11)
  [1615] PHP training courses every month - (2008-04-18)
  [1554] Online hotel reservations - Melksham, Wiltshire (near Bath) - (2008-02-24)
  [1365] Korn Shell scripts on the web - (2007-09-25)
  [1355] .php or .html extension? Morally Static Pages - (2007-09-17)
  [1020] Parallel processing in PHP - (2007-01-03)
  [732] Where is a web site visitor browsing from - (2006-05-24)
  [653] Easy feed! - (2006-03-21)
  [642] How similar are two words - (2006-03-11)

Y110 - Python - File Handling
  [4717] with in Python - examples of use, and of defining your own context - (2016-11-02)
  [4708] Scons - a build system in Python - building hello world - (2016-10-29)
  [4663] Easy data to object mapping (csv and Python) - (2016-03-24)
  [4593] Command line parameter handling in Python via the argparse module - (2015-12-08)
  [4451] Running an operating system command from your Python program - the new way with the subprocess module - (2015-03-06)
  [4438] Loving programming in Python - and ready to teach YOU how - (2015-02-22)
  [3764] Shell, Awk, Perl of Python? - (2012-06-14)
  [3558] Python or Lua - which should I use / learn? - (2011-12-21)
  [3465] How can I do an FTP transfer in Python? - (2011-10-05)
  [3442] A demonstration of how many Python facilities work together - (2011-09-16)
  [3083] Python - fresh examples from recent courses - (2010-12-11)
  [2870] Old prices - what would the equivalent price have been in 1966? - (2010-07-14)
  [2011] Conversion of OSI grid references to Eastings and Northings - (2009-01-28)
  [1442] Reading a file multiple times - file pointers - (2007-11-23)
  [183] The elegance of Python - (2005-01-19)
  [114] Relative or absolute milkman - (2004-11-10)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2019: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/2282_Che ... ython.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat