Static mirroring through HTTrack, wget and others

Our web site is not well suited to off-line browsing these days. It may be flexible, but if you want to take a copy of it, put it onto a CD, and then browse away from the Internet, please resist the temptation. Why is it NOT a good idea to 'blind mirror' us?

1. The changing nature of our web site. Our pages are adaptive; if you browse from Aberdeen, you'll be offered pulldown menus asking if you're in Aberdeen, Inverness, Dundee, Perth, but if you're browsing from Bristol, you'll be offered Bristol, Bath, Newport, Taunton. If you're browsing with Internet Explorer, some adaptation of the HTML will be made to accommodate non-standard features. Your previous visit history will be noted, and you'll have different options highlighted as our page is presented in a way that helps you navigate. None of these features can work from a mirror CD!

2. Our size. We've got around 15,000 different URLs on this web site ... pages ranging from pictures of Gosport Station to using Utility methods to construct objects of different type in Python, and it's unlikely that you'll want them all - so mirroring is a very slow and very blunt tool which hurts ...

3. Our bandwidth. It's a serious resource hog if you try to copy all of our pages. You're costing us a lot of bandwidth, you're slowing down others who are trying to use our site - basically, you're being antisocial (though probably not intentionally so!). And do you know the worst of it ...

4. Out of date. Your mirror copy will rapidly go out of date, as this is a dynamic site where new examples are added, links updated, and comments amended somewhere all the time. Having spent a lot of time creating a traffic jam, you'll find that the destination really wasn't worth going to.

5. Copyright issues. I am also concerned about copyright. I appreciate that duplicating content is easy, but I would much rather provide a feed to people as they need pages than have - as I have found in the past - mirrored pages that carry out-of-date or unaltered absolute links and are said to be in our name. They get us a bad reputation when really they are an imitation, which ought to be the sincerest form of flattery.

If you're thinking of mirroring us ... please don't do it ... and if you have found this page unexpectedly ... our web site probably thinks that you are trying to mirror it, and is asking you not to do so!

How do we detect mirroring operations?

There are certain programs that do it, and we look for names such as wget and HTTrack in the User-Agent strings of incoming requests and in our logs. Such signals aren't going to find the people who try to hide what they're doing, but we have other flags that may find them. This is something we discuss on courses such as Linux Web Server, which helps you with your httpd deployment.
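As a rough illustration of the User-Agent check, here is a minimal Python sketch. It is not the code we run on this site. It assumes the log is in Apache's "combined" format, where the User-Agent is the last quoted field on each line. The token list, the function names and the sample log lines (which use documentation-only IP addresses) are all made up for the example.

```python
import re

# Hypothetical list of tokens to look for; the article names wget and HTTrack.
MIRROR_TOKENS = ("wget", "httrack")

# Assumption: "combined" log format, so the User-Agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent_of(log_line):
    """Return the User-Agent from a combined-format log line, or '' if absent."""
    match = LAST_QUOTED_FIELD.search(log_line)
    return match.group(1) if match else ""


def looks_like_mirror_tool(user_agent):
    """True if the User-Agent contains one of the known tool names."""
    lowered = user_agent.lower()
    return any(token in lowered for token in MIRROR_TOKENS)


def flag_mirroring_clients(log_lines):
    """Map each client address to its count of requests from a mirroring tool."""
    counts = {}
    for line in log_lines:
        if looks_like_mirror_tool(user_agent_of(line)):
            client = line.split(" ", 1)[0]  # first field is the client address
            counts[client] = counts.get(client, 0) + 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [03/Mar/2009:10:00:00 +0000] "GET /mouth/ HTTP/1.0" 200 5120 "-" "Wget/1.11.4"',
    '192.0.2.10 - - [03/Mar/2009:10:00:01 +0000] "GET /mouth/a.html HTTP/1.0" 200 4096 "-" "Wget/1.11.4"',
    '198.51.100.7 - - [03/Mar/2009:10:00:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/3.0"',
    '203.0.113.5 - - [03/Mar/2009:10:00:03 +0000] "GET /net/ HTTP/1.1" 200 1024 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"',
]

if __name__ == "__main__":
    for client, hits in flag_mirroring_clients(SAMPLE_LOG).items():
        print(client, hits)
```

A check like this only catches tools that announce themselves; anyone who changes the User-Agent string will sail straight past it, which is why other flags are needed too.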

How should you as a webmaster handle such bulk download requests?

First things first - work out what you want to do. Do you want to allow mirrors, allow part of the site to be mirrored, rudely lock and bolt the front door against mirroring, or hang up a polite sign that says 'please do not mirror'? And if you go for the last of these, how do you get your mirrorers to actually read the sign?

If you've decided to restrict your users from mirroring, have a look at robots.txt, and have a look too at the environment variables that are set from the user agent string and their use in conjunction with either deny directives or RewriteCond directives. And if you have common include files, you can put some database recording and monitoring in there to pick up the unusual traffic flows that are characteristic of mirroring attempts on larger sites ... all of which make very long subjects for a blog, but make for excellent lunchtime discussions on a PHP techniques Workshop!
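To give a flavour of the monitoring idea, here is a minimal Python sketch of spotting an unusual traffic flow. It is an illustration under stated assumptions, not our own include file: it keeps recent request times in memory where the text above suggests a database, and the class name, the threshold of 5 requests and the 10 second window are all invented for the example.

```python
from collections import defaultdict, deque


class TrafficMonitor:
    """Flag clients that make too many requests within a sliding time window.

    Hypothetical helper: the limits below are illustrative defaults, and a
    real deployment would record requests in a database, not in memory.
    """

    def __init__(self, max_requests=5, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # client -> recent request times

    def record(self, client, timestamp):
        """Record one request; return True if the client now exceeds the limit."""
        times = self._recent[client]
        times.append(timestamp)
        # Forget requests that have fallen out of the window.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


if __name__ == "__main__":
    monitor = TrafficMonitor()
    # One request per second from the same (documentation-only) address.
    for second in range(8):
        if monitor.record("192.0.2.10", float(second)):
            print("possible mirroring at t =", second)
```

What you do once a client is flagged (a polite 'please do not mirror' page, a slowdown, or a block) is the policy decision discussed above; the sketch only does the counting.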
(written 2009-03-03, updated 2009-03-04)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
G902 - Well House Consultants - Web site techniques, utility and visibility
  [4492] Almost so wrong, but perhaps it's right for some? - (2015-05-11)
  [4474] Effect on external factors on traffic to our web sites - an update - (2015-04-26)
  [4401] Selecting RECENT and POPULAR news and trends for your web site users - (2015-01-19)
  [4376] Well House Consultants, Well House Manor, First Great Western Coffee shop, TransWilts / 2014 web site reports - (2015-01-01)
  [4239] Facebook marketing - early experiences - (2014-01-19)
  [4136] How do I post automatically from a PHP script to my Twitter account? - (2013-07-10)
  [4115] More or less back - what happened to our server the other day - (2013-06-14)
  [4076] Web site - fully back! - (2013-04-29)
  [4001] Helping search engines with appropriate 400 error codes - (2013-02-11)
  [3974] TV show appearance - how does it effect your web site? - (2013-01-13)
  [3896] An email marathon - (2012-10-15)
  [3776] Some traps it's so easy to fall into in designing your web site - (2012-06-23)
  [3745] Legal change - You need to obtain user consent if you use cookies on your website - (2012-06-01)
  [3744] Short Web Addresses for Melksham - (2012-05-30)
  [3734] QR codes with marketing logos embedded - (2012-05-16)
  [3623] Some TestWise examples - helping use Ruby code to check your web site operation - (2012-02-24)
  [3589] Promoting a single one of your domains on the search engines - (2012-01-22)
  [3563] How big is a web page these days? Does the size of your pages matter? - (2011-12-26)
  [3554] Learning more about our web site - and learning how to learn about yours - (2011-12-17)
  [3532] Sharing the user experience - designing a form with the customer in mind - (2011-11-29)
  [3491] Who is knocking at your web site door? Are you well set up to deal with allcomers? - (2011-10-21)
  [3426] Automed web site testing scripted in Ruby using watir-webdriver - (2011-09-09)
  [3367] Google +1 - what is it? - (2011-07-22)
  [3197] Finding and diverting image requests from rogue domains - (2011-03-08)
  [3149] Looking back at www.wellho.net - (2011-01-28)
  [3087] Making the most of critical emails - reading behind the scene - (2010-12-16)
  [3022] Retaining web site visitors - reducing the one page wonders - (2010-10-31)
  [2981] How to set up short and meaningfull alternative URLs - (2010-10-02)
  [2668] Is it worth it? - (2010-03-09)
  [2569] How to run a successful online poll / petition / survey / consultation - (2010-01-10)
  [2552] Web site traffic - real users, or just noise? - (2009-12-26)
  [2532] Analysing Google arrivals by country of origin - (2009-12-10)
  [2519] Status Page / breaks of service in early December - (2009-11-30)
  [2410] Removal of technical resources from this site - (2009-09-19)
  [2389] Writing with our customers words - (2009-09-01)
  [2341] Koulutus, Open Source tietokone kielillä - (2009-08-09)
  [2340] ldning, Open Source dator språk - (2009-08-09)
  [2339] Opplæring, Open Source datamaskinen språk - (2009-08-09)
  [2338] Uddannelse, Open Source computer sprog - (2009-08-09)
  [2337] Opleiding, Open Source computertalen - (2009-08-09)
  [2336] Formação, Open Source computador línguas - (2009-08-09)
  [2335] Ausbildung, die Open-Source-Sprachen - (2009-08-09)
  [2334] Formazione, Open Source computer lingue - (2009-08-09)
  [2333] Formación, de los lenguajes de código abierto - (2009-08-09)
  [2332] Formation, des langages Open Source - (2009-08-09)
  [2225] How important is a front page ranking on a search engine? - (2009-06-09)
  [2056] Web Site Loading - experiences and some solutions shared - (2009-02-26)
  [1982] Cooking bodies and URLs - (2009-01-08)
  [1970] Plagarism - who is copying my pages? - (2009-01-02)
  [1961] Making our things easier to find - (2008-12-26)
  [1955] How to avoid duplicating web page maintainance - (2008-12-20)
  [1888] Find the link - (2008-11-16)
  [1856] A few of my favourite things - (2008-10-26)
  [1833] Web Bloopers - good form design - avoiding pitfalls - (2008-10-11)
  [1797] I have been working hard but I do not expect you noticed - (2008-09-14)
  [1793] Which country does a search engine think you are located in? - (2008-09-11)
  [1756] Ever had One of THOSE mornings? - (2008-08-16)
  [1747] Who is watching you? - (2008-08-10)
  [1711] Rapid growth leads to server move - (2008-07-17)
  [1653] How do Google Ads work? - (2008-05-25)
  [1634] Kiss and Book - (2008-05-07)
  [1630] To provide external links, or not? - (2008-05-04)
  [1610] PHP course dot co, dot uk - (2008-04-13)
  [1554] Online hotel reservations - Melksham, Wiltshire (near Bath) - (2008-02-24)
  [1541] Colour, Composition or Content - (2008-02-16)
  [1534] Where in the world / country is my visitor from? - (2008-02-07)
  [1513] Perl, PHP or Python? No - Perl AND PHP AND Python! - (2008-01-20)
  [1506] Ongoing Image Copyright Issues, PHP and MySQL solutions - (2008-01-14)
  [1505] Script to present commonly used images - PHP - (2008-01-13)
  [1494] A time to update pictures - (2008-01-03)
  [1437] Above the fold with First Great Western - (2007-11-19)
  [1297] Stuffing content into a web page - easy maintainance - (2007-08-09)
  [1237] What proportion of our web traffic is robots? - (2007-06-19)
  [1212] What brought YOU to our web site? - (2007-06-01)
  [1207] Simple but effective use of mod_rewrite (Apache httpd) - (2007-05-27)
  [1198] From Web to Web 2 - (2007-05-21)
  [1186] Two new pages / sites - (2007-05-14)
  [1184] Finding resources - some pointers - (2007-05-13)
  [1177] Sorting out for a site map - (2007-05-05)
  [1104] Drawing dynamic graphs in PHP - (2007-03-09)
  [1055] Above the fold - (2007-01-28)
  [1029] Our search engine placement is dropping. - (2007-01-11)
  [1015] Search engine placement - long term strategy and success - (2006-12-30)
  [994] Training on Cascading Style Sheets - (2006-12-17)
  [976] Santa at the station - (2006-12-09)
  [916] Driving customers away - (2006-11-07)
  [893] Visibility - (2006-10-14)
  [800] Effective web campaign? - (2006-07-12)
  [767] Finding the language preference of a web site visitor - (2006-06-18)
  [757] Horse and Python training - (2006-06-12)
  [732] Where is a web site visitor browsing from - (2006-05-24)
  [718] Protecting images from theft - (2006-05-12)
  [681] Mirroring a dynamic site - (2006-04-12)
  [658] Keeping the visitors happy and browsing - (2006-03-26)
  [649] Denial of Service ''attack'' - (2006-03-17)
  [533] Bigger Box Campaign - (2005-12-18)
  [528] Getting favicon to work - avoiding common pitfalls - (2005-12-14)
  [510] Dynamic Web presence - next generation web site - (2005-11-29)
  [492] New Navigation Aid - Launch of My Wellho - (2005-11-11)
  [414] Form Madness - (2005-08-14)
  [376] What brings people to my web site? - (2005-07-13)
  [369] CMS - the minefield of Choices - (2005-07-05)
  [348] Graveyard pages - (2005-06-15)
  [347] Frightening and from-friend viruses and spams - (2005-06-14)
  [322] More maps - (2005-05-23)
  [320] Ordnance Survey - using a 'Get a map' - (2005-05-22)
  [314] What language is this written in? - (2005-05-17)
  [311] Growth pains - (2005-05-14)
  [288] Colour blindness for web developers - (2005-04-22)
  [284] The Iconish language - (2005-04-19)
  [278] Cover all the options - (2005-04-13)
  [276] An apology to Mr Boneparte - (2005-04-11)
  [274] Our most popular resources - (2005-04-10)
  [268] Information request forms, cleaning up spam - (2005-04-05)
  [261] Putting a form online - (2005-03-29)
  [259] Responding to spam - (2005-03-27)
  [222] Who are all these visitors? - (2005-02-20)
  [202] Searching for numbers - (2005-02-04)
  [197] Allow for peak traffic on your web site - (2005-02-01)
  [182] Your personal Google ranking - (2005-01-19)
  [179] The hunt for unique words - (2005-01-16)
  [173] Data Mining - (2005-01-09)
  [165] Implementing an effective site search engine - (2005-01-01)
  [142] Colour for access - (2004-12-06)
  [117] A case of case - (2004-11-14)
  [109] URLs - a service and not a hurdle - (2004-11-04)
  [98] No more 'Error 404' pages. Something better. - (2004-10-24)
  [32] Web design platoon - (2004-08-29)
  [23] Skills and responsibilities - (2004-08-22)

G911 - Well House Consultants - Search Engine Optimisation
  [4121] Has your Twitter feed stopped working? Switching to their new API - (2013-06-23)
  [3746] Google Analytics and the new UK Cookie law - (2012-06-02)
  [3670] Reading Google Analytics results, based on the relative populations of countries - (2012-03-24)
  [2748] Monitoring the success and traffic of your web site - (2010-05-01)
  [2686] Freedom of Information - consideration for web site designers - (2010-03-20)
  [2562] Tuning the web site for sailing on through this year - (2010-01-03)
  [2428] Diluting History - (2009-09-27)
  [2330] Update - Automatic feeds to Twitter - (2009-08-09)
  [2324] What search terms FAIL to bring visitors to our site, when they should? - (2009-08-05)
  [2137] Reaching the right people with your web site - (2009-04-23)
  [2107] How to tweet automatically from a blog - (2009-03-28)
  [2106] Learning to Twitter / what is Twitter? - (2009-03-28)
  [2045] Does robots.txt actually work? - (2009-02-16)
  [2019] Baby Caleb and Fortune City in your web logs? - (2009-01-31)
  [2000] 2000th article - Remember the background and basics - (2009-01-18)
  [1984] Site24x7 prowls uninvited - (2009-01-10)
  [1971] Telling Google which country your business trades in - (2009-01-02)
  [1969] Search Engines. Getting the right pages seen. - (2009-01-01)
  [1344] Catching up on indexing our resources - (2007-09-10)
  [427] The Melksham train - a button is pushed - (2005-08-28)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2019: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/2065_Sta ... thers.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat