Training, Open Source computer languages
For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but thanks to upward compatibility most of our examples still work and are even still relevant. You are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham. We are happy to meet former delegates here, but do check ahead before coming round. We are far from inactive; rather, we are enjoying a time when we are retired but still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites; you can find an index here.
Module to grab html pages

Posted by Gelembjuk (Gelembjuk), 14 November 2006
Hello.

I have developed a Perl module, HTML::ListGrabber, for extracting data from HTML pages.

I would like to hear what other Perl developers think of it.

The manual and a demo are on my site: http://gelembjuk.il.if.ua/

What do you think?

Posted by admin (Graham Ellis), 15 November 2006
Well - there are a lot of other modules already out there. If you compare yours with others (such as LWP; see http://www.perl.com/pub/a/2002/08/20/perlandlwp.html, which some of us already use), you'll help us learn under what circumstances it would be appropriate to use your module.

Posted by Gelembjuk (Gelembjuk), 15 November 2006
The LWP module only downloads web resources over the HTTP protocol.

My module extracts data from the HTML text itself.
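To make the fetch/extract split concrete: once the HTML text is in hand (from LWP or anywhere else), pulling records out of it is a separate job. The thread doesn't show how HTML::ListGrabber works internally, so what follows is only a toy sketch of template-driven extraction in core Perl, not the module's algorithm. It uses a made-up `{name}` placeholder syntax (much simpler than the module's `<datatag>` tags), and the function name `grab_list` is hypothetical. Literal markup in the template must match exactly, each placeholder captures text non-greedily up to the next literal, and the pattern is applied repeatedly to collect one hash per repeated item.

```perl
use strict;
use warnings;

# Toy template-driven extractor. Hypothetical: this is NOT HTML::ListGrabber's
# code, and {name} is a made-up placeholder syntax used only for illustration.
sub grab_list {
    my ($template, $html) = @_;
    my @names;
    my $pattern = '';
    # Capturing split: placeholder names land at the odd indices,
    # literal markup at the even indices.
    my @parts = split /\{(\w+)\}/, $template;
    for my $i (0 .. $#parts) {
        if ($i % 2) {
            push @names, $parts[$i];
            $pattern .= '(.*?)';              # non-greedy capture for one field
        }
        else {
            $pattern .= quotemeta $parts[$i]; # literal markup must match exactly
        }
    }
    return () unless @names;

    # /s lets . cross newlines; list-context /g returns every capture
    # of every repeated item as one flat list.
    my @flat = ($html =~ /$pattern/gs);
    my @records;
    while (my @values = splice @flat, 0, scalar @names) {
        my %record;
        @record{@names} = @values;            # hash slice: field name => value
        push @records, \%record;
    }
    return @records;
}

my $html = '<li><a href="/a">Alpha</a> <b>$10</b></li>'
         . '<li><a href="/b">Beta</a> <b>$20</b></li>';
my $template = '<li><a href="{link}">{title}</a> <b>{price}</b></li>';

for my $item (grab_list($template, $html)) {
    print join(', ', map { "$_ => $item->{$_}" } sort keys %$item), "\n";
}
```

Run as-is, this prints `link => /a, price => $10, title => Alpha` and then the matching line for Beta. Regex matching like this is brittle against real-world HTML (attribute order, whitespace, nesting), and a placeholder needs some literal text after it to capture anything; a proper HTML parser is the more robust route.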

Example (extracting product information from amazon.com search results):
Code:
use strict;
use warnings;
use HTML::ListGrabber;

my $grabber = HTML::ListGrabber->new;

# The template mirrors the markup of one search result. Judging by the
# output, each named <datatag> becomes a key in a result record, the
# "null" tags appear to skip intervening markup, and <norequired>
# appears to mark an optional part.
my $template = '<td class="imageColumn" ><table><tr><td>
<a ><datatag name="img" -extractfrom="img" -attrforextract="src"></a>
<datatag name="null" -pass="all">
<td class="dataColumn"><table ><tr><td>
<datatag name="link" -extractfrom="a" -attrforextract="href">    
<datatag name="title"  -pass="span"></a>
<datatag name="author"><span class="bindingBlock">
<datatag name="null" -pass="all">
<norequired>  
<span class="listprice"><datatag name="listprice"></span>
</norequired>
<datatag name="null" -pass="all">
<span class="otherprice"><datatag name="otherprice"></span>';

$grabber->setTemplate($template);

my $url = "http://amazon.com/s/field-keywords=perl";

# Returns one hash reference per matched item
my @items = $grabber->grabListedData($url);

foreach my $item (@items) {
    print "--------------------------------\n";
    foreach my $field (keys %$item) {
        print "$field => $item->{$field}\n";
    }
}


Output (partial):

Code:
--------------------------------
link => http://www.amazon.com/Learning-Perl-Second-Randal-Schwartz/dp/B00005R09A/sr=8-1/qid=1163574960/ref=pd_bbs_1/103-2135971-0091843?ie=UTF8&s=books
listprice => $39.95
img => http://ec1.images-amazon.com/images/P/B00005R09A.01._SCTHUMBZZZ_.jpg
title => Learning Perl, Second Edition
author => by Randal L. Schwartz, Tom Christiansen, and Larry Wall
otherprice => $19.50
--------------------------------
link => http://www.amazon.com/Programming-Perl-2nd-Larry-Wall/dp/B00005R09P/sr=8-3/qid=1163574960/ref=pd_bbs_3/103-2135971-0091843?ie=UTF8&s=books
listprice => $49.95
img => http://ec1.images-amazon.com/images/P/B00005R09P.01._SCTHUMBZZZ_.jpg
title => Programming Perl (2nd Edition)
author => by Larry Wall, Tom Christiansen, Randal L. Schwartz, and Stephen Potter
otherprice => $23.20
--------------------------------
link => http://www.amazon.com/Perl-Cookbook-Second-Tom-Christiansen/dp/0596003137/sr=8-5/qid=1163574960/ref=pd_bbs_sr_5/103-2135971-0091843?ie=UTF8&s=books
listprice => $49.95
img => http://ec1.images-amazon.com/images/P/0596003137.01._SCTHUMBZZZ_.jpg
title => Perl Cookbook, Second Edition
author => by Tom Christiansen and Nathan Torkington
otherprice => $18.02


Posted by admin (Graham Ellis), 15 November 2006
Many thanks for clearing that up... there are so many modules out there that it's sometimes difficult to see where each of them fits in.



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho