For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but due to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive - rather, enjoying the times that we are retired but still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index [here].

Using Perl to read an RSS feed off a web site and extract data - via LWP and XML modules

Perl is excellent glue-ware ... something I was reminded of towards the end of Friday's Perl Programming Course. A delegate asked me how easy it is to grab an XML resource from the web (such as an RSS feed) and extract data from it. Well - you could write the code yourself, or you could use standard Perl modules which - these days - are shipped with the distribution anyway.

I used the library for web processes to read in the XML feed data:

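  use LWP::UserAgent;
  use HTTP::Request;
  
  my $agent = LWP::UserAgent->new; # Create me a browser
  $agent->agent("Well House Consultants"); # Set the browser name
  my $req = HTTP::Request->new(GET => $urlsource); # Set up the request we'll run
  my $res = $agent->request($req); # Run the request
  die "Could not fetch $urlsource: ", $res->status_line, "\n" unless $res->is_success;
  my $page = $res->content; # Body of the response (the raw XML)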


I then saved the page to a local file for caching purposes (I had decided not to check the feed more than once every quarter of an hour).
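
The caching step is not shown in this article, so here is a minimal sketch of how it might be done, not the code from the full example. The names fetch_cached, $cache_file and $max_age are my own assumptions for illustration. The idea is simply to reuse the local file if it is less than fifteen minutes old, and to go back to the web site only when it is missing or stale.

```perl
use strict;
use warnings;

# Assumption: 15 minutes, matching "no more than once every quarter of an hour"
my $max_age = 15 * 60;

# Hypothetical helper: return the feed text, from the cache file if it is fresh
sub fetch_cached {
    my ($url, $cache_file) = @_;

    # Reuse the cached copy if it exists and is younger than $max_age seconds
    if (-e $cache_file) {
        my $age = time() - (stat($cache_file))[9];   # element 9 is the mtime
        if ($age < $max_age) {
            open my $in, '<:raw', $cache_file or die "Cannot read $cache_file: $!";
            local $/;                                # slurp the whole file
            my $cached = <$in>;
            close $in;
            return $cached;
        }
    }

    # Otherwise fetch a fresh copy (LWP is loaded only when it is needed)
    require LWP::UserAgent;
    my $agent = LWP::UserAgent->new;
    $agent->agent("Well House Consultants");
    my $res = $agent->get($url);
    die "Could not fetch $url: " . $res->status_line . "\n" unless $res->is_success;
    my $page = $res->content;

    # Save it for next time
    open my $out, '>:raw', $cache_file or die "Cannot write $cache_file: $!";
    print {$out} $page;
    close $out;
    return $page;
}
```

A call such as fetch_cached($urlsource, "feed.xml") would then stand in for the fetch shown above; the file name is, again, only an example.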

I then interpreted the data stream using the XML::Parser module:

  use XML::Parser;
  
  my $parser = XML::Parser->new(Handlers => {Start => \&entering, Char => \&handle_char});
  $parser->parse($page);


And I provided subs called entering and handle_char to handle the tags of interest. Here's some sample output, showing the extracted data:

  text: Re: A good problem to have?
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11326.msg118375#msg118375
  text: Re: Unique opportunity to travel the Portbury Branch Line - Saturday 29 Sept 2012
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11167.msg118374#msg118374
  text: Re: A good problem to have?
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11326.msg118373#msg118373
  text: Re: FGW (bad) experience 27/9/2012 - Maidenhead to Paddington
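
The two handler subs are in the full code rather than in this article, so the following is only a sketch of how entering and handle_char might look, under my own assumptions: that the feed is RSS 2.0, with each entry in an item element holding title and link children, and that the results are collected in an array I have called @items. XML::Parser passes the parser object and the element name (plus any attributes) to a Start handler, and the parser object and a chunk of text to a Char handler. The small feed in $page is made up so that the example is self-contained.

```perl
use strict;
use warnings;
use XML::Parser;

my @items;   # Assumption: one hash per <item>, filled in by the handlers

# Start handler: called with the parser object, element name and attributes
sub entering {
    my ($expat, $element, %attrs) = @_;
    push @items, { title => '', link => '' } if $element eq 'item';
}

# Char handler: called with the parser object and a chunk of text.
# Text can arrive in several chunks, so append rather than assign.
sub handle_char {
    my ($expat, $text) = @_;
    return unless $expat->within_element('item');   # skip the channel's own title
    my $element = $expat->current_element;          # innermost open element
    $items[-1]{$element} .= $text if $element eq 'title' or $element eq 'link';
}

# Made-up sample feed, for illustration only
my $page = <<'XML';
<rss version="2.0"><channel><title>Demo feed</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>
XML

my $parser = XML::Parser->new(Handlers => {Start => \&entering, Char => \&handle_char});
$parser->parse($page);

for my $item (@items) {
    print "text: $item->{title}\n";
    print "url: $item->{link}\n";
}
```

Asking the parser object which element is currently open means no End handler is needed for a feed this simple; for anything with deeper nesting you would probably want one.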


Full code is [here]. Other samples of LWP are [here] and [here]. Further XML::Parser examples are [here] (SAX) and [here] (DOM), and if you look at any of those examples you'll find links to further code and articles too.

We'll talk about XML and Web Processes briefly on any of our Perl Courses if they're relevant to any of the delegates attending. If you need to get deeper into these modules, into Object Oriented Perl, and into handling large data flows, take a look at our Perl for Larger Projects course.
(written 2012-09-30, updated 2012-10-06)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
P668 - Handling XML in Perl
  [2378] Handling XML in Perl - introduction and early examples - (2009-08-27)
  [2555] Bookkeeping - (2009-12-29)

P402 - Perl - Writing Your Own Simple Client and Server
  [2047] Small Web Server in Perl - (2009-02-18)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.

© WELL HOUSE CONSULTANTS LTD., 2024: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3874_.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb