Using Perl to read an RSS feed off a web site and extract data - via LWP and XML modules

Perl is excellent glue-ware ... something I was reminded of towards the end of Friday's Perl Programming Course. A delegate asked me how easy it is to grab an XML resource from the web (such as an RSS feed) and extract data from it. Well - you could write the code yourself, or you could use standard Perl modules which - these days - are shipped with the distribution anyway.

I used LWP, the library for web processes, to read in the XML feed data:

  use LWP::UserAgent;
  
  my $agent = LWP::UserAgent->new;                  # Create me a browser
  $agent->agent("Well House Consultants");          # Set the browser name
  my $req = HTTP::Request->new(GET => $urlsource);  # Set up the request ($urlsource holds the feed URL)
  my $res = $agent->request($req);                  # Run the request
  die "Fetch failed: ", $res->status_line, "\n" unless $res->is_success;
  my $page = $res->content;                         # Body of the response - the XML text


I then saved the page to a local file for caching purposes (I had decided not to check the feed more than once every quarter of an hour).
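
The caching step isn't shown in this article, so here is a minimal sketch of one way it could be done, assuming the cache is a plain file whose modification time tells us how old it is. The names cache_is_fresh, get_feed and $MAX_AGE are made up for this illustration and are not taken from the full code.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $MAX_AGE = 15 * 60;    # a quarter of an hour, in seconds

# True if the cache file exists and was modified less than $limit seconds ago
sub cache_is_fresh {
    my ($file, $limit) = @_;
    my $mtime = (stat $file)[9];    # undef if the file is not there
    return defined($mtime) && (time() - $mtime) < $limit;
}

# Return the feed text - from the cache if it is fresh, otherwise from the web
sub get_feed {
    my ($url, $file) = @_;

    if (cache_is_fresh($file, $MAX_AGE)) {
        open my $in, '<:raw', $file or die "Cannot read $file: $!\n";
        local $/;                   # slurp the whole file in one read
        return scalar <$in>;
    }

    my $agent = LWP::UserAgent->new;
    my $res   = $agent->get($url);
    die "Fetch failed: ", $res->status_line, "\n" unless $res->is_success;

    # Save the new copy so that the next run can use it
    open my $out, '>:raw', $file or die "Cannot write $file: $!\n";
    print {$out} $res->content;
    close $out or die "Cannot close $file: $!\n";
    return $res->content;
}
```

A call such as get_feed($urlsource, "feed.xml") would then stand in for the direct fetch; the file name feed.xml is just a placeholder.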

I then interpreted the data stream using the XML::Parser module:

  use XML::Parser;
  
  # Call entering() at each opening tag and handle_char() for each piece of text
  my $parser = XML::Parser->new(Handlers => {Start => \&entering, Char => \&handle_char});
  $parser->parse($page);


And I provided subs called entering and handle_char to handle the tags of interest. Here's some sample output, showing the extracted data:

  text: Re: A good problem to have?
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11326.msg118375#msg118375
  text: Re: Unique opportunity to travel the Portbury Branch Line - Saturday 29 Sept 2012
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11167.msg118374#msg118374
  text: Re: A good problem to have?
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11326.msg118373#msg118373
  text: Re: FGW (bad) experience 27/9/2012 - Maidenhead to Paddington
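
The two handler subs aren't reproduced in this article, so the following is only a sketch of how entering and handle_char might be written to give output in that style. It assumes each RSS item carries a title and a link element, and it parses a small made-up feed held in $sample rather than a live one; the @items array and the example.com URLs are invented for the illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# A tiny made-up feed, so that the example runs without a network connection
my $sample = <<'XML';
<?xml version="1.0"?>
<rss version="2.0"><channel><title>Sample feed</title>
<item><title>Re: First &amp; second</title><link>http://www.example.com/a</link></item>
<item><title>Another post</title><link>http://www.example.com/b</link></item>
</channel></rss>
XML

my @items;    # one hash per <item>, filled in as the parse proceeds

# Start handler: called with the parser object, the tag name and any attributes
sub entering {
    my ($expat, $element, %attrs) = @_;
    push @items, { text => '', url => '' } if $element eq 'item';
}

# Char handler: may be called several times for one piece of text, so append
sub handle_char {
    my ($expat, $string) = @_;
    return unless @items && $expat->within_element('item');
    my $tag = $expat->current_element;    # innermost tag that is still open
    $items[-1]{text} .= $string if $tag eq 'title';
    $items[-1]{url}  .= $string if $tag eq 'link';
}

my $parser = XML::Parser->new(Handlers => {Start => \&entering, Char => \&handle_char});
$parser->parse($sample);

print "text: $_->{text}\nurl: $_->{url}\n" for @items;
```

Asking the parser object which element is currently open means no End handler is needed, and the within_element check keeps the channel's own title out of the results.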


Full code is [here]. Other samples of LWP are [here] and [here]. Further XML::Parser examples are [here] (SAX) and [here] (DOM), and if you look at any of those examples you find links to further code and articles too.

We'll talk about XML and Web Processes briefly on any of our Perl Courses if they're relevant to any of the delegates attending. If you need to get deeper into these modules, into Object Oriented Perl, and into handling large data flows, take a look at our Perl for Larger Projects course.
(written 2012-09-30, updated 2012-10-06)

 


This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3874_Usi ... dules.html • PAGE BUILT: Thu Sep 18 15:30:25 2014 • BUILD SYSTEM: WomanWithCat