Handling XML in Perl - introduction and early examples

There are hundreds of modules (literally) in Perl for handling XML. Some of them are highly specialised, but others are of much more general use in reading (and in some cases writing) XML streams.

(definition of XML: Extensible Markup Language - a tagging system in plain text for marking up data; not really a language, but rather a METAlanguage as it defines the format of the tags, and not which tags are and aren't valid! example of XML: An RSS feed such as [this one] from the blog)

There are really two major 'classes' of XML handlers - "SAX" (a Simple API for XML) style handlers, where certain data elements trigger a piece of code to be run as the data is traversed, and "DOM" (Document Object Model) style handlers, where the data gets stored into memory in a structure within the language - Perl in this case.

Here's a simple SAX style parser example in Perl:

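use XML::Parser;

# "Subs" style: entering <tag> calls sub tag, leaving it calls sub tag_
$parser = XML::Parser->new(Style => "Subs", Handlers =>
    {Char => \&spew} );
$intitle = 0;
$parser -> parsefile('index.xml');
print "done\n";
 
# Called on entering each <title> element - start collecting text
sub title {
  $intitle = 1;
  $ti_data = "";
  }
 
# Called on leaving each <title> element - report what was collected
sub title_ {
  $intitle = 0;
  print "We have \"$ti_data\"\n";
  }
 
# Char handler - called (perhaps several times) for text between tags
sub spew {
  $intitle and $ti_data .= $_[1];
  }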


The parser is set up to run the sub of the same name as each tag it enters, and to run a sub of the same name with a trailing underscore at each exit. Only subs for tags of interest need be provided. A separate handler is used for data which comes between the tags, and in my "basics" example, I have simply used a global within my entry and exit subs to note when I am (and am not) within text of interest. I have pasted sample output onto the end of the full source code.
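
To try the mechanism without a separate data file, here is a minimal sketch of the same idea, run under strict and with the callbacks kept in their own package via XML::Parser's Pkg option. The inline XML, the package name TitleHandlers and the variable names are made up for illustration and are not from the original example.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample data, standing in for index.xml
my $xml = <<'XML';
<rss><channel>
<title>Feed title</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>
XML

# File-scoped lexicals shared by the callbacks (in place of globals)
my (@titles, $text);
my $in_title = 0;

# Entry / exit subs live in their own package (name is an assumption)
package TitleHandlers;
sub title  { $in_title = 1; $text = ""; }           # on <title>
sub title_ { $in_title = 0; push @titles, $text; }  # on </title>
package main;

my $parser = XML::Parser->new(
    Style    => "Subs",
    Pkg      => "TitleHandlers",
    # Text may arrive in several pieces, so append rather than assign
    Handlers => { Char => sub { $text .= $_[1] if $in_title } },
);
$parser->parse($xml);   # parse() takes a string; parsefile() takes a file name

print "We have \"$_\"\n" for @titles;
```

Tags without a matching sub (rss, channel, item and link here) are simply skipped, which is why only the subs of interest need to exist.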

Additional options in the hash passed into the parser constructor allow you to place all your callback subs in a separate package (the Pkg member), and of course you should comment your code well / consider using strict / have better sub names than "spew" ;-) ... my example is (as they mostly are) intended to show the mechanisms and techniques and not fully armoured, heavy, production code.

A SAX parser is a great way to handle large flows of data for extracting just a few specific items, but it is not well suited to reforming / editing a piece of XML - for that you need a DOM like model. And the XML::Parser module lets you do that too. Parse as follows:

use XML::Parser;
$parser = XML::Parser->new(Style => "Tree");
$struct = $parser -> parsefile('index.xml');


That makes $struct a reference to a list holding the top-level tag name and its content. Each element's content is itself a further list, representing the hierarchy of the tags in the XML: it starts with a hash (for the attributes of the open tag), followed by pairs of tag name and content, with text data held as a scalar under the pseudo-tag "0". Handling this structure can be done with quite a short piece of code, but it needs to be recursive / re-entrant, so great care needs to be taken to ensure that my variables are used where the levels are distinct, and our variables / globals / or parameters where data is passed between levels. I have placed an example of such code [here], showing how the same sample RSS data is loaded into memory and then used.
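
As a rough illustration of such a recursive walk (a sketch only, not the linked example), the code below parses a small inline document and prints an indented outline. The sample XML and the sub name walk are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample data, standing in for index.xml
my $xml = '<rss version="2.0"><channel><title>Feed title</title>'
        . '<item><title>First post</title></item></channel></rss>';

my $struct = XML::Parser->new(Style => "Tree")->parse($xml);

my @lines;   # shared between levels: collects the outline

sub walk {
    # "my" variables keep each level of the recursion distinct
    my ($tag, $content, $depth) = @_;
    my ($attrs, @children) = @$content;   # attribute hash comes first

    my $attr_text = join "", map { " $_=\"$attrs->{$_}\"" } sort keys %$attrs;
    push @lines, ("  " x $depth) . "<$tag$attr_text>";

    # The rest of the list is (tag, content) pairs
    while (@children) {
        my ($child_tag, $child_content) = splice @children, 0, 2;
        if ($child_tag eq "0") {
            # Pseudo-tag "0" marks text; skip whitespace-only runs
            push @lines, ("  " x ($depth + 1)) . "text: $child_content"
                if $child_content =~ /\S/;
        } else {
            walk($child_tag, $child_content, $depth + 1);
        }
    }
}

walk(@$struct, 0);   # $struct holds the top-level (tag, content) pair
print "$_\n" for @lines;
```

The depth is passed down as a parameter, while @lines is deliberately shared so that every level adds to the same result.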

Other modules to handle XML include XML::LibXML (short example [here] and [here - saving changes to an XML document]) and ... on top of that ... XML::LibXSLT, which applies a transformation based on directives in the XSLT language to an XML data set.
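
For a feel of XML::LibXML's DOM style, here is a minimal sketch (not the linked examples) that reads values out with XPath and then edits the document in memory. The inline XML and the replacement title text are invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data
my $doc = XML::LibXML->load_xml(string =>
    '<rss><channel><title>Feed title</title>'
  . '<item><title>First post</title></item></channel></rss>');

# Read: pick out the text of every item title via XPath
my @titles = map { $_->textContent } $doc->findnodes('//item/title');
print "Item: $_\n" for @titles;

# Edit: replace the text inside the channel's own title element
my ($feed_title) = $doc->findnodes('/rss/channel/title');
$feed_title->removeChildNodes;
$feed_title->appendText('Renamed feed');

# Serialise the changed document; $doc->toFile($filename) would save it
my $edited = $doc->toString;
print $edited;
```

Because the whole document is held in memory, changes like this are straightforward, which is what a SAX style parser does not offer.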

With XSLT, the Perl code can be short:

use XML::LibXSLT;
use XML::LibXML;
 
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file("camels.xml");
my $style_doc = $parser->parse_file("camels.xsl");
 
my $xslt = XML::LibXSLT->new();
my $stylesheet = $xslt->parse_stylesheet($style_doc);
my $results = $stylesheet->transform($doc);
my $output = $stylesheet->output_string($results);
 
print $output;


See [here] for full source, and [here] for the XSLT specification of the transformation and [here] for the data.

You'll note that XSLT is a programming language itself, and one which is an application of the XML metalanguage. In other words, you'll find that even programming structures like loops are defined via tags - from <xsl:for-each> through to </xsl:for-each> for example. It feels odd and confusing when you first see it - but actually it's a brilliant concept as it means that your language syntax analyser is already written and in use on both that program and the data the program is dealing with.
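
To show what a loop written as tags looks like, here is a small self-contained sketch with both the data and the stylesheet held inline. The camel data and the stylesheet are invented for this illustration and are not the camels.xml / camels.xsl files linked above.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical data set
my $doc = XML::LibXML->load_xml(string => <<'XML');
<camels>
  <camel humps="1">Dromedary</camel>
  <camel humps="2">Bactrian</camel>
</camels>
XML

# Hypothetical stylesheet: the loop is the xsl:for-each element
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/camels">
    <xsl:for-each select="camel">
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="@humps"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $output     = $stylesheet->output_string($stylesheet->transform($doc));
print $output;   # one line per camel: name, then number of humps
```

Note that the stylesheet is loaded with the very same XML parser as the data, which is the point made above about the syntax analyser already being written.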

We do not offer separate XML courses, but we do cover XML on language courses such as Perl for Larger Projects on which the new examples above were written yesterday.




For Python programmers ... the principles are similar. Here are some source examples:

Using SAX (via the standard SAX handler shipped with the Python distribution)

Using DOM (via minidom, also shipped!)

XSLT example in Python, using libxml2 and libxslt.
(written 2009-08-27)

 



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2019: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/2378_Han ... mples.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat