Perl embraces new technologies and standards by providing a mechanism through which they can be supported - either built into the Perl language itself, or through modules. This made Perl the natural choice for us recently when we wanted to automatically extract certain text from Microsoft Word documents for onward inclusion in a database held on a Linux system.
With Perl, inevitably the hardest part of any such task is finding out how to do it - which modules to load, and from where, and how to call them. Here's the code we finally came up with:
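use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Enum;

# Arguments: output text file first, Word document second
my ($outfile, $docfile) = @ARGV;
die "Usage: $0 output-file word-document\n" unless defined $docfile;

# Attach to the Word document through OLE automation
my $document = Win32::OLE->GetObject($docfile)
    or die "Cannot open $docfile: " . Win32::OLE->LastError() . "\n";
open(my $fh, '>', $outfile) or die "Cannot write $outfile: $!\n";
print "Extracting Text ...\n";

# Step through each paragraph in the document
my $paragraphs = $document->Paragraphs();
my $enumerate  = Win32::OLE::Enum->new($paragraphs);
while (defined(my $paragraph = $enumerate->Next())) {
    my $style = $paragraph->{Style}->{NameLocal};
    print $fh "+$style\n";
    my $text = $paragraph->{Range}->{Text};
    $text =~ s/[\n\r]//g;     # drop the paragraph terminator
    $text =~ s/\x0b/\n/g;     # Word's manual line break becomes a newline
    print $fh "=$text\n";
}
close($fh) or die "Cannot close $outfile: $!\n";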
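This example uses the Win32 modules to access the Word document (named as the second parameter on the command line) and saves each paragraph's style name and contents into a plain text file (named as the first parameter on the command line). In that file, each style name is written on a line starting with "+" and each paragraph's text on a line starting with "=".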
The Win32 modules are available on the CPAN, and are also a standard part of the ActiveState distribution. They make calls to Word itself, so this Perl application must be run on a Windows machine with Word installed, and the extracted data then transferred to the Linux system.
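Once the text file has reached the Linux system, it needs to be read back into style and text pairs before it can go into a database. What follows is only a minimal sketch of that step, not the code we used: the subroutine name parse_records and the sample data are made up for illustration. It assumes that a line starting with "+" always begins a new paragraph, so a continuation line (from a manual line break in Word) that happens to start with "+" would be misread as a new style.

```perl
use strict;
use warnings;

# Parse the "+style" / "=text" format written by the extraction script.
# parse_records is a hypothetical helper name, not part of any module.
# Returns a list of hash references: { style => ..., text => ... }.
sub parse_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        # Strip the line ending, allowing for Windows-style CR LF
        $line =~ s/\r?\n\z//;
        if ($line =~ /^\+(.*)$/) {
            # A "+" line starts a new paragraph record
            push @records, { style => $1, text => undef };
        }
        elsif ($line =~ /^=(.*)$/ && @records && !defined $records[-1]{text}) {
            # The first "=" line after a style holds the paragraph text
            $records[-1]{text} = $1;
        }
        elsif (@records && defined $records[-1]{text}) {
            # Anything else continues the text after a manual line break
            $records[-1]{text} .= "\n$line";
        }
    }
    return @records;
}

# Sample data standing in for a transferred file (made up for this sketch)
my $sample = "+Heading 1\n=Introduction\n+Normal\n=First line\nsecond line\n";
open(my $in, '<', \$sample) or die "Cannot read sample: $!\n";
my @records = parse_records($in);
close($in);

# Show each style alongside its text
print "$_->{style}: $_->{text}\n" for @records;
```

In real use the filehandle would be opened on the transferred file rather than on a string, and each record would be passed to whatever database layer is in place.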
At Well House Consultants, we provide
training courses on
subjects such as Ruby, Lua, Perl, Python, Linux, C, C++,
Tcl/Tk, Tomcat, PHP and MySQL. We're asked (and answer)
many questions, and answers to those which are of general
interest are published in this area of our site.