Home Accessibility Courses Diary The Mouth Forum Resources Site Map About Us Contact
 
Python and Tcl - public course schedule [here]
Private courses on your site - see [here]
Please ask about maintenance training for Perl, PHP, Lua, etc
 
Using Perl to read Microsoft Word documents

Perl embraces new technologies and standards by providing a mechanism through which they can be supported - either built in to the Perl language itself, or through modules. This made Perl the natural choice for us recently when we wanted to automatically extract certain text from Microsoft Word documents for onwards inclusion in a database held on a Linux system.

With Perl, inevitably the hardest part of any such task is finding out how to do it - which modules to load, and from where, and how to call them. Here's the code we finally came up with:

use Win32::OLE;
use Win32::OLE::Enum;

$document = Win32::OLE -> GetObject($ARGV[1]);
open (FH,">$ARGV[0]");

print "Extracting Text ...\n";

$paragraphs = $document->Paragraphs();
$enumerate = new Win32::OLE::Enum($paragraphs);
while(defined($paragraph = $enumerate->Next()))
    {
    $style = $paragraph->{Style}->{NameLocal};
    print FH "+$style\n";
    $text = $paragraph->{Range}->{Text};
    $text =~ s/[\n\r]//g;
    $text =~ s/\x0b/\n/g;
    print FH "=$text\n";
    }

This example uses the Win32 modules to access the Word document (named as the second parameter of the command line) and saves each paragraph style name and contents into a plain text file (named as the first parameter on the command line).

The Win32 modules are available on the CPAN, and are also a standard part of the ActiveState distribution ... they make calls to Word itself, so this Perl application must be run on a Windows machine, and then the extracted data transferred


See also Training module - Perl with Windows

Please note that articles in this section of our web site were current and correct to the best of our ability when published, but by the nature of our business may go out of date quite quickly. The quoting of a price, contract term or any other information in this area of our website is NOT an offer to supply now on those terms - please check back via our main web site

Related Material

Perl - Use with Microsoft software

resource index - Perl
Solutions centre home page

You'll find shorter technical items at The Horse's Mouth and delegate's questions answered at the Opentalk forum.

At Well House Consultants, we provide training courses on subjects such as Ruby, Lua, Perl, Python, Linux, C, C++, Tcl/Tk, Tomcat, PHP and MySQL. We're asked (and answer) many questions, and answers to those which are of general interest are published in this area of our site.

You can Add a comment or ranking to this page

© WELL HOUSE CONSULTANTS LTD., 2019: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01225 708225 • FAX: 01225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/solutions/perl-usi ... ments.html • PAGE BUILT: Wed Mar 28 07:47:11 2012 • BUILD SYSTEM: wizard