convert a MS Word doc into multiple HTML pages

Posted by lang2 (lang2), 4 March 2004
The MS Word file contains text and images (saved into the file, not OLE). For example, if the file has 3 headings 1, 1.1, 1.2, there should be 3 HTML files (one for each heading). The text and images between one heading and the next must be extracted. How do I know when I have read up to the next heading? What are the options, or the easiest option, for doing this task? Can I use Perl to parse and extract data from the Word file? Thanks.

Posted by John_Moylan (jfp), 4 March 2004
Just a quick reply because this amused me. I thought it was an interesting question and decided to google for "microsoft word" and "perl". The first relevant page (3rd in the list) was:
http://www.wellho.net/solutions/1480965085.html
Well, they say that the best way to get high in the search rankings is to make sure your content is relevant.
jfp

Posted by admin (Graham Ellis), 4 March 2004
Word files are accessible through COM (Component Object Model) - you need to be running your Perl on a machine that has MS Word installed, as it uses Word's .dll files. You'll find the necessary Perl module is supplied as part of the ActiveState release of Perl.

References - two books
http://www.wellho.net/book/0-7821-2862-9.html
http://www.wellho.net/book/1-57870-067-1.html

Here's a piece of code we use to extract text from a Word document - you should find it contains examples of the sort of thing you need.
Code:
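
The listing from the original post is not preserved in this archive. As a stand-in, and not Graham's code, here is a minimal sketch of one way the question could be approached: walk the document's paragraphs through COM and start a new section whenever a paragraph's style is a heading. It assumes the COM module in question is Win32::OLE, that headings use Word's built-in English style names ("Heading 1", "Heading 2", ...), and that the path passed in is absolute. The sub names (split_by_heading, read_paragraphs, section_to_html) and the output file names are made up for illustration, and embedded images are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group [style, text] pairs into sections. A new section starts at each
# paragraph whose style name looks like "Heading N" (assumption: English
# built-in style names; localized copies of Word use other names).
sub split_by_heading {
    my @paragraphs = @_;
    my @sections;
    for my $p (@paragraphs) {
        my ($style, $text) = @$p;
        $text =~ s/[\r\n\a]+\z//;    # strip Word's paragraph/cell end marks
        if ($style =~ /^Heading\s*\d/i) {
            push @sections, { title => $text, body => [] };
        }
        elsif (@sections) {
            push @{ $sections[-1]{body} }, $text;
        }
        # Text before the first heading is dropped in this sketch.
    }
    return @sections;
}

# Read [style, text] pairs through COM. Needs Windows with MS Word
# installed; Win32::OLE is loaded only when this sub is called.
sub read_paragraphs {
    my ($path) = @_;                 # should be an absolute path
    require Win32::OLE;
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: " . Win32::OLE->LastError();
    my $doc = $word->Documents->Open($path)
        or die "Cannot open $path: " . Win32::OLE->LastError();
    my $paras = $doc->Paragraphs;
    my @out;
    for my $i (1 .. $paras->Count) {
        my $para = $paras->Item($i);
        push @out, [ $para->Style->NameLocal, $para->Range->Text ];
    }
    $doc->Close(0);                  # 0 = do not save changes
    return @out;
}

sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Render one section as a small standalone HTML page (text only).
sub section_to_html {
    my ($sec) = @_;
    my $title = escape_html($sec->{title});
    my $body  = join "\n",
        map  { '<p>' . escape_html($_) . '</p>' }
        grep { length } @{ $sec->{body} };
    return "<html><head><title>$title</title></head>\n"
         . "<body>\n<h1>$title</h1>\n$body\n</body></html>\n";
}

# With a file name on the command line, write one HTML file per heading.
if (@ARGV) {
    my @sections = split_by_heading(read_paragraphs($ARGV[0]));
    my $n = 0;
    for my $sec (@sections) {
        my $file = sprintf 'section%02d.html', ++$n;
        open my $fh, '>', $file or die "Cannot write $file: $!";
        print {$fh} section_to_html($sec);
        close $fh;
    }
}
```

Run with an absolute path to the .doc file as the only argument, this would write section01.html, section02.html and so on into the current directory. The heading test is kept separate from the COM code so it can be tried on plain [style, text] pairs on a machine without Word.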

Posted by admin (Graham Ellis), 4 March 2004
on 03/04/04 at 19:30:28, jfp wrote:
I thought I had seen it somewhere before

This page is a thread posted to the opentalk forum
at www.opentalk.org.uk and
archived here for reference.