Extract first 20 words

Posted by TedH (TedH), 11 November 2006

My script is writing to various files. Part of that is a text entry, denoted say $input{'content'}. I can grab that fine and write the whole lot as needed. But how do I just grab, say, the first 20 words of it and disregard the rest?

I've spent about 3 days searching around and not found anything that even remotely approaches the concept (plenty of stuff about extracting lines and records, but not a predefined number of words).

Hope you can help, thanks - Ted

Posted by admin (Graham Ellis), 12 November 2006

Is this the sort of thing?

Code:
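The snippet posted at this point is not preserved in the archive. Going by the reply that follows (a split into @words, then a join into $starter), a minimal sketch of the idea might look like this. The sample text and the $limit variable are assumptions for illustration, and %input here only stands in for the script's own form data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the form data the script receives
# (9 words repeated 4 times = 36 words).
my %input = (
    content => "The quick brown fox jumps over the lazy dog " x 4,
);

my $limit = 20;    # number of words to keep (assumed name)

# split ' ' skips leading whitespace and treats any run of
# whitespace as a single separator.
my @words = split ' ', $input{'content'};

# Keep at most $limit words; shorter inputs are left intact.
$#words = $limit - 1 if @words > $limit;

# Glue the kept words back together with single spaces.
my $starter = join ' ', @words;

print "$starter\n";
```

An array slice such as @words[0 .. $limit - 1] would also work, but it yields undefined elements when the input has fewer than $limit words, which is why the sketch truncates the array instead.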
Posted by TedH (TedH), 12 November 2006

Thanks Graham. I had been looking at a split, but was way off base. The join - I didn't even know there was such a thing. I see how it works and that $starter now contains the extracted words from the new array @words.

From there I can use $starter and refine my results, like HTML tag exclusions (the input is from a WYSIWYG editor) etc.

Many thanks - Ted

Posted by admin (Graham Ellis), 12 November 2006

Been there before, Ted, on the "removing tags" thing. If you add in:

Code:
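(The substitution itself is missing from the archive. A sketch consistent with the description that follows might be as below; $text and the sample markup are assumed, and a pattern like this is only a rough approach to HTML, not a parser.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample of WYSIWYG editor output.
my $text = "<p>Hello <b>big</b>   world</p><br>and more";

# Replace every tag with a single space.
# Rough pattern: it will misfire on a '>' inside an attribute value.
$text =~ s/<[^>]*>/ /g;

# Split afterwards, so the runs of whitespace left behind collapse.
my @words   = split ' ', $text;
my $starter = join ' ', @words;

print "$starter\n";
```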
then you'll remove all the tags and replace each of them with a space. Do that before the split, by the way, which compresses any multiple resulting white spaces into single spaces.

If you need to go a bit more detailed / sophisticated, you may also want to get involved with deciding which tags result in a word break (things like <br>) and which can occur in the middle of a word (things like <u>); in that latter case, you would actually want to replace them with nothing rather than a space. You might also want to get involved with replacing sequences like &lt; with < characters ....

Posted by TedH (TedH), 12 November 2006

I had done some but they were after the split. I noticed inconsistency happening. So I gave that one (shorter than mine) a go.

The input is from a WYSIWYG editor and has different responses depending on which browser is used. The one thing that IE does is put in the

Code:
for spaces sometimes. I managed to clear those. It can insert entities on occasion. There are probably others I will need to attend to as I continue the development. The extraction is for an RSS feed, and XML does not like entities at all from what I see.

So far I've been testing it and my feed is getting written with new entries, and updated if I edit it - a feature for those of us who type too fast and put in "teh" instead of "the", then discover it in our feed later on.

Posted by TedH (TedH), 12 November 2006

Well, I haven't broken it yet.

Thanks for your input Graham.

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.