how to tokenize in perl

Posted by baby_perl (baby_perl), 26 February 2005

Hello, please, how can I achieve the effect of the Unix command tr -cs 'A-Za-z' '\n' < p.txt in Perl? Suppose I have a file called p.txt containing the following:

<DOC> <PER> peter </PER> asked, has the bus arrived?. but she replied with anger 'NO!!!!!!'. </DOC>

I want the program to start a new line whenever it sees a word that begins with < and ends with > (a tag), and whenever it sees white space or consecutive non-alphanumeric characters such as .,?! Finally, it should squeeze the repeated newlines ('\012') that this produces. The output should look like this:

peter
asked
has
the
bus
arrived
but
she
replied
with
anger
no

Cheers,
baby perl

Posted by admin (Graham Ellis), 26 February 2005

Your Unix tr will translate straight into Perl, but it doesn't do what you're looking for (even in Unix):

Code:
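The code from the reply is not reproduced in this archive. As a rough sketch only (not Graham's original), a direct translation of tr -cs 'A-Za-z' '\n' into Perl's tr operator might look like the following. The helper name tr_words is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mimic `tr -cs 'A-Za-z' '\n'` on a single string.
sub tr_words {
    my ($text) = @_;
    # c = complement the search list (everything that is NOT a letter),
    # s = squeeze each run of translated characters into one newline.
    (my $out = $text) =~ tr/A-Za-z/\n/cs;
    return $out;
}

print tr_words("<DOC> <PER> peter </PER> asked, has the bus arrived?.\n");
```

Run on the sample line, this still prints the tag names DOC and PER as words and leaves upper case untouched, which is why a straight translation falls short of the output asked for.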
We're primarily here to help with problems that you're having in writing Perl programs, and not to provide complete solutions (people get paid for that and I would hate to put them out of business).

Personally, I wouldn't look to do a direct translation using Perl's tr function; I would probably write something like:

Code:
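The code that followed is also missing from this archive. One possible approach, offered as a hedged sketch and not as the code originally posted, is to remove the tags with a substitution and then split on runs of non-letters. The name tokenize is hypothetical. Treating only a-z as word characters and lower-casing the text are assumptions, based on the tr character list and on the "no" in the requested output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the words found in $text as a list.
sub tokenize {
    my ($text) = @_;
    $text =~ s/<[^>]*>/ /g;    # replace anything shaped like <TAG> or </TAG> with a space
    $text = lc $text;          # assumption: the sample output shows "no" for "NO"
    # Split on runs of non-letters; grep drops the empty leading field
    # that split produces when the text starts with a separator.
    return grep { length } split /[^a-z]+/, $text;
}

my $sample = "<DOC> <PER> peter </PER> asked, has the bus arrived?. "
           . "but she replied with anger 'NO!!!!!!'. </DOC>";
print "$_\n" for tokenize($sample);
```

Because the tags are removed before splitting, DOC and PER never reach the word list. To process p.txt itself, the same sub could be called on each line read from the file, provided no tag is split across two lines.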
This page is a thread posted to the opentalk forum
at www.opentalk.org.uk and
archived here for reference. To jump to the archive index please
follow this link.