How to tokenize in Perl
Posted by baby_perl (baby_perl), 26 February 2005
Hello,
Please, how can I do what the Unix command tr -cs 'A-Za-z' '\n' < p.txt does, but in Perl?
For example, if I have a file called p.txt containing the following:
<PER> peter </PER> asked, has the bus arrived?. but she replied with anger 'NO!!!!!!'.
I want the program to start a new line whenever it sees a word that begins (^) with < and ends ($) with >, and whenever it sees white space or non-alphanumeric characters such as . , ? !
Finally, it should squeeze out the empty lines caused by the '\012' (newline) characters.
Thus the output should look like this:
Posted by admin (Graham Ellis), 26 February 2005
Your Unix tr will translate straight into Perl, but it doesn't do what you're looking for (even in Unix):
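Perl's tr operator accepts the same c (complement) and s (squeeze) modifiers as the Unix command, so the direct translation is tr/A-Za-z/\n/cs. A minimal sketch, run on the sample line from the question (tr_words is just an illustrative name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Direct translation of: tr -cs 'A-Za-z' '\n'
# /c complements the search list (so it means "every non-letter"),
# /s squeezes each run of identical output characters into one.
sub tr_words {
    my ($text) = @_;
    # Every run of non-letter characters becomes a single "\n".
    (my $out = $text) =~ tr/A-Za-z/\n/cs;
    return $out;
}

my $sample = q{<PER> peter </PER> asked, has the bus arrived?. but she replied with anger 'NO!!!!!!'.} . "\n";
print tr_words($sample);
```

The output starts with an empty line and the tags come out as plain PER, because < and > are non-letters and get turned into newlines; that is why the straight translation doesn't give the result asked for. To apply the same thing to a whole file, slurp mode keeps runs of non-letters that span line breaks together: perl -0777 -pe 'tr/A-Za-z/\n/cs' p.txt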
We're primarily here to help with problems that you're having in writing Perl programs, not to provide complete solutions (people get paid for that, and I would hate to put them out of business). How far have you got? Can you post up the Perl code that you're working on so that we can offer help?
Personally, I wouldn't attempt a direct translation using Perl's tr function; I would probably write something like:
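A regex-based approach pulls the tokens out with a global match instead of translating characters. The following is only a minimal sketch under stated assumptions: tokenize is a hypothetical helper name, a tag is taken to be anything of the form <...> with no < or > inside, a word is a run of letters or digits, and white space and punctuation are treated purely as separators.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the tokens of $text in order.
# Assumptions: a tag is <...> with no < or > inside it, a word is a
# run of letters or digits, and everything else only separates tokens.
sub tokenize {
    my ($text) = @_;
    # In list context, m//g returns every captured match in order.
    # The tag alternative comes first so that <PER> stays in one piece.
    return $text =~ /(<[^<>]+>|[A-Za-z0-9]+)/g;
}

my $sample = q{<PER> peter </PER> asked, has the bus arrived?. but she replied with anger 'NO!!!!!!'.};

# One token per line, so there are never empty lines to squeeze.
print "$_\n" for tokenize($sample);
```

To read p.txt from standard input instead of the built-in sample, the whole file can be slurped with my $text = do { local $/; <STDIN> };. If the punctuation should appear as tokens of its own, a third alternative such as [^\sA-Za-z0-9] can be added after the word pattern.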
PH: 01225 708225 • FAX: 01225 793803 • EMAIL: email@example.com • WEB: http://www.wellho.net • SKYPE: wellho