how to tokenize in perl

Posted by baby_perl (baby_perl), 26 February 2005
hello
Please, how can I achieve the effect of the Unix command tr -cs 'A-Za-z' '\n' < p.txt in Perl,
so that if I have a file called p.txt containing the following

<DOC>
<PER> peter </PER> asked,  has the bus arrived?. but she replied  with anger 'NO!!!!!!'.
</DOC>

I want the program to start a new line whenever it sees a tag (a word that begins with < and ends with >), whitespace, or a run of non-alphanumeric characters such as .,?!
Finally, it should squeeze the repeated newlines ('\012') that this produces into a single one.

The output should then look like this:
peter
asked
has
the
bus
arrived
but
she
replied
with
anger
no

cheers
baby perl

Posted by admin (Graham Ellis), 26 February 2005
Your Unix tr will translate straight into Perl but it doesn't do what you're looking for (even in Unix):

Code:
earth-wind-and-fire:~/feb05 grahamellis$ perl -pe 'tr /A-Za-z/\n/cs' p.txt

DOC

PER
peter
PER
asked
has
the
bus
arrived
but
she
replied
with
anger
NO

DOC
earth-wind-and-fire:~/feb05 grahamellis$
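To see what the c and s modifiers are doing there, here is a minimal sketch of the same tr inside a script rather than on the command line. The sample string is one line from the p.txt in the question; the variable names $text and $out are just for illustration.

```perl
use strict;
use warnings;

# One line of sample text from the p.txt in the question
my $text = "<PER> peter </PER> asked,  has the bus arrived?.";

# Copy first, then translate the copy, so $text is left untouched.
#   c  complements the search list: match everything that is NOT A-Za-z
#   s  squeezes each run of translated characters down to a single "\n"
(my $out = $text) =~ tr/A-Za-z/\n/cs;

print $out;
```

The tag names (PER) survive because tr only works character by character; it has no idea that < and > mark the start and end of something that should be thrown away as a whole.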


We're primarily here to help with problems that you're having in writing Perl programs and not to provide complete solutions (people get paid for that, and I would hate to put them out of business). How far have you got? Can you post up the Perl code that you're working on so that we can offer help?

Personally, I wouldn't attempt a direct translation using Perl's tr operator; I would probably write something like:

Code:
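use strict;
use warnings;
# Read the whole file in one go so that tags spanning lines are handled
open(my $fh, '<', 'p.txt') or die "Cannot open p.txt: $!";
my $in = do { local $/; <$fh> };
close($fh);
$in =~ s/<.*?>//sg;    # strip the <DOC>, <PER> ... tags
# Split on runs of non-alphanumerics; grep drops the empty leading field
my @words = grep { length } split(/[^[:alnum:]]+/, $in);
print join("\n", @words), "\n";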
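The expected output in the question shows "no" in lower case, which the approach above does not produce. As a rough sketch only, the same idea can be wrapped in a subroutine that also folds case. The name tokenize is made up for this example, the lower-casing is an assumption read off the expected output, and two details differ from the code above: each tag is replaced by a space so that words on either side of a tag cannot run together, and the split uses A-Za-z (as in the original tr) rather than [:alnum:].

```perl
use strict;
use warnings;

# tokenize() is a hypothetical helper name used only for this sketch
sub tokenize {
    my ($text) = @_;
    # Replace each tag with a space so neighbouring words stay separate
    $text =~ s/<.*?>/ /sg;
    # Split on runs of non-letters, drop empty fields, fold to lower case
    return map { lc } grep { length } split /[^A-Za-z]+/, $text;
}

# The sample p.txt from the question, held in a here-document
my $sample = <<'END';
<DOC>
<PER> peter </PER> asked,  has the bus arrived?. but she replied  with anger 'NO!!!!!!'.
</DOC>
END

my @tokens = tokenize($sample);
print "$_\n" for @tokens;
```

If digits should count as part of a word, swapping [^A-Za-z] back to [^[:alnum:]] restores that behaviour.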




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho