Training, Open Source computer languages
For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but due to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive; rather, we are enjoying being retired while still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index here.
Counting character groups in a string?

Posted by pgroves (pgroves), 24 October 2002
Hi - I need to process a large (1.2 MB) text file; on each line there is a string that looks like this:

C01.252.200.500.550.800
C01.252.354.400
C01.252.410.890.328
etc.

For reasons I'm not going to go into here, I need to count the number of three-character groups (separated by a ".") on each line, and was wondering which is the most efficient way to do this. I can do this using explode, e.g.:

Code:
$bits = explode(".",$data[1]);


Where $data is a line and $bits is the resulting array of groups.
I can then count the number of 3-character groups by doing:

Code:
$level = count($bits);
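A minimal sketch of the explode() approach wrapped as a function, assuming each code string has already had its line ending stripped (for example with trim()). The function name countGroupsExplode is hypothetical, chosen for illustration:

```php
<?php
// Count the "."-separated groups in a code such as "C01.252.354.400".
// Assumes the string has no trailing newline (hypothetical helper name).
function countGroupsExplode(string $code): int
{
    // explode() returns one array element per group, so count() is the group count
    return count(explode(".", $code));
}

$lines = ["C01.252.200.500.550.800", "C01.252.354.400", "C01.252.410.890.328"];
foreach ($lines as $code) {
    echo $code, " => ", countGroupsExplode($code), "\n";
}
```

Note that explode() on an empty string returns a one-element array, so a blank line would be reported as one group.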


Or is it more efficient to use a regular expression to match each three-character group and then get the size of the returned array? In any case, how would this be coded? I can't seem to get it right; here's my attempt:

Code:
ereg('[A-Z0-9]{3}$+',$data[1],$bits);


I'm obviously not doing this right, as count($bits) is always 1, but I don't know how to do it correctly (I'm still getting my head around regular expressions). Could someone help?

Also - how do you count the number of matches in a regular expression? Is count($bits) the right way?

cheers

Paul






Posted by admin (Graham Ellis), 24 October 2002
First thought .... if the characters are explicitly 3-character groups between each "." as you seem to imply, why not simply write:
Code:
       $ngroups = (strlen($data[1]) + 1 ) / 4;
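The arithmetic behind the length-based formula: n groups of 3 characters joined by n - 1 periods give a string of 4n - 1 characters, so adding 1 and dividing by 4 recovers n. A minimal sketch, assuming every group really is exactly 3 characters; strlen() is used because count() counts array elements, not characters. The helper name and the sanity check are hypothetical additions:

```php
<?php
// Length-based count: n groups of 3 characters plus (n - 1) "." separators
// make a string of 4n - 1 characters. Assumes every group is 3 characters.
function countGroupsByLength(string $code): int
{
    $len = strlen($code) + 1;          // strlen() measures characters
    // Cheap sanity check only: catches many malformed strings, not all of them
    if ($len % 4 !== 0) {
        throw new InvalidArgumentException("Groups are not all 3 characters: $code");
    }
    return intdiv($len, 4);            // integer division, no float result
}

echo countGroupsByLength("C01.252.354.400"), "\n";
```

For "C01.252.354.400" the length is 15, so (15 + 1) / 4 gives 4 groups.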


There's also a function called substr_count that counts the number of occurrences of one string in another, so you can count the periods (the number of groups is one more than that):
Code:
       $nperiods = substr_count($data[1],".");

Now I confess I've never used that one myself, but it strikes me it's
pretty likely to be efficient.
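A minimal sketch of the substr_count() route, turning the period count into a group count by adding one. The helper name countGroupsSubstr is hypothetical:

```php
<?php
// Count groups by counting "." separators: n separators means n + 1 groups.
// Unlike the length-based formula, this does not assume 3-character groups.
function countGroupsSubstr(string $code): int
{
    return substr_count($code, ".") + 1;
}

echo countGroupsSubstr("C01.252.200.500.550.800"), "\n";
```

As with explode(), an empty string is counted as one group, so blank lines may need skipping first.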



Posted by pgroves (pgroves), 24 October 2002
Out of interest I ran the different methods on our server and timed how long each one took (averaged over 4 goes), the results were:

Explode: 5.7 secs
Divide by 4: 5.6 secs
Substr: 5.2 secs

So there's not much in it really, though Substr possibly looks like the quickest.
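One way to repeat such a comparison is to time each method over the same in-memory array of lines, so file reading is left out of the figures. This is a rough sketch, not the harness used for the timings above; the sample data and the 100000-line repeat count are made up, and each method is called through a closure so the call overhead is the same for all three:

```php
<?php
// Rough timing comparison of three group-counting methods on in-memory data.
// Sample codes and repeat count are illustrative assumptions.
$methods = [
    "explode"      => fn(string $s): int => count(explode(".", $s)),
    "divide by 4"  => fn(string $s): int => intdiv(strlen($s) + 1, 4),
    "substr_count" => fn(string $s): int => substr_count($s, ".") + 1,
];

$sample = ["C01.252.200.500.550.800", "C01.252.354.400", "C01.252.410.890.328"];
$lines = [];
for ($i = 0; $i < 100000; $i++) {
    $lines[] = $sample[$i % count($sample)];
}

$results = [];
foreach ($methods as $name => $fn) {
    $start = microtime(true);
    $total = 0;
    foreach ($lines as $line) {
        $total += $fn($line);
    }
    $elapsed = microtime(true) - $start;
    $results[$name] = $total;          // all methods should agree on the total
    printf("%-12s %.4f s (total groups %d)\n", $name, $elapsed, $total);
}
```

The timings will vary from run to run and machine to machine; the totals are printed as a check that all three methods count the same groups.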

BTW how *would* you count the number of 3-character matches using ereg?

cheers

Paul


Posted by admin (Graham Ellis), 24 October 2002
Within a regular expression, you use round brackets around groups you want to capture; otherwise you just get one string returned, and that's the entire match - that's why you got a count of just 1.
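A minimal illustration of that point. It uses preg_match() rather than ereg() because ereg() was removed in PHP 7.0; the capturing behaviour shown is the same idea. The patterns are made up for illustration:

```php
<?php
// preg_match() stands in for ereg() here (ereg() no longer exists in PHP 7+).
$code = "C01.252.354.400";

// No round brackets: the match array holds only the entire match
preg_match('/[A-Z0-9]{3}/', $code, $whole);

// Two pairs of round brackets: entire match plus one element per captured group
preg_match('/([A-Z0-9]{3})\.([0-9]{3})/', $code, $captured);

echo count($whole), " ", count($captured), "\n";
```

This prints "1 3": the first array holds only "C01", the second holds "C01.252", "C01" and "252".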

Amazingly, although I'm a fan of regular expressions, I'm going to discourage you from using them in this case. One of their weaknesses is that if you have a bracket with a count after it, only the LAST match to that bracket will be saved into the target match variable, which is a problem we would have to work around in your example. You would also be in some trouble if you have more than 9 groups, as ereg silently discards the 10th and subsequent matches ....
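A short demonstration of the "only the LAST match is saved" weakness, again using preg_match() in place of the removed ereg(). The pattern is an assumption written to fit the sample codes:

```php
<?php
// A capture group with a repeat count ("*") matches several times here,
// but the result array keeps only what it matched the LAST time.
$code = "C01.252.354.400";

// (?: ... ) is a non-capturing group; ([0-9]{3}) inside it is repeated
preg_match('/^([A-Z0-9]{3})(?:\.([0-9]{3}))*$/', $code, $m);

print_r($m);   // "252" and "354" are not in the array, only "400"
```

However many groups the line has, $m ends up with the same three elements, so its size cannot be used as a group count.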

In summary: regular expressions are great, but not for what you want to do.
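For completeness, and to answer the question of counting matches: in PHP's PCRE functions, preg_match_all() returns the number of full matches it found, which sidesteps the capture-group problems. A minimal sketch, assuming groups of exactly three letters or digits; the helper name countGroupsRegex is hypothetical, and the non-regex approaches above remain simpler:

```php
<?php
// preg_match_all() returns how many non-overlapping matches it found,
// so it counts the 3-character groups directly (third argument omitted).
function countGroupsRegex(string $code): int
{
    return preg_match_all('/[A-Z0-9]{3}/', $code);
}

echo countGroupsRegex("C01.252.354.400"), "\n";
```

A longer run such as "ABCDEF" would be counted as two matches, so this only agrees with the other methods when every group really is three characters.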

P.S. Timing differences may be more significant than you think. How long does it take to run your program and do nothing at all? I wonder how much of your 5.something seconds is consumed by reading the file rather than by the matching.
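The baseline idea can be sketched by timing the same read loop twice: once doing nothing with each line, and once counting groups. This is a sketch under assumptions: the temporary file of 100000 identical codes is made up for the test, and timeLoop is a hypothetical helper:

```php
<?php
// Build a made-up test file of 100000 codes in the system temp directory.
$file = tempnam(sys_get_temp_dir(), "groups");
$out = fopen($file, "w");
for ($i = 0; $i < 100000; $i++) {
    fwrite($out, "C01.252.354.400\n");
}
fclose($out);

// Read every line of $file, optionally applying $work to it.
// Returns [elapsed seconds, sum of $work results].
function timeLoop(string $file, ?callable $work): array
{
    $start = microtime(true);
    $sum = 0;
    $fh = fopen($file, "r");
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");   // strip the line ending in both runs
        if ($work !== null) {
            $sum += $work($line);
        }
    }
    fclose($fh);
    return [microtime(true) - $start, $sum];
}

[$baseline] = timeLoop($file, null);
[$withCount, $groups] = timeLoop($file, fn(string $s): int => substr_count($s, ".") + 1);
unlink($file);

printf("read only:       %.4f s\n", $baseline);
printf("read and count:  %.4f s (%d groups)\n", $withCount, $groups);
printf("counting itself: about %.4f s\n", $withCount - $baseline);
```

The difference between the two runs is a rough estimate of the time spent counting; on small inputs it may be lost in measurement noise.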

Posted by pgroves (pgroves), 24 October 2002
I tried running the program on just 200 lines of text, but it happens too quickly to notice *any* significant differences!

Paul




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference. To jump to the archive index please follow this link.


© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho