Counting character groups in a string?

Posted by pgroves (pgroves), 24 October 2002

Hi - I need to process a large (1.2 MB) text file. On each line there is a string that looks like this: C01.252.200.500.550.800, C01.252.354.400, C01.252.410.890.328, etc. For reasons I'm not going to go into here, I need to count the number of three-character groups (separated by a ".") on each line, and was wondering which is the most efficient way to do this. I can do this using explode, e.g.: Code:
Where $data is a line and $bits is the resulting array of pieces. I can then count the number of three-character groups by doing: Code:
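A minimal sketch of the explode approach (not the original snippet), assuming each line holds a single dotted code such as C01.252.200.500.550.800. The helper name count_groups_explode is hypothetical. The array size comes from PHP's count() function.

```php
<?php
// Hypothetical helper: count the "."-separated groups in one line
// by splitting it with explode() and counting the pieces.
function count_groups_explode(string $data): int
{
    // rtrim() removes the trailing newline that fgets() leaves on each line
    $bits = explode('.', rtrim($data));
    return count($bits);
}

echo count_groups_explode("C01.252.200.500.550.800\n"), "\n"; // prints 6
```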
Or is it more efficient to use a regular expression to match each three-character group and then get the size of the returned array? In any case, how would this be coded? I can't seem to get it right; here's my attempt: Code:
I'm obviously not doing this right, as Code:
Also - how do you count the number of matches in a regular expression? Is count($bits) the right way?

cheers
Paul

Posted by admin (Graham Ellis), 24 October 2002

First thought .... if the characters are explicitly three-character groups between each "." as you seem to imply, why not simply write: Code:
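If every group really is exactly three characters wide, the count follows from the line length alone: n groups take 3n characters plus n - 1 dots, a total of 4n - 1, so n = (length + 1) / 4. A sketch under that fixed-width assumption; count_groups_by_length is a hypothetical name, not the code from the thread.

```php
<?php
// Hypothetical helper: relies on every group being exactly 3 characters.
// n groups occupy 3n characters plus (n - 1) dots, i.e. 4n - 1 in total.
function count_groups_by_length(string $data): int
{
    $line = rtrim($data); // drop the trailing newline before measuring
    return intdiv(strlen($line) + 1, 4);
}

echo count_groups_by_length("C01.252.410.890.328\n"), "\n"; // prints 5
```

If any group is not exactly three characters long, this silently returns a wrong count.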
There's also a function called substr_count that counts the number of occurrences of one string in another, so Code:
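A sketch of the substr_count version: the number of groups is one more than the number of dots. It assumes one dotted code per line, and the helper name count_groups_substr is hypothetical.

```php
<?php
// Hypothetical helper: groups = number of "." separators + 1.
// substr_count($haystack, $needle) counts non-overlapping occurrences.
function count_groups_substr(string $data): int
{
    return substr_count($data, '.') + 1;
}

echo count_groups_substr("C01.252.354.400"), "\n"; // prints 4
```

Unlike the length-based approach, this works for groups of any width, since it only counts the separators; a trailing newline does not affect it.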
Now I confess I've never used that one myself, but it strikes me it's pretty likely to be efficient.

Posted by pgroves (pgroves), 24 October 2002

On 10/24/02 at 14:02:39, Graham Ellis wrote:
Out of interest I ran the different methods on our server and timed how long each one took (averaged over 4 runs). The results were:

- Explode: 5.7 secs
- Divide by 4: 5.6 secs
- Substr: 5.2 secs

So there's not much in it really, though it looks like the substr_count version might be the quickest. BTW, how *would* you count the number of three-character matches using ereg?

cheers
Paul

Posted by admin (Graham Ellis), 24 October 2002

Within a regular expression, you use round brackets around the groups you want to capture; otherwise you just get one string returned, and that's the entire match. That's why you got a count of just 1.

Amazingly, although I'm a fan of regular expressions, I'm going to discourage you from using them in this case. One of their weaknesses is that if you have a bracketed group with a repeat count after it, only the LAST match to that group is saved into the target match variable, which is a problem we would have to work around in your example. You would also be in some trouble if you have more than 9 groups, as ereg silently discards the 10th and subsequent matches.

Summary: regular expressions are great, but not for what you want to do!

P.S. Timing differences may be more significant than you think. How long does it take to run your program and do nothing at all? I wonder how much of your 5.something seconds is consumed by reading the file rather than by the matching.

Posted by pgroves (pgroves), 24 October 2002

On 10/24/02 at 14:39:22, Graham Ellis wrote:
I tried running the program on just 200 lines of text, but it runs too quickly to notice *any* significant differences!

Paul

This page is a thread posted to the opentalk forum
at www.opentalk.org.uk and
archived here for reference. To jump to the archive index please
follow this link.
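Paul's question of how you *would* count the three-character matches with ereg is left open in the thread. ereg() was deprecated in PHP 5.3 and removed in PHP 7, so the sketch below swaps in the PCRE function preg_match_all(), whose return value is already the number of matches. The pattern and the helper name count_groups_regex are assumptions, and, like the other approaches, it treats each line as a single dotted code. The last few lines illustrate Graham's point that a repeated capturing group keeps only its final match.

```php
<?php
// Hypothetical helper: count three-character groups with a PCRE pattern.
// preg_match_all() returns the number of full-pattern matches, or false on error,
// so no capturing brackets and no count() call are needed.
function count_groups_regex(string $data): int
{
    // A match is the start of the line or a dot, followed by exactly three
    // characters that are neither dots nor whitespace, then a dot or the end.
    $n = preg_match_all('/(?:^|\.)[^.\s]{3}(?=\.|$)/', rtrim($data));
    return $n === false ? 0 : $n;
}

echo count_groups_regex("C01.252.200.500.550.800\n"), "\n"; // prints 6
echo count_groups_regex("C01.2520.354"), "\n";              // prints 2: "2520" is not a 3-character group

// Graham's warning in action: a repeated capturing group keeps only its LAST match.
$m = [];
preg_match('/^(\w{3})(?:\.(\w{3}))*$/', 'C01.252.354.400', $m);
echo count($m), "\n"; // prints 3: the whole match, "C01", and "400"
echo $m[2], "\n";     // prints 400; "252" and "354" were overwritten
```

This also shows why counting the elements of the match array is not a reliable way to count matches: its size depends on the number of bracketed groups in the pattern, not on how many times they matched.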
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho