For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but thanks to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive; rather, we are enjoying being retired while still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index here.
Digging out embedded tags with preg_match_all()

Posted by jamesbellamy (jamesbellamy), 28 October 2005
I am trying to use preg_match_all to extract the text between various custom tags starting with {{IF_:

Code:
$test = "{{IF_ONETHING}}Some text in here and {{IF_ANOTHER}}some additional{{ENDIF_ANOTHER}} tags{{ENDIF_ONETHING}} and here we have {{IF_YETANOTHER}}another if tag{{ENDIF_YETANOTHER}}";

$iftextsearch="/[{]{2}(IF_([A-Z]*))[}]{2}([\D\S]*)[{]{2}END\\1[}]{2}/";

preg_match_all($iftextsearch, $test, $loopedHTML, PREG_PATTERN_ORDER);


...which works fine when the tags come one after the other: it matches the contents of the IF_ONETHING and IF_YETANOTHER tags, but fails to match the IF_ANOTHER tags, which are nested inside the IF_ONETHING tags. I know that this is because preg_match_all resumes its search from the end of the previous match. Is there any way to prevent this, for instance by having the next search start from the second character of the first match?
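To see the behaviour described above, here is a minimal sketch that runs the question's own string and pattern through preg_match_all() and reports what comes back. Only the two outer pairs are returned, because each new attempt starts where the previous match ended.

```php
<?php
// The question's own sample string and pattern
$test = "{{IF_ONETHING}}Some text in here and {{IF_ANOTHER}}some additional{{ENDIF_ANOTHER}} tags{{ENDIF_ONETHING}} and here we have {{IF_YETANOTHER}}another if tag{{ENDIF_YETANOTHER}}";
$iftextsearch = "/[{]{2}(IF_([A-Z]*))[}]{2}([\D\S]*)[{]{2}END\\1[}]{2}/";

// preg_match_all returns the number of non-overlapping matches it found
$count = preg_match_all($iftextsearch, $test, $loopedHTML);

// Only the outer pairs appear: the search for a second match begins after
// {{ENDIF_ONETHING}}, so {{IF_ANOTHER}} (inside it) is never examined
print $count . "\n";
print implode(", ", $loopedHTML[1]) . "\n";
```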

Posted by admin (Graham Ellis), 28 October 2005
How about using preg_match and capturing the offset, in a loop?

Code:
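<?php
$text = "<h1>This is a <b>bold</b>header</h1>and<i>what fun</i>";
$from = 0;   // where the next search starts within $text
// Search from $from onwards; PREG_OFFSET_CAPTURE makes each match an
// array of (matched text, offset within the substring searched)
while (preg_match('!<([^>]+)>(.*?)</\1>!',
       substr($text,$from), $gotten, PREG_OFFSET_CAPTURE)) {
       print $gotten[0][0];
       print "\n";
       // Restart one character after the start of this match so that
       // pairs nested inside it are found on the next pass
       $from += $gotten[0][1]+1;
       }
?>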


Which gave me ...

Code:
demo$ php pmoffs.php
<h1>This is a <b>bold</b>header</h1>
<b>bold</b>
<i>what fun</i>
demo$


I've simplified the example to use easier tags, but I think it's the sort of thing you're after?
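On later PHP versions, preg_match() also accepts a starting offset as its fifth argument (added during the PHP 4.3 series, so not available in 4.2.2). A minimal sketch of the same loop using it, which avoids copying the tail of the string with substr() on every pass; the function name findTagPairs is hypothetical. With an offset argument, the positions reported by PREG_OFFSET_CAPTURE are relative to the start of the whole string, so the restart point is simply the match's offset plus one.

```php
<?php
// Hypothetical helper: collect every tag pair, including nested ones,
// by restarting the search one character after each match begins
function findTagPairs($text)
{
    $found = array();
    $from = 0;
    while (preg_match('!<([^>]+)>(.*?)</\1>!', $text, $gotten,
                      PREG_OFFSET_CAPTURE, $from)) {
        $found[] = $gotten[0][0];
        // Offsets are relative to the whole string when an offset is passed
        $from = $gotten[0][1] + 1;
    }
    return $found;
}

$text = "<h1>This is a <b>bold</b>header</h1>and<i>what fun</i>";
foreach (findTagPairs($text) as $match) {
    print $match . "\n";
}
```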

Posted by jamesbellamy (jamesbellamy), 31 October 2005
Thanks Graham. That would indeed be the perfect solution, except that I am lumbered with PHP 4.2.2, which is one version short of supporting offset capture (PREG_OFFSET_CAPTURE arrived in PHP 4.3.0). The idea of offsets is now in my head, however, so I'm going to try again with preg_match_all. I will post the results here.
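For anyone else stuck without PREG_OFFSET_CAPTURE, Graham's loop can be sketched by recovering each match's position with strpos() instead. This is a minimal sketch assuming only preg_match(), substr() and strpos(), which should all be present in PHP 4; the function name findTagPairsNoOffset is hypothetical. It works because preg_match() returns the leftmost match, so no identical copy of the matched text can start earlier in the searched part of the string.

```php
<?php
// Hypothetical helper: Graham's loop without PREG_OFFSET_CAPTURE
function findTagPairsNoOffset($text)
{
    $found = array();
    $from = 0;   // where the next search starts within $text
    while (preg_match('!<([^>]+)>(.*?)</\1>!', substr($text, $from), $gotten)) {
        $found[] = $gotten[0];
        // The match was the leftmost one at or after $from, so the first
        // copy of that text from $from onwards is the match itself
        $pos = strpos($text, $gotten[0], $from);
        // Restart one character after the match began, so nested pairs are found
        $from = $pos + 1;
    }
    return $found;
}

$text = "<h1>This is a <b>bold</b>header</h1>and<i>what fun</i>";
foreach (findTagPairsNoOffset($text) as $match) {
    print $match . "\n";
}
```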

Posted by jamesbellamy (jamesbellamy), 3 November 2005
This is how I solved the problem (credit to my colleague Emily for the idea):

In words, the solution is:

1) Do a preliminary search on the string
2) Add any results to two arrays, one with the tags still present and one without ($matchesTags and $matchesNoTags). The results with tags included are the ones I am trying to find; that array could be omitted if you only need the text inside the tags
3) Count how many results were produced by the first search by counting the number of elements in the array
4) Now, in a loop, search each element of the array WITHOUT tags to see whether further results exist
5) If results DO exist, append them to the end of the two arrays created earlier (with array_push).
6) Count the number of array elements again.
7) Keep repeating the loop through the $matchesNoTags array until the loop index equals the count of the array (i.e. no further results have been found)

Code:
<?php
// Sample string containing several, multiply nested IF tags
$test = "{{IF_ONE}}Test One{{IF_TWO}}Test Two{{IF_FIVE}}Test 5{{ENDIF_FIVE}}{{ENDIF_TWO}}Test One continued{{ENDIF_ONE}} and {{IF_THREE}}Test 3 and {{IF_FOUR}}Embedded in three - test 4{{ENDIF_FOUR}}{{ENDIF_THREE}}";

// The regex to extract any pair of IF tags in the form {{IF_ANYTHING}}Some text{{ENDIF_ANYTHING}}
// Group 1 is the tag name (e.g. IF_ONE), group 3 the text between the tags
$iftextsearch = "/[{]{2}(IF_([A-Z]*))[}]{2}([\D\S]*)[{]{2}END\\1[}]{2}/";

// Start with empty arrays so count() works even if nothing matches
$matchesTags = array();    // full matches, tags included
$matchesNoTags = array();  // inner text only, searched again later

// Perform the top-level search and record each result
$c = preg_match_all($iftextsearch, $test, $loopedHTML, PREG_PATTERN_ORDER);
for ($i = 0; $i < $c; $i++) {
    $matchesTags[] = $loopedHTML[0][$i];
    // Always store the inner text (even if empty) so both arrays stay index-aligned
    $matchesNoTags[] = $loopedHTML[3][$i];
}

// Find out how many results there were
$arrayCount = count($matchesTags);

// Search the inner text of every result found so far; new results are
// appended, so the loop runs until a pass adds nothing new
for ($i = 0; $i < $arrayCount; $i++) {
    $c = preg_match_all($iftextsearch, $matchesNoTags[$i], $loopedHTML, PREG_PATTERN_ORDER);
    for ($ii = 0; $ii < $c; $ii++) {
        array_push($matchesTags, $loopedHTML[0][$ii]);
        array_push($matchesNoTags, $loopedHTML[3][$ii]);
    }
    $arrayCount = count($matchesTags);
}

// Print some results
echo $test, "\n";
print_r($matchesTags);
?>



This produces the result:
Code:
{{IF_ONE}}Test One{{IF_TWO}}Test Two{{IF_FIVE}}Test 5{{ENDIF_FIVE}}{{ENDIF_TWO}}Test One continued{{ENDIF_ONE}} and {{IF_THREE}}Test 3 and {{IF_FOUR}}Embedded in three - test 4{{ENDIF_FOUR}}{{ENDIF_THREE}}
Array
(
   [0] => {{IF_ONE}}Test One{{IF_TWO}}Test Two{{IF_FIVE}}Test 5{{ENDIF_FIVE}}{{ENDIF_TWO}}Test One continued{{ENDIF_ONE}}
   [1] => {{IF_THREE}}Test 3 and {{IF_FOUR}}Embedded in three - test 4{{ENDIF_FOUR}}{{ENDIF_THREE}}
   [2] => {{IF_TWO}}Test Two{{IF_FIVE}}Test 5{{ENDIF_FIVE}}{{ENDIF_TWO}}
   [3] => {{IF_FOUR}}Embedded in three - test 4{{ENDIF_FOUR}}
   [4] => {{IF_FIVE}}Test 5{{ENDIF_FIVE}}
)


All search results end up neatly in one array, with the added potential benefit that the array order is hierarchical: outer tags come before the tags nested inside them, so the most deeply embedded tags appear at the end of the array.
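A different technique answers the original question ("start the next search from the second character") more directly: wrap the whole pattern in a lookahead assertion. Each match is then zero-length, so preg_match_all() moves on by a single character after each one, while the capturing groups inside the lookahead still record the full tag pair found at that position. This is a sketch assuming the same {{IF_NAME}}...{{ENDIF_NAME}} tag format; it needs no offset flags, so it may also suit older PHP versions, but that has not been checked here.

```php
<?php
// The sample string from the post above, with IF tags nested up to three deep
$test = "{{IF_ONE}}Test One{{IF_TWO}}Test Two{{IF_FIVE}}Test 5{{ENDIF_FIVE}}{{ENDIF_TWO}}Test One continued{{ENDIF_ONE}} and {{IF_THREE}}Test 3 and {{IF_FOUR}}Embedded in three - test 4{{ENDIF_FOUR}}{{ENDIF_THREE}}";

// Everything sits inside (?=...), so nothing is consumed and matches may overlap.
// Group 1: whole tag pair, group 2: tag name, group 3: inner text.
$overlapping = '/(?=(\{\{(IF_[A-Z]*)\}\}([\s\S]*)\{\{END\2\}\}))/';

preg_match_all($overlapping, $test, $m);

// One result per opening tag, in the order the tags appear in the string
foreach ($m[1] as $pair) {
    print $pair . "\n";
}
```

The results come out in document order (IF_ONE, IF_TWO, IF_FIVE, IF_THREE, IF_FOUR) rather than the level-by-level order of the loop above. Like the original pattern, the greedy [\s\S]* would treat two separate pairs with the same tag name as one long span.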

I hope this helps somebody



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho