PHP and Regular Expressions

Posted by Joe_Guynan (Joe_Guynan), 31 January 2007

Hi everyone,

I am new here. I regularly attended a PHP course at Well House Manor, and very good it was too, and now I am seeking advice, just as Graham had predicted. Well now, here is my predicament...

I am using the function file() to read in a web page. Then, line by line, using eregi(), I am extracting the text src="...." so I can capture all the images and other included JS files etc. on the page. Here is my PHP code:

    $urlDoc = file($_REQUEST[urlString],r) or die ("Error Accessing URL $_REQUEST[urlString]"); #capture the webpage into a variable

    #loop through each line and extract any src=
    if ($urlDoc[0]) {
        print ("<hr />"); #output a horizontal rule
        for ($i = 0 ; $i < count($urlDoc) ; $i++) {
            print (htmlspecialchars($urlDoc[$i])."<br />"); #outputs the HTML document line by line
            eregi('src="([a-z0-9/.]+)"',$urlDoc[$i],$imgSrc); #extracts the value of src=
            if ($imgSrc[1] != '') { #if an image exists on this line
                print ("$entry<br />"); #output
            }
            unset($imgSrc); #remove the array
        }
    }

Now, the problem I have is this: there is a line on the HTML page that reads

    </script><script language="JavaScript" src="/js/armph_products.js" type="text/javascript"></script><script language="JavaScript" src="/js/armph.js" type="text/javascript"></script><form action="/markets/home_solutions/armpoweredhouse.html" method="get" target="_blank">

You can see that it contains two src="..." items: the first is src="/js/armph_products.js" and the second is src="/js/armph.js". When I output line by line, I only get src="/js/armph.js" from that particular line. I am guessing that eregi() just happily goes along each line and the last matching expression is the item that is stored in my variable $imgSrc.

Can anyone shed some light, or let me know what I am doing wrong?

Many thanks,
Joe Guynan

Posted by admin (Graham Ellis), 31 January 2007

If you have multiple matches on a line, then I would go for preg_match_all (yes, I know, the OTHER regex handler).

Posted by Joe_Guynan (Joe_Guynan), 1 February 2007

Thanks Graham, I took your advice and it works a treat. Thanks again.

Joe

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.
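
For readers landing on this archived thread, the preg_match_all suggestion might look roughly like the sketch below. It is not the code Joe ended up with (the thread does not show it). The helper name extractSrcValues and the [^"]+ character class are assumptions made for illustration. Two points are worth noting: eregi() was removed in PHP 7, and the class [a-z0-9/.] in the question does not include an underscore, so src="/js/armph_products.js" could never match it, which would explain why only the second value came back.

```php
<?php
// Hypothetical helper (not from the thread): collect every src="..." value on one line.
function extractSrcValues(string $line): array
{
    // [^"]+ accepts any character up to the closing quote, so underscores and
    // hyphens in file names are captured too; the i flag ignores case.
    preg_match_all('/src="([^"]+)"/i', $line, $matches);

    // $matches[1] holds the text captured by group 1, in order of appearance.
    return $matches[1];
}

// The problem line from the question, shortened to the two script tags.
$line = '</script><script language="JavaScript" src="/js/armph_products.js" type="text/javascript"></script>'
      . '<script language="JavaScript" src="/js/armph.js" type="text/javascript"></script>';

foreach (extractSrcValues($line) as $src) {
    print(htmlspecialchars($src) . "<br />\n");
}
```

Run against that line, the loop prints both /js/armph_products.js and /js/armph.js. In the original loop, the call would replace eregi() and the inner if would become a foreach over the returned array.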