If your PHP installation allows remote URLs to be read as if they were files (the allow_url_fopen setting, which is on by default), you have a useful tool that lets you include the content of one web page, or part of it, within another. For example, I can "scrape" the sections of a "coming on a course" page and insert them into another page.
Here's an example of the mechanism in use ...
1. Grab the page to be scraped:
$lyne = file_get_contents("http://www.wellho.co.uk/net/join.html");
2. Extract the data you want from it:
$includedtext = "";
preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $lyne, $here);
for ($k = 0; $k < count($here[0]); $k++) {
    $includedtext .= "<b>" . htmlspecialchars(strip_tags($here[1][$k])) .
        "</b><br />" . htmlspecialchars(strip_tags($here[2][$k])) .
        "<br /><br />";
}
3. Use $includedtext within your code.
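The three steps above can be gathered into a single function. What follows is a minimal sketch rather than the code behind the demonstration page: the function name scrape_definitions and the inline sample markup are assumptions made for illustration, so that the example runs without a network call.

```php
<?php
// Hypothetical helper: turn each <dt>/<dd> pair in $html into a bold
// title followed by its text, with any inner tags stripped out.
function scrape_definitions(string $html): string {
    $includedtext = "";
    // Same pattern as step 2; the "s" modifier lets "." match newlines
    preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $html, $here);
    for ($k = 0; $k < count($here[0]); $k++) {
        $includedtext .= "<b>" . htmlspecialchars(strip_tags($here[1][$k])) .
            "</b><br />" . htmlspecialchars(strip_tags($here[2][$k])) .
            "<br /><br />";
    }
    return $includedtext;
}

// Sample markup standing in for the fetched page (an assumption, not the real page)
$lyne = "<dl><dt>Where <i>we</i> are</dt>\n<dd>Melksham, Wiltshire</dd></dl>";
echo scrape_definitions($lyne);
// prints: <b>Where we are</b><br />Melksham, Wiltshire<br /><br />
```

In real use, $lyne would come from file_get_contents as in step 1. That call returns false on failure, so it is worth checking the result before passing it on.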
You can try this out [here] and see the source code [here].
This example comes with a string of cautions ...
1. Do NOT allow just any old URL to be scraped, especially one that your users may enter. This leaves you open to having your content filled with their adverts!
2. If you are scraping the same page regularly and it doesn't change very much, you should cache the results and not make the inquiry every time.
3. Respect the robots exclusion standard (robots.txt) of the remote site that you're scraping, and ensure that you have copyright permission to reproduce the material on your site too.
4. Remember that if the remote site's format changes so that your regular expression no longer matches, you'll have a correction to make on your site PDQ!
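Cautions 1 and 2 above lend themselves to a short illustration. The sketch below is one possible approach, not the method used on our own pages: the names ALLOWED_URLS, is_allowed_url and cached_fetch, the cache file naming, and the one-hour lifetime in the usage comment are all assumptions chosen for the example.

```php
<?php
// Hypothetical allowlist: only these exact URLs may ever be scraped (caution 1)
const ALLOWED_URLS = ["http://www.wellho.co.uk/net/join.html"];

function is_allowed_url(string $url): bool {
    return in_array($url, ALLOWED_URLS, true);
}

// Return the cached copy if it is younger than $maxAge seconds; otherwise
// call $fetch, store what it returns, and hand that back (caution 2).
// Returns false for a URL that is not on the allowlist or a failed fetch.
function cached_fetch(string $url, string $cacheDir, int $maxAge, callable $fetch) {
    if (!is_allowed_url($url)) {
        return false;
    }
    $cacheFile = $cacheDir . "/scrape_" . md5($url) . ".html";
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return file_get_contents($cacheFile);
    }
    $content = $fetch($url);
    if ($content !== false) {
        file_put_contents($cacheFile, $content);
    }
    return $content;
}

// Usage (assumed values): ask the remote site at most once an hour
// $lyne = cached_fetch("http://www.wellho.co.uk/net/join.html",
//                      sys_get_temp_dir(), 3600, 'file_get_contents');
```

Passing the fetching function in as a parameter keeps the cache logic testable without a network connection. For caution 4, note that preg_match_all returns the number of matches it found, so a result of zero on a page that used to match is a useful signal that the remote format has changed.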
We currently have examples of the use of scraped material on the Melksham Chamber of Commerce home page and also the First Great Western Coffee Shop. Take the power of this facility ... but be careful how you use it!
(written 2009-12-21, updated 2010-01-06)