 
 
Scraping content for your own page via PHP

If your PHP allows remote URLs to be read as if they were files (the allow_url_fopen setting, which is on by default), you have a useful tool which lets you include the content of one web page (or part of it) within another. For example, I can "scrape" the sections of a "coming on a course" page and insert them into another page.

Here's an example of the mechanism in use ...

1. Grab the page to be scraped:

$lyne = file_get_contents("http://www.wellho.co.uk/net/join.html");

2. Extract the data you want from it:

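$includedtext = "";
// Capture each <dt>...</dt> / <dd>...</dd> pair; the "s" modifier lets "." span newlines
preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $lyne, $here);
for ($k = 0; $k < count($here[0]); $k++) {
    // Strip any markup from each captured piece, then escape it for safe output
    $includedtext .= "<b>" . htmlspecialchars(strip_tags($here[1][$k])) .
        "</b><br />" . htmlspecialchars(strip_tags($here[2][$k])) .
        "<br /><br />";
}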


3. Use $includedtext within your code, for example by echoing it into the page where the scraped content should appear.

You can try this out [here] and see the source code [here]
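
The three steps above can be gathered into a pair of small functions. This is only a sketch: the names build_included_text and scrape_or_fallback are my own inventions for illustration, and the fallback message is an assumption about what you might want to show if the remote page cannot be read (file_get_contents returns false in that case).

```php
<?php
// Hypothetical helper: turn the <dt>/<dd> pairs found in $html into the
// bold-heading markup built up in step 2.
function build_included_text(string $html): string
{
    $out = "";
    preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $html, $here, PREG_SET_ORDER);
    foreach ($here as $pair) {
        $out .= "<b>" . htmlspecialchars(strip_tags($pair[1])) . "</b><br />" .
            htmlspecialchars(strip_tags($pair[2])) . "<br /><br />";
    }
    return $out;
}

// Hypothetical wrapper: fetch the page, and return $fallback if the fetch fails.
function scrape_or_fallback(string $url, string $fallback): string
{
    $lyne = @file_get_contents($url);   // false on failure; @ hides the warning
    if ($lyne === false) {
        return $fallback;
    }
    return build_included_text($lyne);
}
```

You would then write something like echo scrape_or_fallback("http://www.wellho.co.uk/net/join.html", "Details temporarily unavailable"); at the point in your page where the scraped text belongs.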

This example comes with a string of cautions ...

1. Do NOT allow just any old URL to be scraped, especially one that your users may enter. This leaves you open to having your content filled with their adverts!

2. If you are scraping the same page regularly and it doesn't change very much, you should cache the results and not make the inquiry every time.

3. Respect the robots exclusion standard (robots.txt) of the remote site that you're scraping, and ensure that you have copyright permission to reproduce the material on your site too.

4. Remember that if the remote site's format changes so that your regular expression no longer matches, you'll have a correction to make on your site PDQ!
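
Cautions 1 and 2 might be handled along the following lines. Treat this as a sketch under assumptions of my own: the fixed list in ALLOWED_URLS, the function names, the cache file location and the one hour lifetime are all hypothetical choices, not something taken from the live example.

```php
<?php
// Hypothetical allowlist: only these exact URLs may ever be scraped.
const ALLOWED_URLS = [
    "http://www.wellho.co.uk/net/join.html",
];

function is_allowed_url(string $url): bool
{
    return in_array($url, ALLOWED_URLS, true);
}

// Hypothetical cache: re-use a saved copy unless it is older than $maxAge seconds.
// Returns the page text, or false if the URL is not allowed or nothing could be read.
function fetch_cached(string $url, string $cacheFile, int $maxAge = 3600)
{
    if (!is_allowed_url($url)) {
        return false;
    }
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return file_get_contents($cacheFile);      // fresh enough, no remote request
    }
    $page = @file_get_contents($url);
    if ($page !== false) {
        file_put_contents($cacheFile, $page, LOCK_EX);
    } elseif (is_file($cacheFile)) {
        return file_get_contents($cacheFile);      // remote fetch failed, use the stale copy
    }
    return $page;
}
```

Because the URL is checked against a fixed list rather than taken on trust, anything a visitor types in is simply refused, and the remote server is only contacted when the saved copy has gone out of date.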

We currently have examples of the use of scraped material on the Melksham Chamber of Commerce home page and also the First Great Western Coffee Shop. Take the power of this facility ... but be careful how you use it!
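
As one way of being careful about caution 4, you can check whether the regular expression matched anything at all before using the result. In this sketch the function name match_pairs and the wording of the log message are my own assumptions; preg_match_all itself returns the number of matches, or false if the expression failed.

```php
<?php
// Hypothetical check: returns the matched pairs, or null if the page no longer
// contains any <dt>/<dd> pairs (or the regular expression itself failed).
function match_pairs(string $html): ?array
{
    $found = preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $html, $here, PREG_SET_ORDER);
    if ($found === false || $found === 0) {
        // Leave a note for the site maintainer rather than showing an empty section
        error_log("Scrape pattern matched nothing - has the remote page changed?");
        return null;
    }
    return $here;
}
```

A null result is your cue to show some fixed fallback text, and the entry in the error log tells you that there's a correction to make.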





(written 2009-12-21, updated 2010-01-06)

 




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/2545_Scr ... a-PHP.html • PAGE BUILT: Sun Mar 30 15:20:58 2014 • BUILD SYSTEM: WomanWithCat