Scraping content for your own page via PHP

If your PHP allows remote URLs to be read as if they were files (the allow_url_fopen setting, which is on by default), you have a useful tool that lets you include the content of one web page, or part of it, within another. For example, I can "scrape" the sections of our "coming on a course" page and insert them into another page.
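
If you're not sure whether remote URLs are enabled on your server, a quick check of the allow_url_fopen setting will tell you. This is a minimal sketch: remote_urls_enabled() is a hypothetical helper name made up for this example, while allow_url_fopen is the real php.ini setting involved.

```php
<?php
// Report whether file functions such as file_get_contents() may open
// http:// URLs on this PHP installation (the allow_url_fopen setting).
function remote_urls_enabled(): bool
{
    // ini_get() returns the setting as a string ("1", "On", "" ...);
    // FILTER_VALIDATE_BOOLEAN turns that into a true/false answer.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

echo remote_urls_enabled()
    ? "Remote URLs can be read like files\n"
    : "allow_url_fopen is off - remote reads will fail\n";
```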

Here's an example of the mechanism in use ...

1. Grab the page to be scraped:

// Fetch the page (needs allow_url_fopen); returns false if the fetch fails
$lyne = file_get_contents("http://www.wellho.co.uk/net/join.html");

2. Extract the data you want from it:

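$includedtext = "";
// Capture each <dt>...</dt> term and the <dd>...</dd> that follows it;
// the s modifier lets . match newlines, and +? / *? keep each match short
preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $lyne, $here);
// $here[1] holds the terms, $here[2] the matching definitions
for ($k = 0; $k < count($here[0]); $k++) {
    // Strip the remote markup, then escape whatever is left before output
    $includedtext .= "<b>" . htmlspecialchars(strip_tags($here[1][$k])) .
        "</b><br />" . htmlspecialchars(strip_tags($here[2][$k])) .
        "<br /><br />";
}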


3. Output $includedtext wherever you want the scraped content to appear in your page.

You can try this out [here] and see the source code [here]
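
Pulling the three steps together, here's one way you might package them as reusable functions, adding a timeout, an identifying User-Agent header and a check for a failed fetch. It's a sketch rather than the code behind the example above: fetch_page(), definitions_to_html() and the agent string are hypothetical, and a small made-up HTML sample stands in for the remote page.

```php
<?php
// A sketch of steps 1-3 as reusable functions. fetch_page() and
// definitions_to_html() are hypothetical names chosen for this example.

// Step 1: fetch a page with a timeout and an identifying User-Agent.
// Returns the page text, or null if the fetch fails.
function fetch_page(string $url, int $timeout = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout,  // seconds to wait for the remote server
            // Hypothetical agent string - describe your own site here
            'user_agent' => 'ExampleScraper/0.1 (+http://www.example.com/)',
        ],
    ]);
    // @ hides the warning; failure is reported through the null return
    $page = @file_get_contents($url, false, $context);
    return ($page === false) ? null : $page;
}

// Step 2: turn each <dt>/<dd> pair into escaped text with our own markup,
// so tags from the remote page are stripped and anything left is escaped.
function definitions_to_html(string $html): string
{
    $out = "";
    preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $html, $pairs,
        PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $out .= "<b>" . htmlspecialchars(strip_tags($pair[1])) .
            "</b><br />" . htmlspecialchars(strip_tags($pair[2])) .
            "<br /><br />";
    }
    return $out;
}

// Step 3 on a live page would be:
//   $page = fetch_page("http://www.wellho.co.uk/net/join.html");
//   $includedtext = ($page === null) ? "" : definitions_to_html($page);
// Here a small made-up sample is used instead of a remote page.
$sample = "<dl><dt>Start time</dt>\n<dd>9:30 <i>prompt</i></dd></dl>";
echo definitions_to_html($sample), "\n";
// prints <b>Start time</b><br />9:30 prompt<br /><br />
```

Returning null on failure, rather than the false that file_get_contents() gives, makes it easy for the calling page to fall back to an empty string and carry on.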

This example comes with a string of cautions ...

1. Do NOT allow just any old URL to be scraped, especially one that your users may enter. That leaves you open to having your content filled with their adverts!

2. If you are scraping the same page regularly and it doesn't change very much, you should cache the results rather than fetch the page again on every request.

3. Respect the robots exclusion standard (robots.txt) of the remote site that you're scraping, and make sure that you have copyright permission to reproduce the material on your site too.

4. Remember that if the remote site's format changes so that your regular expression no longer matches, you'll have a correction to make on your site PDQ!
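
Cautions 1, 2 and 4 can be built into the code itself. The sketch below only fetches pages from a fixed allow-list, keeps a cached copy for an hour, falls back to a stale copy if the remote site is down, and logs a warning when the pattern finds nothing. The list contents, the cache location in the system temp directory, the one-hour lifetime and the function names cached_fetch() and count_definitions() are all assumptions for illustration.

```php
<?php
// Caution 1: a fixed allow-list, looked up by key - never take the URL
// itself from user input. (Hypothetical list; add your own pages.)
const ALLOWED_URLS = [
    'join' => 'http://www.wellho.co.uk/net/join.html',
];

// Caution 2: fetch each page at most once per $maxAge seconds,
// keeping the last good copy in a cache file.
function cached_fetch(string $key, int $maxAge = 3600): ?string
{
    if (!array_key_exists($key, ALLOWED_URLS)) {
        return null;                          // not on the allow-list
    }
    // Safe to use $key in a file name: it has come from our own list
    $cacheFile = sys_get_temp_dir() . '/scrape_' . $key . '.html';
    $age = is_file($cacheFile) ? time() - filemtime($cacheFile) : PHP_INT_MAX;
    if ($age < $maxAge) {
        $cached = file_get_contents($cacheFile);  // fresh enough - reuse it
        if ($cached !== false) {
            return $cached;
        }
    }
    $page = @file_get_contents(ALLOWED_URLS[$key]);
    if ($page !== false) {
        file_put_contents($cacheFile, $page, LOCK_EX);
        return $page;
    }
    // Fetch failed: fall back to a stale cached copy if there is one
    $stale = is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    return ($stale === false) ? null : $stale;
}

// Caution 4: if the remote layout changes, the pattern stops matching.
// Log it so you notice, rather than silently showing an empty section.
function count_definitions(string $html): int
{
    $n = preg_match_all("!<dt>(.+?)</dt>.*?<dd>(.+?)</dd>!s", $html);
    if (!$n) {
        error_log("scrape: no <dt>/<dd> pairs found - has the page changed?");
    }
    return (int) $n;
}
```

Looking pages up by a short key, rather than accepting a URL, means a visitor can at most choose between pages you have already approved.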

We currently have examples of the use of scraped material on the Melksham Chamber of Commerce home page and also on the First Great Western Coffee Shop. Make the most of the power of this facility ... but be careful how you use it!





(written 2009-12-21, updated 2010-01-06)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
H307 - PHP - Web2 and caching
  [1633] Changing a screen saver from a web page (PHP, Perl, OSX) - (2008-05-06)
  [1647] Exchange Rates - PHP with your prices in your users currency - (2008-05-19)
  [1733] memcached - overview, installation, example of use in PHP - (2008-08-02)
  [1812] Starting Ajax - easy example of browser calling up server data - (2008-09-27)
  [1813] Ajax - going Asyncronous and what it means - (2008-09-28)
  [1814] Javascript/HTML example, dynamic server monitor - (2008-09-28)
  [1926] Flash (client) to PHP (server) - example - (2008-12-06)
  [1995] Automated server heartbeat and health check - (2009-01-16)
  [2196] New Example - cacheing results in PHP for faster loading - (2009-05-24)
  [2321] Uploading and Downloading files - changing names (Perl and PHP) - (2009-08-04)
  [3029] PHP data sources - other web servers, large data flows, and the client (browser) - (2010-11-04)
  [3094] Setting your user_agent in PHP - telling back servers who you are - (2010-12-18)
  [3186] How to add a customised twitter feed to your site - (2011-02-27)
  [3458] On this day ... one PHP script with three uses - (2011-09-26)
  [3955] Building up from a small PHP setup to an enterprise one - (2012-12-16)
  [3999] Handling failures / absences of your backend server nicely - (2013-02-08)
  [4055] Using web services to access you data - JSON and RESTful services - (2013-03-29)
  [4075] Further recent PHP examples - (2013-04-28)
  [4106] Web server efficiency - saving repetition through caches - (2013-05-30)
  [4136] How do I post automatically from a PHP script to my Twitter account? - (2013-07-10)
  [4627] Caching results in an object for efficiency - avoiding re-calculation - (2016-01-20)


This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2021: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/2545_Scr ... a-PHP.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb