With over ten thousand different web pages on our web site, finding the right resource has become just as big an issue as having the right material available in the first place. Listings by article type and number (example) are great for crawlers / bots, and for staff checking page by page. Dividing our material into modules (module list) helps somewhat, but it still leaves people working through lists with the kind of determination that comes naturally to past students who know they have a good chance of finding what they need, and it misses the casual visitor (and potential trainee ;-) ) completely. That's where a search capability comes in - we've had one for a while, but not everyone thinks "I'll do a site search", so we want to automate that search and add a few results to many / most pages. You're probably very familiar with the sort of thing.

But how do I decide what to say in such a small area? How do I include the meat but trim out the fat? I've used regular expressions - and here (coded in PHP) are the specifics of what I have done for this example:
For the body, strip all the markup from the content block that we have stored in a database and truncate it to the first 150 characters, plus however many extra characters it takes to avoid breaking it in the middle of a word, and then plonk a "..." on the end to show that it's only the start of the body.
// Strip the HTML, then keep 150 characters plus the rest of the word in progress
$body = strip_tags($row['body']);
$body = preg_replace('/^(.{150}\S*\s)(.*)$/s', '\\1 ...', $body);
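The two statements above can be wrapped up as a reusable function. This is a minimal sketch rather than our production code: the function name summarise_body and its $limit parameter are hypothetical, and it assumes the stored text is UTF-8. The /u modifier makes "." count characters rather than bytes, so an accented letter is never cut in half, and the whitespace after the last kept word is dropped before " ..." is added, which avoids a double space.

```php
<?php
// Hypothetical helper: a sketch of the body-trimming step, not production code.
// Assumes the stored body is UTF-8 encoded.
function summarise_body(string $html, int $limit = 150): string
{
    // Remove all HTML markup, leaving plain text.
    $text = strip_tags($html);

    // Keep the first $limit characters plus the rest of the word in progress,
    // drop the whitespace after it and everything that follows, then add " ...".
    // /s lets "." match newlines; /u makes "." match whole UTF-8 characters.
    $pattern = '/^(.{' . $limit . '}\S*)\s.*$/su';
    $summary = preg_replace($pattern, '$1 ...', $text);

    // preg_replace returns null on invalid UTF-8; fall back to the plain text.
    return $summary ?? $text;
}

echo summarise_body('<p>' . str_repeat('word ', 40) . '</p>'), "\n";
```

If the text has fewer than 150 characters, or has no whitespace anywhere after the 150th character, the pattern does not match and the text comes back unchanged, just as with the original two statements.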
For the URL, take any section of 18 or more characters between successive dots and / or slashes, and replace it with its first 8 characters, " ... ", and its last 5 characters. These days, URLs are semi-descriptive, often made up of the title of the article with dashes or underscores anyway, and such long URLs cause line-folding problems in some browsers. But the user DOES want to see the end of the URL, to know whether it's a ".html" or a ".php" page he'll be linking on to. Here's the URL code:
$uddd = "http://www.wellho.net$row[url]";
$uddd = preg_replace(
'!([/\.])([^/\.]{8})([^/\.]{5,})([^/\.]{5})([/\.])!',
'\\1\\2 ... \\4\\5',$uddd);
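One quirk of the URL pattern above is that it consumes the closing "/" or "." of each long section, so when two long sections sit right next to each other, only the first of them is shortened. Here's a sketch of a variant, assuming you want every long section trimmed: the closing delimiter is matched with a lookahead, (?=[/.]), so it stays available to open the next match. The function name shorten_url is hypothetical, and the example URL is made up.

```php
<?php
// Hypothetical helper: shortens each section of 18 or more characters that
// sits between two delimiters ("/" or "."), keeping its first 8 and last 5.
// The closing delimiter is matched by a lookahead, so it is not consumed
// and can also serve as the opening delimiter of the next long section.
function shorten_url(string $url): string
{
    $short = preg_replace(
        '!([/.])([^/.]{8})[^/.]{5,}([^/.]{5})(?=[/.])!',
        '$1$2 ... $3',
        $url
    );
    // preg_replace only returns null on an internal PCRE error.
    return $short ?? $url;
}

echo shorten_url('http://www.example.com/resources/a-very-long-descriptive-article-title.html'), "\n";
```

With this sketch, the example URL comes out as http://www.example.com/resources/a-very-l ... title.html, so the ".html" ending stays visible while the long descriptive part is folded down.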
These are not the world's simplest regular expressions (far from it!), yet they do show just how much can be done in a single statement. We cover such techniques with PHP specifically in mind on our PHP Techniques Workshop, and in more depth (and regular expressions more generally) on our Regular Expressions day.
You can see more results from these algorithms already in use on our resources pages (example), and in time many (most? almost all?) pages on our site will have an improved and consistent 'see also' along these lines. Key features include:
Automated: we don't have to go through and do all the work of adding extra links on every page; we just provide some categorisation hints.
Adaptive: we're recording hit / visit counts, so that we can promote popular pages higher up the listings.
Consistent across slightly varied page types: the display should be essentially identical no matter what the resource type is. Frankly, you don't care whether the answer to the question "how do I tell Google which country we operate in" is in a forum post, a longer article, or a blog entry written in May 2007 - you just want to find the f***ing answer!
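The "Adaptive" idea can be illustrated with a small ranking step. This is a hypothetical sketch, not a description of our actual code: it assumes each candidate page (already chosen through its categorisation hints) is an array with title, url, type and hits keys, and it simply lists the most-visited pages first. The type is carried along but ignored when ranking, in keeping with the "Consistent" point.

```php
<?php
// Hypothetical sketch of the "adaptive" ordering: candidate pages (already
// selected through their categorisation hints) are listed most-visited first.
// Each candidate is assumed to be an array with 'title', 'url', 'type' and
// 'hits' keys; 'type' is carried along but deliberately ignored when ranking.
function rank_see_also(array $candidates, int $howMany = 5): array
{
    usort($candidates, function (array $a, array $b): int {
        // Higher hit counts first ...
        if ($a['hits'] !== $b['hits']) {
            return $b['hits'] <=> $a['hits'];
        }
        // ... with ties broken alphabetically by title, for a predictable listing.
        return strcmp($a['title'], $b['title']);
    });
    return array_slice($candidates, 0, $howMany);
}

$pages = [
    ['title' => 'B article',    'url' => '/b', 'type' => 'article', 'hits' => 10],
    ['title' => 'A forum post', 'url' => '/a', 'type' => 'forum',   'hits' => 42],
    ['title' => 'C blog entry', 'url' => '/c', 'type' => 'blog',    'hits' => 10],
];
foreach (rank_see_also($pages, 2) as $page) {
    echo $page['title'], ' ', $page['url'], "\n";
}
```

Because the hit counts keep changing, the same candidates can come out in a different order from one week to the next, which is what lets popular pages rise in the listings without anyone editing the pages by hand.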
There are some other "salesy" things I could add too:
Fast: it happens on the fly.
Expandable: we have the basis for it to expand from 10,000 different URLs to 100,000 very easily ...
Marketable: maybe. I'm certainly happy to tell you how we do things like this, and to sell you my time as I tell you and help you understand it in depth. That would be called a private training course.
(written 2009-01-08, updated 2009-01-11)