If I'm searching for a 50kg bag of cement, and the online store only offers 48kg bags, will their search engine find this product and say "is this what you want"?
Our own site search does clever things with alphabetic searches, but we've rarely had to do a "near number" hunt on our own behalf. We have done so for client sites, though.
Is 48 near to 50? Yes. Is 8 near to 10? Maybe, but not so near. Is 1 near to 3? No, almost certainly not. So you can't rely on the difference alone: 93 is nearer to 100 than 1 is to 3, even though the difference is much greater.
Algorithm 1
Let "$h" be the value you have and "$t" be the value you're testing. Then the nearness factor is defined as
abs( ($h + $t) / ($h - $t))
with a larger factor indicating a closer match. The formula divides by zero when the two values are numerically identical, giving an infinite result, so you had better extract that special case first. Here are some example factors:
48 and 50 - factor is 49
8 and 10 - factor is 9
1 and 3 - factor is 2
93 and 100 - factor is 27.57
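Why does this factor behave better than the plain difference? Rewriting it (a short derivation of our own, not part of the original post) shows that it is the average of the two values divided by half the gap between them, so the gap is judged relative to the size of the numbers:

```latex
F(h, t) = \left| \frac{h + t}{h - t} \right|
        = \frac{\left| (h + t)/2 \right|}{\left| h - t \right| / 2}
```

For 93 and 100 the average is 96.5 and half the gap is 3.5, giving 96.5 / 3.5, about 27.57. For 1 and 3 the average is 2 and half the gap is 1, giving 2. The gap of 7 counts for less than the gap of 2 because it sits beside much larger numbers.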
Here's some Perl code to work this out (yes, we have Perl search training):
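#!/usr/bin/perl
# Nearness factor for two numbers given on the command line:
# abs((h + t) / (h - t)), where a larger factor means a closer match.
use strict;
use warnings;
my ($have, $test) = @ARGV;
die "Usage: $0 value_you_have value_to_test\n" unless defined $test;
if ($have == $test) {
    # Identical values would divide by zero, so report them separately
    print "Parameters are numerically identical\n";
} else {
    printf("factor is %.2f for %s and %s\n",
        abs(($have + $test) / ($have - $test)), $have, $test);
}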
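To answer the cement-bag question, a store search has to score many candidate values rather than one pair. Here is a minimal sketch. It assumes the product sizes are already available as a plain list of numbers. The `nearness` and `near_matches` subroutines and the cut-off of 10 are hypothetical illustrations, not code from our site. Exact matches are listed first, then the remaining candidates in descending order of nearness factor:

```perl
use strict;
use warnings;

# Nearness factor from Algorithm 1: abs((h + t) / (h - t)).
# Identical values would divide by zero, so return undef for them;
# callers treat undef as an exact match.
sub nearness {
    my ($have, $test) = @_;
    return undef if $have == $test;
    return abs(($have + $test) / ($have - $test));
}

# Return candidate values best match first: exact matches at the top,
# then by descending nearness factor. Candidates scoring below
# $threshold (a hypothetical tuning knob) are dropped.
sub near_matches {
    my ($wanted, $threshold, @candidates) = @_;
    my @scored;
    for my $value (@candidates) {
        my $factor = nearness($wanted, $value);
        next if defined $factor && $factor < $threshold;
        push @scored, [$value, $factor];
    }
    return map { $_->[0] }
        sort {
            (defined $b->[1] ? 0 : 1) <=> (defined $a->[1] ? 0 : 1)
              || ($b->[1] // 0) <=> ($a->[1] // 0)
        } @scored;
}

# Hypothetical bag sizes in kg; we want 50kg, offer anything scoring 10+
my @hits = near_matches(50, 10, 25, 40, 48, 50, 60);
print "@hits\n";    # prints "50 48 60"
```

Here 48 scores 49 and 60 scores 11. The 40kg bag scores 9 and the 25kg bag scores 3, so both fall below the cut-off. Where to set the threshold is a judgement call for each catalogue.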
Algorithm 2
The algorithm above isn't always ideal. If you're searching for phone numbers, for example, it's not helpful: if digits have been transposed, you'll want to score hits on values that are numerically very different. For this, you'll want to use something like a Levenshtein distance algorithm. We talk further about this on our web site in the Solutions Centre.
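Here is a minimal Perl sketch of the Levenshtein distance, treating each number as a string of digits. The sample numbers are made up for illustration. Plain Levenshtein counts a swap of two adjacent digits as two edits (two substitutions). The Damerau-Levenshtein variant counts such a swap as one edit, which may suit typo-prone phone numbers better:

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic Levenshtein edit distance: the minimum number of single-character
# insertions, deletions and substitutions that turn $s into $t.
sub levenshtein {
    my ($s, $t) = @_;
    my @s_chars = split //, $s;
    my @t_chars = split //, $t;
    # Row 0: distance from the empty prefix of $s to each prefix of $t
    my @prev = (0 .. scalar(@t_chars));
    for my $i (1 .. scalar(@s_chars)) {
        my @curr = ($i);
        for my $j (1 .. scalar(@t_chars)) {
            my $cost = $s_chars[$i - 1] eq $t_chars[$j - 1] ? 0 : 1;
            push @curr, min($prev[$j] + 1,            # deletion
                            $curr[$j - 1] + 1,        # insertion
                            $prev[$j - 1] + $cost);   # substitution
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Swapping the 2nd and 3rd digits changes the value by 90 million,
# but the digit strings are only two edits apart.
print levenshtein("0123456789", "0213456789"), "\n";   # prints 2
```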
(written 2005-02-04 06:06:02)
Comment from Graham (added 2005-02-04 07:01:33): This is an off-topic comment. The software running "The Horse's Mouth" has just been upgraded to block bulk automated off-topic contributions. You may now find your comments are noted as being sent "for approval" even if you're a regular on Opentalk, especially if you comment on an old entry. Nothing personal, just trying to avoid providing free advertising for products I don't actually use or recommend.