If I'm searching for a 50kg bag of cement, and the online store only offers 48kg bags, will their search engine find this product and say "is this what you want"?
Our own site search does clever things with alphabetic searches, but we've rarely had to do a "near number" hunt on our own behalf ... although we have for client sites.
Is 48 near to 50? Yes. Is 8 near to 10? Maybe, but not so near. Is 1 near to 3? No, almost certainly not. So you can't rely on the difference alone: 93 is nearer to 100 than 1 is to 3, even though the difference is much greater.
Algorithm 1
Let "$h" be the value you have and "$t" be the value you're testing. Then the nearness factor is defined as
abs( ($h + $t) / ($h - $t))
A larger factor means a closer match. If the two values are numerically identical, the denominator is zero and the factor is infinite (in code, a division by zero), so you had better extract that special case first. Let's see some example factors:
48 and 50 - factor is 49
8 and 10 - factor is 9
1 and 3 - factor is 2
93 and 100 - factor is 27.57
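Rewriting the factor in terms of the ratio of the two values, r = $t/$h (assuming $h is non-zero), shows why plain difference is the wrong measure: the factor depends only on that ratio, and its reciprocal is half the difference taken relative to the mean of the two values.

```latex
f(h,t) = \left|\frac{h+t}{h-t}\right|
       = \left|\frac{1+r}{1-r}\right|, \qquad r = \frac{t}{h},\quad h \neq 0,\ t \neq h

\frac{1}{f(h,t)} = \frac{|h-t|}{|h+t|}
                 = \frac{1}{2}\cdot\frac{|h-t|}{\left|\tfrac{h+t}{2}\right|}
```

So 48 and 50, 4.8 and 5.0, and 480 and 500 all score 49, while 93 and 100 give 193/7, about 27.57.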
Here's Perl code to work this out (yes, we have Perl search training):
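    #!/usr/bin/perl
    use strict;
    use warnings;
    # Usage: nearness.pl value_you_have value_to_test
    die "Usage: $0 have test\n" unless @ARGV == 2;
    my ($have, $test) = @ARGV;
    if ($have == $test) {
        # Denominator would be zero, so report an exact match instead
        print "Parameters are numerically identical\n";
    } else {
        printf("factor is %.2f for %s and %s\n",
            abs(($have + $test) / ($have - $test)), $have, $test);
    }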
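The script above scores one pair of values at a time. For the cement bag question at the top, a store's search would score every product size against the wanted value and offer the best few. Here's a minimal sketch of that, assuming a hypothetical catalogue of bag weights and a hypothetical cut-off factor of 20; neither comes from a real store. Exact matches get an infinite factor via the Perl idiom 9**9**9, which overflows to Inf, so they always sort first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nearness factor from Algorithm 1: abs((h + t) / (h - t)).
# Identical values would divide by zero, so they get an
# infinite factor instead (9**9**9 overflows to Inf).
sub nearness {
    my ($have, $test) = @_;
    return 9**9**9 if $have == $test;
    return abs(($have + $test) / ($have - $test));
}

# Return the candidates whose factor reaches $min_factor,
# best match first. $min_factor is a hypothetical cut-off.
sub near_matches {
    my ($wanted, $min_factor, @candidates) = @_;
    my @scored = map { [ $_, nearness($wanted, $_) ] } @candidates;
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           grep { $_->[1] >= $min_factor } @scored;
}

# Hypothetical catalogue of bag weights in kg
my @bag_sizes = (5, 10, 25, 40, 48);
my @hits = near_matches(50, 20, @bag_sizes);
print "Searched for 50kg; is this what you want? @hits\n";
```

With a cut-off of 20 only the 48kg bag (factor 49) is offered; lowering the cut-off to 5 would also bring in the 40kg bag (factor 9). Where to set the cut-off is a judgement for each site.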
Algorithm 2
The algorithm above isn't always ideal. If you're searching for phone numbers, for example, it's not helpful: if someone has transposed two digits, you'll want to score a hit on a value that is numerically very different. For this, you'll want to use something like a Levenshtein distance algorithm. We talk further about this in the Solutions Centre on our web site.
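Here's a minimal Perl sketch of the classic Levenshtein distance, treating numbers as strings of digits. The phone numbers are made up for illustration. Note that plain Levenshtein counts a swap of two adjacent digits as two edits (two substitutions); the Damerau-Levenshtein variant counts it as one, which may suit transposition errors better.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Classic Levenshtein distance: the minimum number of single-character
# insertions, deletions and substitutions that turn $s into $t.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);    # distances from "" to each prefix of $t
    for my $i (1 .. length $s) {
        my @curr = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @curr, min($prev[$j] + 1,           # deletion
                            $curr[$j - 1] + 1,       # insertion
                            $prev[$j - 1] + $cost);  # substitution
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical numbers on file, ranked against the one typed in
my $wanted  = '0123456789';
my @on_file = ('0124356789', '0123456780', '9876543210');
for my $number (sort { levenshtein($wanted, $a) <=> levenshtein($wanted, $b) } @on_file) {
    printf "%s  distance %d\n", $number, levenshtein($wanted, $number);
}
```

Here 0124356789 (two digits swapped) is only two edits away from the number typed in, even though the two values differ by about a million numerically.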
(written 2005-02-04 06:06:02)
Comment from Graham (2005-02-04 07:01:33): This is an offtopic comment. The software running "The Horse's Mouth" has just been upgraded to block bulk automated offtopic contributions. You may now find your comments are noted as being sent "for approval" even if you're a regular on Opentalk, especially if you comment on an old entry. Nothing personal, just trying to avoid providing free advertising for products I don't actually use or recommend.