Do you want to help visitors to your web site find what they're looking for, even if they misspell a name or word in a search? PHP provides you with three facilities - soundex, metaphone and Levenshtein distance calculations - which let you compare two words and see how similar they are when written (levenshtein) or spoken (metaphone, soundex).
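As a quick illustration of the three functions side by side, here is a minimal sketch. The helper name compare_words and the sample names are my own choices for the example, not part of the demonstration described in this article.

```php
<?php
// Hypothetical helper (not from the original demo): gather all three
// measures for a pair of words in one array.
function compare_words(string $a, string $b): array {
    return [
        // Number of single-character edits needed to turn $a into $b
        'levenshtein'    => levenshtein($a, $b),
        // Do the two words share a soundex key?
        'same_soundex'   => soundex($a) === soundex($b),
        // Do the two words share a metaphone key?
        'same_metaphone' => metaphone($a) === metaphone($b),
    ];
}

// "Robert" and "Rupert" are spelt differently but share a soundex key
print_r(compare_words("Robert", "Rupert"));
```

The written measure (levenshtein) gives a count you can rank by, while the two spoken measures are most useful as a yes / no "do the keys match" test.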
I've put a demonstration up for you to try, using metaphones and levenshtein. Here's the "engine" at the heart of the code:
// Number of single-character edits between the two spellings (0 if identical)
$ident = levenshtein($first,$second);
// Phonetic key for each word
$meta1 = metaphone($first);
$meta2 = metaphone($second);
if ($ident) {
    print "Words are $ident levenshtein steps out<br>";
    if ($meta1 == $meta2) {
        print "But they sound the same (metaphone $meta1)\n";
    } else {
        // Compare the phonetic keys to see how far apart the sounds are
        $id = levenshtein($meta1,$meta2);
        print "They sound different too - metaphones ";
        print "$meta1 and $meta2 are $id steps out\n";
    }
} else {
    print "Words are identical\n";
}
The complete source code is available too if you want to get in deep.
Having learnt how to see if two words are similar, you'll want to know how to make lots of comparisons against a single word when you're writing a search algorithm. That's another day's story perhaps, but it's something that we do as a matter of routine by keeping a database table of metaphones ....
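The database table itself isn't shown here, so what follows is only a sketch of the general idea, under some loud assumptions: an in-memory array stands in for the table, and the word list and the helper name sounds_like are made up for the example. The point is that each stored word's metaphone is worked out once, in advance, so a search only has to calculate one metaphone and look it up.

```php
<?php
// Hypothetical word list; in practice this would be your own data
$words = ["Thompson", "Thomson", "Smith", "Smyth", "Schmidt"];

// Build the lookup once: metaphone key => list of words with that key.
// (Assumption: an array stands in for the database table of metaphones.)
$index = [];
foreach ($words as $word) {
    $index[metaphone($word)][] = $word;
}

// Hypothetical helper: return stored words that share the search term's
// metaphone, closest spelling first.
function sounds_like(array $index, string $term): array {
    $matches = $index[metaphone($term)] ?? [];
    usort($matches, function ($a, $b) use ($term) {
        // Compare in lower case so capitalisation doesn't count as an edit
        return levenshtein(strtolower($term), strtolower($a))
           <=> levenshtein(strtolower($term), strtolower($b));
    });
    return $matches;
}

print_r(sounds_like($index, "smith"));
```

With a real database you would probably store the metaphone in its own indexed column alongside each word and select on that column, but that is a design choice for your own application rather than something described in this article.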
(written 2006-03-11, updated 2006-06-05)