You might be asking yourself what “metaphone” is. Metaphone is a phonetic algorithm, published in 1990, for indexing words by their English pronunciation. Given a word, it produces a short key that approximates how the word sounds, so similar-sounding words share the same key. Now that we’ve had a very brief introduction to metaphone, let’s create our dictionary lookup!
Preparing the Database
The first step to creating your own dictionary lookup is to find a dictionary. For most Linux users, this isn’t a problem! There’s usually one included with the OS, generally located in or around /usr/share/dict/words. Since this is a flat file with words separated by newlines, we can easily import it into a MySQL table. Let’s create that table now:
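As a rough sketch of what such a table could look like: one column for the word and one for its metaphone key, plus an index on the key so lookups by sound stay fast. The table name `dictionary`, the column names, and the index name are all my own assumptions, and Python's built-in `sqlite3` is used here purely as a stand-in so the example runs without a MySQL server.

```python
import sqlite3

# Hypothetical schema: table, column and index names are illustrative only.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS dictionary (
    word      TEXT PRIMARY KEY,  -- one dictionary word per row
    metaphone TEXT               -- filled in later, NULL until calculated
)
"""
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS idx_dictionary_metaphone
ON dictionary (metaphone)
"""

def create_dictionary_table(conn):
    """Create the word table and the index used for lookups by key."""
    conn.execute(CREATE_TABLE_SQL)
    conn.execute(CREATE_INDEX_SQL)
    conn.commit()

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
create_dictionary_table(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(dictionary)")]
print(columns)
```

On MySQL you would likely swap `TEXT` for `VARCHAR` columns, since MySQL needs a length to index a text column.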
Importing the Dictionary
Our next step is to actually import the dictionary using MySQL’s LOAD DATA LOCAL INFILE syntax. The file may live in a slightly different location on your system (and this step assumes you are running Linux), but the /usr/share/dict/ folder seems to remain the same on Red Hat derivatives and Ubuntu. Take a peek at the folder to verify the filename is correct before running the import.
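If you would rather see the import spelled out, here is a minimal sketch of the same idea: read the file one line at a time and insert each non-empty line as a word. It is not the LOAD DATA LOCAL INFILE statement itself. It uses Python's `sqlite3` as a stand-in for MySQL, and the `dictionary` table and `import_words` helper are hypothetical names of mine.

```python
import sqlite3

def import_words(conn, lines):
    """Insert one row per non-empty line; return how many words were read."""
    words = [(line.strip(),) for line in lines if line.strip()]
    # OR IGNORE skips words that are already present (duplicates in the file).
    conn.executemany("INSERT OR IGNORE INTO dictionary (word) VALUES (?)", words)
    conn.commit()
    return len(words)

# In-memory SQLite database standing in for MySQL; hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (word TEXT PRIMARY KEY, metaphone TEXT)")

# Stand-in for the contents of a words file: one word per line.
sample_lines = ["apple\n", "banana\n", "\n", "apple\n", "cherry\n"]
read_count = import_words(conn, sample_lines)
stored_count = conn.execute("SELECT COUNT(*) FROM dictionary").fetchone()[0]
print(read_count, stored_count)
```

An open file object works as the `lines` argument, so for the real thing you could pass the result of `open("/usr/share/dict/words", encoding="utf-8")` instead of the sample list, assuming that is where your dictionary lives.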
Calculating Metaphone Values
Lastly, we need to take each of the dictionary words we stored and calculate its metaphone value. We then need to update the database with this value so that we can use it for future lookups when comparing other words. Note that the following code takes a fairly long time to run from the command line.
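The update step can be sketched as a loop that reads every word without a key, computes the key, and writes it back in batches. Everything here is an assumption on my part: the `dictionary` table, the `fill_keys` helper, and above all `toy_key`, which is only a placeholder and NOT Metaphone. In practice you would pass in a real implementation, for example a wrapper around PHP's built-in `metaphone()` or a third-party Python library. SQLite again stands in for MySQL.

```python
import sqlite3

def toy_key(word):
    """Placeholder only, NOT Metaphone: uppercase the letters and drop
    every vowel after the first letter."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    return letters[0] + "".join(c for c in letters[1:] if c not in "AEIOU")

def fill_keys(conn, key_fn, batch_size=1000):
    """Compute key_fn(word) for every row lacking a key and store it.

    Returns the number of rows updated."""
    words = [row[0] for row in conn.execute(
        "SELECT word FROM dictionary WHERE metaphone IS NULL")]
    for start in range(0, len(words), batch_size):
        batch = words[start:start + batch_size]
        conn.executemany(
            "UPDATE dictionary SET metaphone = ? WHERE word = ?",
            [(key_fn(w), w) for w in batch])
        conn.commit()  # one commit per batch rather than one per row
    return len(words)

# In-memory SQLite database standing in for MySQL; hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (word TEXT PRIMARY KEY, metaphone TEXT)")
conn.executemany("INSERT INTO dictionary (word) VALUES (?)",
                 [("color",), ("colour",), ("table",)])

updated = fill_keys(conn, toy_key, batch_size=2)
keys = dict(conn.execute("SELECT word, metaphone FROM dictionary"))
print(updated, keys)
```

Committing once per batch instead of once per row is one plausible way to cut down the run time, though I have not measured it against a full dictionary.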
Now that you have an English dictionary with metaphone values for each word, you can use the data for a number of purposes:
- Spell checking and error correction
- Finding similar words to the one supplied
- Finding English-sounding domain names (my personal use case)
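All of these uses boil down to the same query: take the key of the word you were given and fetch every stored word with that key. Here is a minimal sketch, assuming the hypothetical `dictionary` table from earlier and using SQLite as a stand-in for MySQL. The keys in the sample rows are illustrative values I typed in by hand, not verified output from a real Metaphone implementation.

```python
import sqlite3

def words_with_key(conn, key, exclude=None):
    """Return all stored words sharing the given key, sorted alphabetically,
    optionally leaving out the word that was looked up."""
    rows = conn.execute(
        "SELECT word FROM dictionary WHERE metaphone = ? ORDER BY word",
        (key,))
    return [word for (word,) in rows if word != exclude]

# In-memory SQLite database standing in for MySQL; hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (word TEXT PRIMARY KEY, metaphone TEXT)")
# Illustrative keys only, not checked against a real Metaphone implementation.
conn.executemany("INSERT INTO dictionary (word, metaphone) VALUES (?, ?)",
                 [("color", "KLR"), ("colour", "KLR"),
                  ("cooler", "KLR"), ("table", "TBL")])

suggestions = words_with_key(conn, "KLR", exclude="color")
print(suggestions)
```

For spell checking, you would compute the key of the misspelled word and offer the matches as suggestions. For domain names, you might run candidate names through the same lookup and keep those that match a real word.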
For the lazy…
Here’s the database in all of its glory, complete with metaphone values pre-calculated for you…