Spelling Recommender with NLTK

[This article was first published on Python – Predictive Hacks, and kindly contributed to python-bloggers.]

Spelling Recommender

We previously showed how to build an autocorrect based on the Jaccard distance that also returns the probability of each word. Here we will create two spelling recommenders, each of which takes a list of misspelled words and recommends a correctly spelled word for every word in the list. For every misspelled word, the recommender should find the word in correct_spellings that has the shortest distance to it and starts with the same letter, and return that word as the recommendation.

Note: Each recommender uses a different distance measure.

For our example, we will consider the following misspelled words: ['spleling', 'mispelling', 'reccomender']
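
The selection rule described above can be sketched independently of any particular distance measure. The following is a minimal illustration, not the article's code: `recommend`, `mismatch_distance`, and the toy `vocabulary` are hypothetical names introduced here, and the placeholder distance simply counts differing positions so that the block runs without NLTK.

```python
def mismatch_distance(a, b):
    """Placeholder distance (an assumption for this sketch): count the
    positions where the two words differ, plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def recommend(entry, vocabulary, distance):
    """Return the word in `vocabulary` that starts with the same letter as
    `entry` and has the smallest `distance` to it (first one wins on ties)."""
    candidates = [w for w in vocabulary if w and w[0] == entry[0]]
    if not candidates:
        return None  # nothing in the vocabulary shares the first letter
    return min(candidates, key=lambda w: distance(entry, w))


# Toy vocabulary standing in for the full word list.
vocabulary = ['selling', 'spelling', 'misspelling', 'recommender', 'apple']
entries = ['spleling', 'mispelling', 'reccomender']

recommendations = [recommend(e, vocabulary, mismatch_distance) for e in entries]
print(recommendations)
```

Passing the distance in as a function keeps the filtering and selection logic separate from the measure, so any measure with the same two-word signature can be swapped in.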


Jaccard Distance on the 2-Grams (Bigrams) of the Two Words


import nltk
from nltk.corpus import words
from nltk.metrics.distance import edit_distance, jaccard_distance
from nltk.util import ngrams

# The word list must be available locally; if it is not, run nltk.download('words') once.
correct_spellings = words.words()

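With the libraries loaded, we can build the recommender. For each entry, a list comprehension computes the distance to every candidate word that starts with the same letter, and we keep the candidate with the smallest distance.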

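entries = ['spleling', 'mispelling', 'reccomender']

for entry in entries:
    # Character bigrams of the misspelled word, computed once per entry.
    entry_bigrams = set(ngrams(entry, 2))
    # (distance, word) pairs for every candidate that starts with the same letter.
    temp = [(jaccard_distance(entry_bigrams, set(ngrams(w, 2))), w)
            for w in correct_spellings if w[0] == entry[0]]
    # Keep the word with the smallest distance (the first one wins on ties).
    print(min(temp, key=lambda val: val[0])[1])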

And we get:

spelling
misspelling
recommender
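
To make the measure concrete, here is a small worked example of the Jaccard distance between the bigram sets of 'spleling' and 'spelling'. It is a pure-Python sketch that needs no NLTK; the helper names `bigrams` and `jaccard` are introduced here for illustration only and represent bigrams as two-character strings rather than tuples.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in `word`."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|.
    Assumes at least one of the sets is non-empty."""
    return 1 - len(a & b) / len(a | b)


misspelled = bigrams('spleling')   # sp, pl, le, el, li, in, ng
candidate = bigrams('spelling')    # sp, pe, el, ll, li, in, ng

shared = misspelled & candidate
distance = jaccard(misspelled, candidate)

print(sorted(shared))      # ['el', 'in', 'li', 'ng', 'sp']
print(round(distance, 3))  # 0.444
```

The two words share 5 of the 9 distinct bigrams, so the distance is 1 - 5/9 = 4/9. A distance of 0 means identical bigram sets and 1 means no bigram in common.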

Edit Distance

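Now we repeat the exercise with the edit distance, which counts the single-character insertions, deletions, and substitutions needed to turn one word into the other.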


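for entry in entries:
    # (distance, word) pairs for every candidate that starts with the same letter.
    temp = [(edit_distance(entry, w), w) for w in correct_spellings if w[0] == entry[0]]
    # Keep the word with the smallest distance (the first one wins on ties).
    print(min(temp, key=lambda val: val[0])[1])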

And we get:

selling
misspelling
recommender
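
The first result is 'selling' rather than 'spelling'. A likely explanation is a tie: with insertions, deletions, and substitutions only, both words are two edits away from 'spleling', and the first candidate encountered wins (presumably 'selling', if the word list is in alphabetical order). NLTK's edit_distance accepts a transpositions argument, off by default, that counts a swap of two adjacent letters as a single edit. The pure-Python sketch below illustrates the effect without NLTK; `edit_distance_sketch` is a hypothetical helper written for this illustration, not NLTK's implementation.

```python
def edit_distance_sketch(a, b, transpositions=False):
    """Dynamic-programming edit distance with unit costs. With
    transpositions=True, swapping two adjacent letters costs one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        dist[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # adjacent swap
    return dist[-1][-1]


for word in ('selling', 'spelling'):
    print(word,
          edit_distance_sketch('spleling', word),
          edit_distance_sketch('spleling', word, transpositions=True))
```

With transpositions enabled, 'spelling' is one edit away (swap 'l' and 'e') while 'selling' stays at two, so the tie disappears in this sketch.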
