Spelling Recommender with NLTK
Want to share your content on python-bloggers? click here.
Spelling Recommender
Previously, we showed how to build an autocorrect based on Jaccard distance that also returns the probability of each word. Here we will create two different spelling recommenders. Each takes a list of misspelled words and recommends a correctly spelled word for every word in the list. For every misspelled word, the recommender finds the word in `correct_spellings` that has the shortest distance and starts with the same letter as the misspelled word, and returns that word as the recommendation.
Note: Each of the two recommenders uses a different distance measure.
For our example, we will consider the following misspelled words: 'spleling', 'mispelling', and 'reccomender'.
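The recipe above does not depend on the distance measure, so it can be written once with the distance passed in as a function. Below is a minimal pure-Python sketch of that idea. The name `recommend`, the toy vocabulary, and the toy letter-set distance are hypothetical. They only stand in for `correct_spellings` and the NLTK distance functions used later.

```python
from typing import Callable, List, Optional, Sequence


def recommend(misspelled: Sequence[str], vocabulary: Sequence[str],
              distance: Callable[[str, str], float]) -> List[Optional[str]]:
    """Return one recommendation per misspelled word (None if no candidate)."""
    recommendations = []
    for entry in misspelled:
        # Only vocabulary words sharing the first letter are considered.
        candidates = [w for w in vocabulary if w and w[0] == entry[0]]
        # min() keeps the first candidate with the smallest distance, so ties
        # are broken by the order of `vocabulary`.
        best = min(candidates, key=lambda w: distance(entry, w), default=None)
        recommendations.append(best)
    return recommendations


# Hypothetical toy vocabulary and toy distance, for illustration only:
# the size of the symmetric difference between the two words' letter sets.
toy_vocabulary = ["selling", "spelling", "misspelling", "reckon", "recommender"]


def letter_set_distance(a: str, b: str) -> int:
    return len(set(a) ^ set(b))


print(recommend(["spleling", "mispelling", "reccomender"],
                toy_vocabulary, letter_set_distance))
# ['spelling', 'misspelling', 'recommender']
```

With NLTK, the two recommenders would differ only in the function supplied. For example, you could pass `lambda a, b: jaccard_distance(set(ngrams(a, 2)), set(ngrams(b, 2)))` or `edit_distance` itself.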
Jaccard Distance on the Character 2-Grams (Bigrams) of the Two Words
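```python
import nltk
from nltk.corpus import words
from nltk.metrics.distance import edit_distance, jaccard_distance
from nltk.util import ngrams

# The word list must be downloaded once before nltk.corpus.words can be used.
nltk.download('words')

# Reference vocabulary of correctly spelled English words
correct_spellings = words.words()
```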
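With the libraries loaded, let's write the recommender. We use a list comprehension to pair every word in `correct_spellings` that starts with the same letter as the misspelled word with its distance, and then print the candidate with the smallest distance.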
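```python
entries = ['spleling', 'mispelling', 'reccomender']

for entry in entries:
    # Pair each same-first-letter candidate with the Jaccard distance
    # between the sets of character bigrams of the two words.
    temp = [(jaccard_distance(set(ngrams(entry, 2)), set(ngrams(w, 2))), w)
            for w in correct_spellings if w[0] == entry[0]]
    # Print the closest candidate (sorted is stable, so the first one wins ties).
    print(sorted(temp, key=lambda val: val[0])[0][1])
```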
And we get:
```
spelling
misspelling
recommender
```
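To make the distance in the list comprehension concrete, the small pure-Python sketch below recomputes it by hand for 'spleling' against two candidates. The helper names are hypothetical. They only mirror what `set(ngrams(w, 2))` and NLTK's `jaccard_distance` compute.

```python
def char_bigrams(word: str) -> set:
    # Set of adjacent character pairs as 2-tuples, the same shape as
    # set(nltk.util.ngrams(word, 2)).
    return set(zip(word, word[1:]))


def bigram_jaccard_distance(a: str, b: str) -> float:
    x, y = char_bigrams(a), char_bigrams(b)
    # Jaccard distance: 1 - |X & Y| / |X | Y|
    return 1 - len(x & y) / len(x | y)


for candidate in ["spelling", "selling"]:
    print(candidate, round(bigram_jaccard_distance("spleling", candidate), 3))
# spelling 0.444
# selling 0.556
```

'spleling' and 'spelling' share 5 of their 9 distinct bigrams (sp, el, li, in, ng), giving 1 - 5/9 ≈ 0.444. 'selling' shares only 4 of 9, so under this measure 'spelling' is the closer of the two.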
Edit Distance
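Now we will use the edit distance (Levenshtein distance) instead. This is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into the other.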
```python
for entry in entries:
    # Same loop as before, but candidates are scored by edit distance.
    temp = [(edit_distance(entry, w), w)
            for w in correct_spellings if w[0] == entry[0]]
    print(sorted(temp, key=lambda val: val[0])[0][1])
```
And we get:
```
selling
misspelling
recommender
```
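Why did 'spleling' become 'selling'? With its default arguments, `edit_distance` computes plain Levenshtein distance. Under that measure, 'selling' (replace p with e, then drop the second e) and 'spelling' (two substitutions) are both two edits away. The tie therefore goes to whichever word appears first in `correct_spellings`, because `sorted` is stable.

The pure-Python sketch below is a hypothetical re-implementation for illustration, not NLTK's code. Its optional adjacent-swap rule (the optimal string alignment variant) plays the role of the `transpositions=True` flag that NLTK's `edit_distance` also accepts.

```python
def edit_distance_dp(s1: str, s2: str, transpositions: bool = False) -> int:
    # dp[i][j] = minimum number of edits turning s1[:i] into s2[:j]
    rows, cols = len(s1) + 1, len(s2) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete all i characters
    for j in range(cols):
        dp[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
            # Optionally count a swap of two adjacent characters as one edit.
            if (transpositions and i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]):
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)
    return dp[-1][-1]


for candidate in ["selling", "spelling"]:
    print(candidate,
          edit_distance_dp("spleling", candidate),
          edit_distance_dp("spleling", candidate, transpositions=True))
# selling 2 2
# spelling 2 1
```

Counting the swap "le" to "el" as a single edit breaks this particular tie in favour of 'spelling'.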