We previously showed how to build an autocorrect based on Jaccard distance that also returns the probability of each word. Here we will create three different spelling recommenders, each of which takes a list of misspelled words and recommends a correctly spelled word for every word in the list. For every misspelled word, the recommender should find the word in correct_spellings that has the shortest distance and starts with the same letter as the misspelled word, and return that word as a recommendation.
Note: Each of the three different recommenders will use a different distance measure.
For our example, we will consider the following misspelled words: [spleling, mispelling, reccomender]
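The selection rule described above (same first letter, smallest distance) does not depend on which distance is used, so it can be sketched once as a generic helper. This is only an illustration: the names `recommend`, `positional_mismatch` and `toy_vocabulary` are hypothetical, the tiny word list stands in for the real dictionary, and the stand-in distance is a crude placeholder rather than one of the measures used in this post.

```python
def recommend(misspelled, vocabulary, distance):
    """For each misspelled word, return the closest vocabulary word
    that starts with the same letter (or None if there is no candidate)."""
    recommendations = []
    for entry in misspelled:
        # Keep only the words that share the first letter with the entry
        candidates = [w for w in vocabulary if w and w[0] == entry[0]]
        # min() returns the first candidate when several share the smallest distance
        best = min(candidates, key=lambda w: distance(entry, w), default=None)
        recommendations.append(best)
    return recommendations


def positional_mismatch(a, b):
    """Placeholder distance (an assumption for this sketch): number of
    positions where the letters differ, plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Hypothetical toy vocabulary standing in for a full dictionary
toy_vocabulary = ["selling", "spelling", "misspelling", "recommender", "record"]

print(recommend(["spleling", "mispelling", "reccomender"],
                toy_vocabulary, positional_mismatch))
```

Any function that takes two strings and returns a number can be passed as `distance`, which is what makes it easy to swap in a different measure for each recommender.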
Jaccard distance on the 2-grams (q-grams with q = 2) of the two words
import nltk
from nltk.corpus import words
from nltk.metrics.distance import jaccard_distance, edit_distance
from nltk.util import ngrams

# The word list requires the NLTK "words" corpus; run nltk.download('words') once if it is missing
correct_spellings = words.words()
Now that we have loaded the libraries, let’s work on the function. We will use list comprehensions.
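entries = ['spleling', 'mispelling', 'reccomender']
for entry in entries:
    # Score only the dictionary words that start with the same letter as the entry
    temp = [(jaccard_distance(set(ngrams(entry, 2)), set(ngrams(w, 2))), w) for w in correct_spellings if w[0] == entry[0]]
    # Sort by distance and print the closest word
    print(sorted(temp, key=lambda val: val[0])[0][1])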
And we get:
spelling
misspelling
recommender
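To see where such a result comes from, the Jaccard distance on 2-grams can be worked out by hand for one word. The sketch below is a plain-Python illustration that does not use NLTK; the helper names `bigrams` and `jaccard_bigram_distance` are made up for this example. It uses the usual definition, one minus the size of the intersection divided by the size of the union of the two sets of 2-grams.

```python
def bigrams(word):
    """Set of all 2-character substrings of the word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard_bigram_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| on the sets of 2-grams of a and b."""
    x, y = bigrams(a), bigrams(b)
    return 1 - len(x & y) / len(x | y)


# 'spleling' and 'spelling' share 5 of the 9 distinct 2-grams
print(sorted(bigrams("spleling") & bigrams("spelling")))
print(jaccard_bigram_distance("spleling", "spelling"))  # 1 - 5/9
print(jaccard_bigram_distance("spleling", "selling"))   # 1 - 4/9
```

Under this measure 'spelling' (distance 4/9) is closer to 'spleling' than 'selling' (distance 5/9), because swapping two adjacent letters leaves most of the 2-grams intact.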
Now, we will work with the Edit Distance
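for entry in entries:
    # Same filter as before, but score the candidates with the edit distance
    temp = [(edit_distance(entry, w), w) for w in correct_spellings if w[0] == entry[0]]
    # Sort by distance and print the closest word
    print(sorted(temp, key=lambda val: val[0])[0][1])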
and we get:
selling
misspelling
recommender
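The first recommendation, 'selling' instead of 'spelling', is worth a closer look. The sketch below is a plain-Python edit distance written for illustration (the function name `levenshtein` and its `transpositions` flag are this example's own; it is not NLTK's implementation). It suggests that both words are two edits away from 'spleling', so the result is most likely a tie that is broken by word order: Python's sort is stable, and 'selling' would come before 'spelling' in an alphabetically ordered word list.

```python
def levenshtein(a, b, transpositions=False):
    """Number of single-character insertions, deletions and substitutions
    needed to turn a into b; optionally count an adjacent swap as one edit."""
    # dp[i][j] holds the distance between a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)  # adjacent swap
    return dp[-1][-1]


print(levenshtein("spleling", "selling"))   # 2
print(levenshtein("spleling", "spelling"))  # 2
print(levenshtein("spleling", "spelling", transpositions=True))  # 1
```

If swapped adjacent letters should count as a single edit, NLTK's edit_distance accepts a transpositions=True argument; under that setting 'spelling' would be one edit away and 'selling' still two.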