Something is bothering me in the algorithm for autocorrect:
We are told that “The idea is that words generated from fewer edits are more likely than words with more edits.”
But what if, for example, there is a suggested word from “edit_one_letter” with a very low probability, and a suggested word from “edit_two_letters”, with a very high probability?
According to my intuition, the suggestion with the high probability should be selected, but because it comes from “edit_two_letters” it would not even be considered as a candidate, since the suggestions from “edit_one_letter” would be prioritized.
What do you think?
Hi @Doron_Modan
We usually consider candidates from multiple edit distances, validate them against real words, and then choose the one with the highest frequency or probability. This way, even if a word from “edit_two_letters” has a higher likelihood, it can still be selected if it’s the best fit.
The intent here is to convey that fewer edits generally indicate a closer match to the misspelled word (more edits might change the word significantly).
Hope it helps! Feel free to ask if you need further assistance.
You mean in real life, right? Please correct me if I’m wrong, but in this exercise that wasn’t the case. (Or was it?)
Both in this course and in real life. The course describes the autocorrect algorithm as the following steps:
1. Identify a misspelled word
2. Find strings n edit distance away: (these could be random strings)
3. Filter candidates: (keep only the real words from the previous steps)
4. Calculate word probabilities: (choose the word that is most likely to occur in that context)
With these steps, the algorithm chooses the most probable word using word frequencies.
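
The four steps above can be sketched roughly as follows. This is only a minimal illustration, not the course's implementation: the toy word counts and the helper names (`edits1`, `candidates_within`, `correct`) are made up, and it assumes all real words up to n edits away are pooled before ranking by frequency.

```python
from collections import Counter
import string

# Hypothetical toy corpus counts, for illustration only.
counts = Counter({"dear": 50, "deer": 10, "dearly": 5, "ear": 20})
total = sum(counts.values())
probs = {w: c / total for w, c in counts.items()}

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts) - {word}

def candidates_within(word, n=2):
    """Step 2: every string reachable in 1..n edits (mostly non-words)."""
    found, frontier = set(), {word}
    for _ in range(n):
        frontier = {e for w in frontier for e in edits1(w)}
        found |= frontier
    return found

def correct(word, n=2):
    # Step 1: a word already in the vocabulary is not treated as misspelled.
    if word in probs:
        return word
    # Step 3: keep only real words.
    real = {w for w in candidates_within(word, n) if w in probs}
    # Step 4: pick the most probable real word; fall back to the input.
    return max(real, key=probs.get) if real else word
```

With this toy vocabulary, `correct("dearr")` considers "dear" (one edit) together with "deer", "ear", and "dearly" (two edits) and returns whichever has the highest probability.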
Thanks for your answer. Still, as I understand it, 1-edit suggestions are prioritized over 2-edit suggestions: if any 1-edit candidates exist, 2-edit suggestions are not even considered as candidates (at least in UNQ_C10 get_corrections), though they may have a higher probability than some of the 1-edit suggestions.
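
To make the difference concrete, here is a small sketch contrasting the two selection strategies under discussion. It is not the assignment's `get_corrections` code; the function names, the candidate sets, and the probabilities are hypothetical and chosen only to show when the two strategies disagree.

```python
def pick_prioritized(one_edit, two_edit, probs):
    """Use two-edit candidates only if no one-edit candidate is a real word."""
    pool = [w for w in one_edit if w in probs] or [w for w in two_edit if w in probs]
    return max(pool, key=probs.get) if pool else None

def pick_pooled(one_edit, two_edit, probs):
    """Rank all real-word candidates together, regardless of edit distance."""
    pool = [w for w in set(one_edit) | set(two_edit) if w in probs]
    return max(pool, key=probs.get) if pool else None

# Hypothetical probabilities: a rare one-edit word vs. a common two-edit word.
probs = {"rare": 0.001, "common": 0.2}
one_edit = {"rare"}
two_edit = {"common"}

print(pick_prioritized(one_edit, two_edit, probs))  # rare
print(pick_pooled(one_edit, two_edit, probs))       # common
```

The prioritized version encodes the assumption that edit distance matters more than frequency; the pooled version lets frequency alone decide.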
Could you please share your UNQ_C10 get_corrections with me via private message so I can provide specific assistance?