UNQ_C10 get_corrections - doubt about the algorithm

Doron_Modan · August 11, 2024, 7:20pm

Something is bothering me about the algorithm for the autocorrect:
We are told that “The idea is that words generated from fewer edits are more likely than words with more edits.”
But what if, for example, there is a suggested word from “edit_one_letter” with a very low probability, and a suggested word from “edit_two_letters”, with a very high probability?
According to my intuition, the suggestion with the high probability should be selected, but because it comes from “edit_two_letters” it would not even be considered a candidate, since the suggestions from “edit_one_letter” are prioritized.
What do you think?

Alireza_Saei · August 12, 2024, 7:03am

Hi @Doron_Modan

We usually consider candidates from multiple edit distances, validate them against real words, and then choose the one with the highest frequency or probability. This way, even if a word from “edit_two_letters” has a higher likelihood, it can still be selected if it’s the best fit.

The intent here is to convey that fewer edits generally indicate a closer match to the misspelled word (more edits might change the word significantly).

Hope it helps! Feel free to ask if you need further assistance.

Doron_Modan · August 12, 2024, 8:03am

You mean in real life, right? Please correct me if I’m wrong, but in this exercise that wasn’t the case. (Or was it?)

Alireza_Saei · August 12, 2024, 6:03pm

Both in this course and in real life. The course explains the autocorrect algorithm as the following steps:

1. Identify a misspelled word

2. Find strings n edit distance away: (these could be random strings)

3. Filter candidates: (keep only the real words from the previous steps)

4. Calculate word probabilities: (choose the word that is most likely to occur in that context)

Following these steps, the algorithm chooses the most probable word using word frequencies.
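To make the steps concrete, here is a minimal sketch, assuming candidates from one and two edits are pooled before ranking. It is not the assignment's code: the helper names (`edits_one`, `edits_two`, `pooled_corrections`) and the toy probabilities are made up for illustration.

```python
import string

def edits_one(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return deletes | swaps | replaces | inserts

def edits_two(word):
    """All strings reachable by applying two single edits in a row."""
    return {e2 for e1 in edits_one(word) for e2 in edits_one(e1)}

def pooled_corrections(word, probs, n=2):
    """Rank every real word within two edits by probability (pooled, no tiers)."""
    if word in probs:  # step 1: the word is already a real word
        return [(word, probs[word])]
    # steps 2 and 3: generate strings, keep only those in the vocabulary
    candidates = (edits_one(word) | edits_two(word)) & set(probs)
    # step 4: most probable candidates first
    ranked = sorted(candidates, key=lambda w: probs[w], reverse=True)
    return [(w, probs[w]) for w in ranked[:n]]

# Hypothetical word probabilities, for illustration only.
probs = {"dye": 0.01, "days": 0.05, "day": 0.30}
print(pooled_corrections("dys", probs))  # [('day', 0.3), ('days', 0.05)]
```

With pooling, “day” (two edits away from “dys”) wins because it has the highest probability among all the real-word candidates.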

Doron_Modan · August 12, 2024, 8:00pm

Thanks for your answer. Still, because 1-edit suggestions are prioritized over 2-edit suggestions, whenever there are 1-edit candidates, as I understand it, the 2-edit suggestions are not even considered candidates (at least in UNQ_C10 get_corrections), though they may have a higher probability than some of the 1-edit suggestions.
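To illustrate the concern, here is a minimal sketch of a tiered selection, assuming the 1-edit and 2-edit real words are already known. The function name `tiered_corrections`, its parameters, and the toy probabilities are hypothetical and are not taken from the assignment.

```python
def tiered_corrections(word, probs, one_edit_words, two_edit_words, n=2):
    """Pick the first non-empty tier, then rank only that tier by probability."""
    vocab = set(probs)
    # Short-circuit "or": a later tier is used only if every earlier one is empty.
    tier = (
        ({word} & vocab)
        or (one_edit_words & vocab)
        or (two_edit_words & vocab)
        or {word}
    )
    ranked = sorted(tier, key=lambda w: probs.get(w, 0.0), reverse=True)
    return [(w, probs.get(w, 0.0)) for w in ranked[:n]]

# Hypothetical probabilities and candidate sets for the misspelling "dys".
probs = {"dye": 0.01, "days": 0.05, "day": 0.30}
one_edit = {"dye", "days"}  # real words one edit away
two_edit = {"day"}          # real word two edits away

print(tiered_corrections("dys", probs, one_edit, two_edit))
# [('days', 0.05), ('dye', 0.01)]
```

Here “day” has the highest probability but is never ranked, because the 1-edit tier is not empty.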

Alireza_Saei · August 12, 2024, 8:32pm

Could you please share the UNQ_C10 get_corrections with me via private message so I can provide specific assistance?
