C2_W1 assignment: Could not understand some of the instruction info

I have finished the C2_W1 assignment, but I find it difficult to understand the following content in its instructions:


The logic of ‘get_corrections’, as Exercise 10 describes it, is to choose the word with the highest probability and the fewest edits. That probability comes from ‘probs’, which is P(Wi) as calculated in Exercise 3. The whole process never evaluates the equation given in the instructions above, so I find it hard to understand. Can someone please explain it to me? Thanks in advance for your help!

Hi @Kathy83

The goal is to compute P(c|w) using Bayes’ Rule:

P(c|w) = \frac{P(w|c) \times P(c)}{P(w)}.

  • P(w|c) represents the likelihood of w given c based on edit distance.
  • P(c) is the prior probability of c, based on word frequency in your corpus.
  • P(w) is the overall probability of w (approximated).

I don’t have access to the course notebooks but, in practice, this is simplified. Since P(w) is the same normalization constant for every candidate correction, it doesn’t affect their relative ranking, so we can ignore it and focus on the numerator: P(w|c) \times P(c). Also, to avoid underflow when multiplying very small probabilities, we often work with log probabilities instead.
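As a rough illustration of the simplification above, here is a minimal Python sketch that ranks candidates by log P(w|c) + log P(c) and never computes P(w). The function name `rank_corrections` and all the numbers are made up for this example; they do not come from the course notebook.

```python
import math

def rank_corrections(likelihood, prior):
    """Rank candidates c by log P(w|c) + log P(c).

    P(w) is left out on purpose: it is the same for every candidate,
    so dropping it does not change the order.
    """
    scores = {c: math.log(likelihood[c]) + math.log(prior[c]) for c in likelihood}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up numbers for an observed word w = "dys" (assumptions, not course data).
likelihood = {"days": 0.4, "dye": 0.4, "dry": 0.2}   # stand-in for P(w|c)
prior = {"days": 0.02, "dye": 0.001, "dry": 0.005}   # stand-in for P(c)

ranking = rank_corrections(likelihood, prior)
print(ranking)
```

Note that if every candidate is given the same likelihood, the ranking depends only on the prior P(c), that is, on word frequency.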

Hope it helps! Feel free to ask if you need further assistance.

Hi @Alireza_Saei , thanks a lot for your reply! But I’m afraid you’re wrong. This assignment calculates P(w) and uses it to decide the final word suggestion: every P(w) is calculated and saved in the ‘probs’ dictionary in Exercise 3. This ‘probs’ is then used in Exercise 10 to implement the get_corrections() function. Since you can’t access the notebooks, I have pasted a screenshot here. So I think the whole process has nothing to do with the Bayes’ Rule part of the instructions, which is confusing.


Since I can’t post my implementation of ‘get_corrections()’ here, I have pasted Coursera’s code hints for it instead; you can still see its logic:

START CODE HERE

#Step 1: create suggestions as described above              
#Step 2: determine probability of suggestions  <- Here it will look in 'probs'
#Step 3: Get all your best words and return the most probable top n_suggested words as n_best
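Read literally, the three hint steps could look something like the sketch below. This is not the course's reference solution: `one_edit_away` and `get_corrections_sketch` are hypothetical names, the candidate generation is a simplified version using only deletes, replaces and inserts, and the `probs` values are invented. Under one possible reading, treating all candidates at the same edit distance as equally likely is what lets Step 2 rank by the word probability alone.

```python
def one_edit_away(word, letters="abcdefghijklmnopqrstuvwxyz"):
    """Hypothetical helper: all strings one delete, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return deletes | replaces | inserts

def get_corrections_sketch(word, probs, n_suggested=2):
    vocab = set(probs)
    # Step 1: create suggestions, preferring the fewest edits.
    if word in vocab:
        suggestions = {word}
    else:
        suggestions = one_edit_away(word) & vocab
        if not suggestions:
            two = {e2 for e1 in one_edit_away(word) for e2 in one_edit_away(e1)}
            suggestions = (two & vocab) or {word}
    # Step 2: look up the probability of each suggestion in 'probs'.
    scored = {s: probs.get(s, 0.0) for s in suggestions}
    # Step 3: keep the n_suggested most probable words as n_best.
    n_best = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:n_suggested]
    return n_best

# Invented word probabilities, for illustration only.
probs = {"days": 0.02, "dye": 0.001, "dry": 0.005, "the": 0.06}
result = get_corrections_sketch("dys", probs, n_suggested=2)
print(result)
```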

Hi @Kathy83

The get_corrections function in Exercise 10 calculates P(w|c), the probability of having a certain word w given the correct word c, together with the general probability P(c).

P(w) is the get_probs output: a dictionary whose keys are the words and whose values are the probabilities that each word occurs in general.

P(c|w) is calculated with minimum_edit_distance, using the words and probabilities returned by get_corrections as the assigned source and target. Insert and delete each cost 1, and substitution costs 0 or 2 (NOTE: this is not the same as the string manipulation calculation).

A cost matrix is created between the source and target using del_cost, ins_cost and substitution_cost(replacement_cost), matching the source to the target.

Then the minimum edit distance cost is found between row and column, giving you the P(w|c).
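For reference, a minimal sketch of the cost matrix described above, using insert = 1, delete = 1 and replace = 2 (or 0 when the characters match). The function name and parameter names here are assumptions and may not match the notebook's signature. Note that the value it returns is an edit cost rather than a probability; a lower cost is usually taken to mean a more plausible typo.

```python
def min_edit_distance_sketch(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Fill a (len(source)+1) x (len(target)+1) cost matrix D, where D[i][j]
    is the cheapest way to turn source[:i] into target[:j]."""
    m, n = len(source), len(target)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # First column: delete every character of the source prefix.
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + del_cost
    # First row: insert every character of the target prefix.
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Replacing a character with itself costs nothing.
            r = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i][j] = min(
                D[i - 1][j] + del_cost,   # delete from source
                D[i][j - 1] + ins_cost,   # insert into source
                D[i - 1][j - 1] + r,      # replace (or keep) a character
            )
    return D, D[m][n]

matrix, cost = min_edit_distance_sketch("play", "stay")
print(cost)
```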

Regards
DP