Bug in inference for Assignment 3?

I think there is a bug when predicting the probabilities of the next token for sentences with fewer tokens than n (where n is the length of the context n-gram, i.e. the model's n-gram size minus 1).
We need to prepend ['<s>'] * n to previous_tokens. Otherwise the context n-gram gets a count of 0, which pulls the probabilities lower than they should be.

Hey @Vinicius_Arruda,
Welcome, and we are glad that you could become a part of our community :partying_face: Can you please pinpoint the function in which you believe the bug exists?

Cheers,
Elemento

In the function suggest_a_word(), around line 22.

    # n = length of the context n-grams used as keys in n_gram_counts
    n = len(list(n_gram_counts.keys())[0])

    # NEED TO ADD THIS: prepend start tokens so that inputs with fewer
    # than n tokens still form a full-length context
    previous_tokens = ['<s>'] * n + previous_tokens

    # From the words that the user already typed,
    # get the most recent 'n' words as the previous n-gram
    previous_n_gram = previous_tokens[-n:]
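
To make the effect concrete, here is a minimal, self-contained sketch of the idea. The count_n_grams helper and the k-smoothed probability formula below are my own stand-ins that mirror the assignment's conventions; suggest_a_word_sketch is a hypothetical simplification, not the actual starter code.

    from collections import Counter

    def count_n_grams(sentences, n, start_token='<s>', end_token='</s>'):
        # Mirrors the assignment's convention of padding each sentence with
        # n start tokens, so even the first word has a full-length context
        counts = Counter()
        for sentence in sentences:
            padded = [start_token] * n + sentence + [end_token]
            for i in range(len(padded) - n + 1):
                counts[tuple(padded[i:i + n])] += 1
        return counts

    def suggest_a_word_sketch(previous_tokens, n_gram_counts,
                              n_plus1_gram_counts, vocabulary, k=1.0):
        # n = length of the context n-grams used as keys in n_gram_counts
        n = len(list(n_gram_counts.keys())[0])

        # The proposed fix: prepend n start tokens so inputs shorter than n
        # still form a context that was actually seen during counting
        previous_tokens = ['<s>'] * n + previous_tokens
        previous_n_gram = tuple(previous_tokens[-n:])

        best_word, best_prob = None, 0.0
        for word in vocabulary:
            # k-smoothed probability of `word` given the previous n-gram
            numerator = n_plus1_gram_counts.get(previous_n_gram + (word,), 0) + k
            denominator = n_gram_counts.get(previous_n_gram, 0) + k * len(vocabulary)
            prob = numerator / denominator
            if prob > best_prob:
                best_word, best_prob = word, prob
        return best_word, best_prob

    sentences = [['i', 'like', 'a', 'cat'],
                 ['this', 'dog', 'is', 'like', 'a', 'cat']]
    vocabulary = sorted({t for s in sentences for t in s} | {'<s>', '</s>'})
    bigram_counts = count_n_grams(sentences, 2)
    trigram_counts = count_n_grams(sentences, 3)

    # previous_tokens has only 1 token but the context length is 2; with the
    # padding, the context becomes ('<s>', 'i') and 'like' wins with 0.2
    print(suggest_a_word_sketch(['i'], bigram_counts, trigram_counts, vocabulary))

Without the padding line, the context here would be the 1-tuple ('i',), which never matches any bigram key, so the denominator count is 0 and every candidate word gets the same smoothed probability of 1/|V|; the suggestion becomes arbitrary.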

Hey @Vinicius_Arruda,
Thanks a lot for pointing it out. I will raise an issue regarding this with the team.

Cheers,
Elemento


Please let me know what decision is made on this. Thanks!

Hey @Vinicius_Arruda,
The modification has been made. It might take some time to be reflected in your assignment. Once again, thanks a lot for your contributions :nerd_face:

Cheers,
Elemento
