I think there is a bug in predicting the probabilities of the next token when the sentence has fewer tokens than n (where n is the n-gram size used for the context).
We need to prepend ['<s>'] * n to previous_tokens. Otherwise the previous n-gram gets a count of 0, yielding lower probabilities.
Hey @Vinicius_Arruda,
Welcome, and we are glad that you could become a part of our community. Can you please pinpoint the function in which you believe the bug exists?
Cheers,
Elemento
In the function suggest_a_word(), around line 22.
# length of previous words
n = len(list(n_gram_counts.keys())[0])
# NEED TO ADD THIS: pad with start tokens so that inputs
# shorter than n still form a full n-gram context
previous_tokens = ['<s>'] * n + previous_tokens
# From the words that the user already typed
# get the most recent 'n' words as the previous n-gram
previous_n_gram = previous_tokens[-n:]
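To make the effect concrete, here is a minimal, self-contained sketch of the idea. It is not the assignment's exact code: count_n_grams, estimate_probability, and the toy sentences are simplified stand-ins assumed for illustration.

from collections import defaultdict

def count_n_grams(sentences, n, start_token='<s>', end_token='<e>'):
    # Count n-grams, padding each sentence with n start tokens,
    # the way n-gram training data is typically padded.
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = [start_token] * n + sentence + [end_token]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def estimate_probability(word, previous_n_gram, n_gram_counts,
                         n_plus1_gram_counts, vocab_size, k=1.0):
    # k-smoothed probability of `word` given the previous n-gram.
    previous_n_gram = tuple(previous_n_gram)
    prev_count = n_gram_counts.get(previous_n_gram, 0)
    n_plus1_count = n_plus1_gram_counts.get(previous_n_gram + (word,), 0)
    return (n_plus1_count + k) / (prev_count + k * vocab_size)

sentences = [['i', 'like', 'a', 'cat'],
             ['this', 'dog', 'is', 'like', 'a', 'cat']]
vocab_size = len(set(w for s in sentences for w in s) | {'<e>'})  # 8
bigram_counts = count_n_grams(sentences, 2)   # n-gram counts (n = 2)
trigram_counts = count_n_grams(sentences, 3)  # (n+1)-gram counts

n = 2
previous_tokens = ['i']  # the typed sentence is shorter than n

# Without padding, previous_tokens[-n:] is just ('i',): it is not a
# valid bigram key, so both lookups fall back to a count of 0.
p_unpadded = estimate_probability('like', previous_tokens[-n:],
                                  bigram_counts, trigram_counts, vocab_size)

# With padding, the context becomes ('<s>', 'i'), which was actually
# counted during training, so the real corpus statistics are used.
padded = ['<s>'] * n + previous_tokens
p_padded = estimate_probability('like', padded[-n:],
                                bigram_counts, trigram_counts, vocab_size)

print(p_unpadded)  # 0.125  (pure smoothing; the context never matched)
print(p_padded)    # ~0.222 (('<s>', 'i', 'like') was observed in training)

Since the counts were built from sentences padded with start tokens, prepending the same padding at prediction time makes the context a key that can actually be found, instead of silently falling back to the smoothing floor.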
Hey @Vinicius_Arruda,
Thanks a lot for pointing it out. I will raise an issue regarding this with the team.
Cheers,
Elemento
Please let me know what decision was made on this. Thanks!
Hey @Vinicius_Arruda,
The modification has been done. It might take some time to reflect in your assignment. Once again, thanks a lot for your contributions.
Cheers,
Elemento