Problem with Ex 7 and 10 in the final assignment

Hi everyone! I’ve seen that other people have had similar issues, but I couldn’t find an answer in their threads. I have a problem with these two exercises. Exercise 7 gives me the following output:

“Wrong number of unknown tokens in the test_data_replaced list. Check the unknown token value and how you are using it.
Expected: 4
Got: 0.
Wrong number of unknown tokens in the train_data_replaced list. Check the unknown token value and how you are using it.
Expected: 10
Got: 0.
Wrong number of unknown tokens in the test_data_replaced list. Check the unknown token value and how you are using it.
Expected: 7
Got: 0.
9 Tests passed
3 Tests failed”

Exercise 10 gives the following:

Wrong perplexity value.
Expected: 6.137396479150367
Got: 2.943288481048362.
Wrong perplexity value.
Expected: 5.0931554910158665
Got: 2.5947982560418383.
2 Tests passed
2 Tests failed

Eventually I get 0/10 for both these exercises, and I fail the assignment.

Can anyone advise? Thank you so much!

Marco

PS: I attached the script, in case someone wanted to take a look at it.


Same issue, commenting to be notified of a solution…


Hi @Marco_C

In UNQ_C7, you do not pass a value for the unknown_token parameter of the replace_oov_words_by_unk function.

In UNQ_C8, the n-gram from i to i+n should be taken from the sentence variable (not sentences).

Also, in UNQ_C11, the line under the comment "# Check if the beginning of word does not match with the letters in 'start_with'" should be: if not word.startswith(start_with)

First correct these errors and see if you pass.

Please remove your Assignment code because it’s against the rules to share your code with everyone.


Hi @arvyzukai,

thanks for your answer. I tried with your suggestions:

  1. Q7: what do you mean? I pass it in the preprocess_data function as unknown_token=“”;
  2. Q8: if I change sentences to sentence, it won’t work anymore, while right now it works;
  3. Q11: this one was already working, and changing it your way gives the same result.

What about Q10?

I removed my script, but I’ll send it to you by private message.

Thanks again,

Marco


Regarding UNQ_C7:
You do not pass it when you call replace_oov_words_by_unk inside the preprocess_data() function; you rely on the default value of unknown_token instead. To be exact, your line should be:
train_data_replaced = replace_oov_words_by_unk(..., ..., unknown_token)
and not:
train_data_replaced = replace_oov_words_by_unk(..., ...)

Regarding UNQ_C8: that was my mistake. I did not notice that you work with the sentences variable (instead of sentence) from the start. Everything is OK with your UNQ_C8.

Regarding UNQ_C10: you should pass vocabulary_size instead of len(unique_words) (there is no unique_words in scope), and k=k, not k=1.
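
Both arguments matter because they enter the add-k smoothing formula directly, and perplexity is computed from those probabilities. A minimal sketch, assuming a standard add-k estimate (smoothed_probability, total_mass, and the toy counts are hypothetical names and data, not the assignment's code):

```python
def smoothed_probability(word, previous_n_gram, n_gram_counts,
                         n_plus1_gram_counts, vocabulary_size, k=1.0):
    """Add-k estimate: (count(prev + word) + k) / (count(prev) + k * |V|)."""
    previous_n_gram = tuple(previous_n_gram)
    previous_count = n_gram_counts.get(previous_n_gram, 0)
    joint_count = n_plus1_gram_counts.get(previous_n_gram + (word,), 0)
    return (joint_count + k) / (previous_count + k * vocabulary_size)


vocabulary = ["the", "cat", "dog", "<e>", "<unk>"]
unigram_counts = {("the",): 2}
bigram_counts = {("the", "cat"): 1, ("the", "dog"): 1}


def total_mass(vocabulary_size, k):
    """Sum the smoothed probabilities of every vocabulary word after 'the'."""
    return sum(
        smoothed_probability(w, ("the",), unigram_counts, bigram_counts,
                             vocabulary_size, k)
        for w in vocabulary
    )


print(total_mass(len(vocabulary), k=1.0))  # sums to 1 (up to rounding)
print(total_mass(len(vocabulary), k=0.5))  # still sums to 1 for a different k
print(total_mass(3, k=1.0))                # wrong size: sums to more than 1
```

If the vocabulary size does not match the real vocabulary, or k is hard-coded, the probabilities no longer match what the tests expect, and the perplexity shifts accordingly.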

Regarding UNQ_C11: start_with may be of arbitrary length, not just a single character (word[0]), so the correct way is:

            # Check if the beginning of word does not match with the letters in 'start_with'
            if not word.startswith(start_with):
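
A small sketch of the difference, using hypothetical helper names and a toy word list (not the assignment's code). The first-character variant is only a guess at the kind of check that goes wrong:

```python
def filter_with_startswith(words, start_with):
    """Keep words whose beginning matches the whole start_with prefix."""
    return [word for word in words if word.startswith(start_with)]


def filter_first_char_only(words, start_with):
    """Keep words whose first character matches the first character of start_with."""
    return [word for word in words if word[:1] == start_with[:1]]


words = ["cat", "car", "cow", "dog"]

print(filter_with_startswith(words, "ca"))   # ['cat', 'car']
print(filter_first_char_only(words, "ca"))   # ['cat', 'car', 'cow']
print(filter_with_startswith(words, "c"))    # ['cat', 'car', 'cow']
```

For a one-letter prefix both versions agree, which would explain why the existing tests can pass either way.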

Thank you very much, @arvyzukai, that was very helpful! I passed the assignment now!

Marco


No problem. :slight_smile: Please help out @Sarai_Pahla_MD if she is not able to correct her mistakes on the same issue.


Thank you so much for the tips!! I was relying on the default value for replace_oov_words_by_unk :sweat_smile: I am so grateful for this community! :clap::clap:
