Hi everyone! I’ve seen that other people have had similar issues, but I couldn’t find an answer in their threads. I have a problem with these two exercises. Exercise 7 gives me the following output:
“Wrong number of unknown tokens in the test_data_replaced list. Check the unknown token value and how you are using it.
Expected: 4
Got: 0.
Wrong number of unknown tokens in the train_data_replaced list. Check the unknown token value and how you are using it.
Expected: 10
Got: 0.
Wrong number of unknown tokens in the test_data_replaced list. Check the unknown token value and how you are using it.
Expected: 7
Got: 0.
9 Tests passed
3 Tests failed”
Regarding UNQ_C7:
You do not pass unknown_token when you call replace_oov_words_by_unk inside the preprocess_data() function; you rely on its default value instead. To be exact, your line should be: train_data_replaced = replace_oov_words_by_unk(..., ..., unknown_token)
and not: train_data_replaced = replace_oov_words_by_unk(..., ...)
The same applies to the test_data_replaced line, which is why both lists report 0 unknown tokens.
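To illustrate why the grader reports "Got: 0", here is a minimal sketch. The function bodies and the simplified preprocess_buggy / preprocess_fixed helpers (which take a ready-made vocabulary instead of a count threshold) are hypothetical, not the assignment's code. When the grader passes a custom unknown token and the inner call silently falls back to the default "<unk>", the custom token never appears in the output:

```python
def replace_oov_words_by_unk(tokenized_sentences, vocabulary, unknown_token="<unk>"):
    """Replace every token that is not in the vocabulary by unknown_token."""
    vocabulary = set(vocabulary)  # set membership test is O(1) per token
    return [
        [token if token in vocabulary else unknown_token for token in sentence]
        for sentence in tokenized_sentences
    ]


def preprocess_buggy(train_data, test_data, vocabulary, unknown_token="<unk>"):
    # Bug: unknown_token is accepted but never forwarded,
    # so replace_oov_words_by_unk always uses its default "<unk>"
    train_data_replaced = replace_oov_words_by_unk(train_data, vocabulary)
    test_data_replaced = replace_oov_words_by_unk(test_data, vocabulary)
    return train_data_replaced, test_data_replaced


def preprocess_fixed(train_data, test_data, vocabulary, unknown_token="<unk>"):
    # Fix: forward the caller's unknown_token explicitly
    train_data_replaced = replace_oov_words_by_unk(train_data, vocabulary, unknown_token)
    test_data_replaced = replace_oov_words_by_unk(test_data, vocabulary, unknown_token)
    return train_data_replaced, test_data_replaced


def count_token(sentences, token):
    """Count occurrences of token across all tokenized sentences."""
    return sum(sentence.count(token) for sentence in sentences)


train = [["sky", "is", "blue"], ["leaves", "are", "green"]]
test = [["roses", "are", "red"]]
vocab = ["sky", "is", "are"]
token = "<UNKNOWN>"  # a non-default value, like one a grader might pass

buggy_train, buggy_test = preprocess_buggy(train, test, vocab, unknown_token=token)
fixed_train, fixed_test = preprocess_fixed(train, test, vocab, unknown_token=token)

print(count_token(buggy_train, token), count_token(buggy_test, token))  # 0 0
print(count_token(fixed_train, token), count_token(fixed_test, token))  # 3 2
```

The buggy version still replaces out-of-vocabulary words, just with "<unk>" instead of the requested token, so a test that counts the requested token finds none.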
Regarding UNQ_C8: that was my mistake. I did not notice that you work with the sentences variable (instead of sentence) from the start. Everything is OK with your UNQ_C8.
Regarding UNQ_C10: you should pass vocabulary_size instead of len(unique_words) (there is no unique_words in that scope), and k=k, not k=1.
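Why both arguments matter: assuming the assignment uses standard add-k smoothing, the estimate is P(w | previous) = (C(previous, w) + k) / (C(previous) + k * |V|), so both the vocabulary size and k appear directly in the formula. The sketch below uses hypothetical function bodies and toy counts (not the assignment's exact code) to show the value being computed in scope and k being forwarded rather than hardcoded:

```python
def estimate_probability(word, previous_n_gram, n_gram_counts,
                         n_plus1_gram_counts, vocabulary_size, k=1.0):
    """Add-k smoothed estimate of P(word | previous_n_gram)."""
    previous_n_gram = tuple(previous_n_gram)
    previous_count = n_gram_counts.get(previous_n_gram, 0)
    n_plus1_count = n_plus1_gram_counts.get(previous_n_gram + (word,), 0)
    return (n_plus1_count + k) / (previous_count + k * vocabulary_size)


def estimate_probabilities(previous_n_gram, n_gram_counts,
                           n_plus1_gram_counts, vocabulary, k=1.0):
    """Smoothed probability of every vocabulary word following previous_n_gram."""
    vocabulary_size = len(vocabulary)  # computed in this scope from vocabulary
    return {
        # Forward the caller's k; hardcoding k=1 ignores the requested smoothing
        word: estimate_probability(word, previous_n_gram, n_gram_counts,
                                   n_plus1_gram_counts, vocabulary_size, k=k)
        for word in vocabulary
    }


vocabulary = ["i", "am", "happy"]
n_gram_counts = {("i",): 2}
n_plus1_gram_counts = {("i", "am"): 1, ("i", "happy"): 1}

probs_k1 = estimate_probabilities(["i"], n_gram_counts, n_plus1_gram_counts, vocabulary, k=1.0)
probs_k05 = estimate_probabilities(["i"], n_gram_counts, n_plus1_gram_counts, vocabulary, k=0.5)
print(probs_k1["am"])   # 0.4, i.e. (1 + 1) / (2 + 1 * 3)
print(probs_k05["am"])  # about 0.4286, i.e. (1 + 0.5) / (2 + 0.5 * 3)
```

With k=1 hardcoded, a test that calls the function with k=0.5 would still get 0.4 and fail.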
Regarding UNQ_C11: start_with may be of arbitrary length, not just a single symbol (word[0]), so the correct way is:
# Check if the beginning of word does not match with the letters in 'start_with'
if not word.startswith(start_with):
    # If they don't match, skip this word and move on to the next one
    continue
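A minimal sketch of the difference, assuming the suggestion step loops over a word-to-probability dictionary and keeps the most probable match. Both function names and the toy probabilities are hypothetical. Comparing only word[0] against a multi-letter start_with can never be equal, so every word gets skipped:

```python
def suggest_from_probabilities(probabilities, start_with=None):
    """Return the most probable word, optionally restricted to a prefix."""
    suggestion, max_prob = None, 0.0
    for word, prob in probabilities.items():
        if start_with is not None:
            # Check if the beginning of word does not match with the letters in 'start_with'
            if not word.startswith(start_with):
                continue  # skip words that don't begin with the full prefix
        if prob > max_prob:
            suggestion, max_prob = word, prob
    return suggestion, max_prob


def suggest_first_letter_only(probabilities, start_with=None):
    """Buggy variant: compares only word[0], so multi-letter prefixes never match."""
    suggestion, max_prob = None, 0.0
    for word, prob in probabilities.items():
        if start_with is not None and word[0] != start_with:
            continue
        if prob > max_prob:
            suggestion, max_prob = word, prob
    return suggestion, max_prob


probabilities = {"cat": 0.2, "car": 0.5, "dog": 0.3}
print(suggest_from_probabilities(probabilities, start_with="cat"))  # ('cat', 0.2)
print(suggest_first_letter_only(probabilities, start_with="cat"))   # (None, 0.0)
```

For a one-letter prefix such as "c" both versions agree, which is why the bug only shows up in tests that use longer prefixes.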