Vocabulary size differs from the V dimension of the preloaded W1, W2 and b2 used for testing

Antonino_Staiano · December 21, 2022, 7:10am

After preprocessing the corpus, I get a vocabulary size of 5775, while the V dimension of the parameters preloaded in the unit_test functions is 5778.
This causes several unit_test functions to fail.
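To make the mismatch concrete, here is a minimal sketch that compares a vocabulary size against the V dimension of the preloaded weights. It assumes a CBOW-style shape convention of W1: (N, V), W2: (V, N) and b2: (V, 1); the helper name check_vocab_matches_weights and the embedding size N = 50 are hypothetical, chosen only for illustration.

```python
import numpy as np

def check_vocab_matches_weights(V, W1, W2, b2):
    """Return a list of mismatch messages; an empty list means all shapes agree.

    Assumed (hypothetical) shape convention: W1 (N, V), W2 (V, N), b2 (V, 1).
    """
    problems = []
    if W1.shape[1] != V:
        problems.append(f"W1 has V={W1.shape[1]}, vocabulary has V={V}")
    if W2.shape[0] != V:
        problems.append(f"W2 has V={W2.shape[0]}, vocabulary has V={V}")
    if b2.shape[0] != V:
        problems.append(f"b2 has V={b2.shape[0]}, vocabulary has V={V}")
    return problems

# Example: test weights built for V=5778, local vocabulary of size 5775
N = 50  # hypothetical embedding size, only for this illustration
W1 = np.zeros((N, 5778))
W2 = np.zeros((5778, N))
b2 = np.zeros((5778, 1))
for msg in check_vocab_matches_weights(5775, W1, W2, b2):
    print(msg)
```

With a 3-word gap, every weight that carries a V dimension disagrees, which is consistent with several unit tests failing at once rather than just one.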

davidguo94 · December 21, 2022, 4:23pm

How did you preprocess the corpus? You should be getting 5778 for the vocab size.

Antonino_Staiano · December 21, 2022, 6:22pm

I just ran your notebook code.
Cheers
Antonino

davidguo94 · December 21, 2022, 11:42pm

That's strange. Was it like this from the start? Could you try restarting the kernel?

Antonino_Staiano · December 22, 2022, 7:11am

I've restarted the kernel, and nothing has changed. After preprocessing, I get 60976 tokens. Is that the number you get? Then, computing fdist = nltk.FreqDist(word for word in data), I get 5775 unique words.
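The two counts being compared here (total tokens and unique words) can be sketched as follows. This uses collections.Counter as a stand-in for nltk.FreqDist so the snippet runs without NLTK. FreqDist subclasses Counter, so len() gives the same unique-word count. The helper name vocab_stats and the toy token list are illustrative only, not the course corpus. In this thread, the reported counts on the real data were 60976 tokens and 5775 unique words, against an expected 5778.

```python
from collections import Counter

def vocab_stats(tokens):
    """Return (number of tokens, number of unique words) for a token list."""
    # Counter stands in for nltk.FreqDist (a Counter subclass)
    fdist = Counter(word for word in tokens)
    return len(tokens), len(fdist)

# Toy token list, purely illustrative
data = ["i", "am", "happy", "because", "i", "am", "learning", "."]
n_tokens, n_unique = vocab_stats(data)
print(n_tokens, n_unique)  # 8 6
```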

Asim_Naeem · March 27, 2023, 3:10am

I am facing a similar issue. My vocabulary size is 5775 instead of 5778.
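If two environments (or two learners) each dump their vocabulary as a list of words, a set difference shows exactly which words account for a gap such as 5775 vs 5778. The sketch below assumes such a reference word list is available, which this thread does not confirm. The helper name vocab_diff and the sample words are hypothetical.

```python
def vocab_diff(local_vocab, reference_vocab):
    """Return (words only in the reference vocabulary, words only in the local one)."""
    local, reference = set(local_vocab), set(reference_vocab)
    return sorted(reference - local), sorted(local - reference)

# Illustrative vocabularies, not the real course data
reference = ["am", "happy", "i", "learning", "."]
local = ["am", "i", "."]
missing, extra = vocab_diff(local, reference)
print(missing, extra)  # ['happy', 'learning'] []
```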
