C3W2_Assignment - fit_vectorizer

Hi everyone,
I’m stuck with the fit_vectorizer function in the C3W2_Assignment:

[image removed by moderator: posting graded-cell code is against community guidelines, kindly check the FAQ and Code of Conduct]

I basically build the vectorizer as described in the course and then adapt it to train_sentences. The only new thing compared to this week’s labs is the standardize argument.

There I pass in the previously given standardize_func, i.e. standardize=standardize_func. This should be possible according to the documentation.

However, when executing the fit_vectorizer function I get the following error at vocabulary_size:

AttributeError: 'NoneType' object has no attribute 'vocabulary_size'

It seems the vectorizer is not being created?
However, when I set up the vectorizer without the standardize=standardize_func parameter, it is created and has a vocabulary size of 1000.
How does this function need to be passed to the standardize parameter? What am I missing? Thank you!

hi @ps17

What about num_words and oov_token?

What do the instructions before the graded cell mention?

Also remember that after you instantiate the tokenizer class, the next code instruction in the image tells you to fit the tokenizer to the training sentences.

Your error basically means that the object on which you are calling the method is None, i.e. the vectorizer was never returned or assigned.
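To make this concrete, here is a minimal, self-contained sketch (not the assignment code) of how that NoneType error typically arises: Keras’s adapt() updates the layer in place and returns None, so assigning its result, or forgetting the return statement in fit_vectorizer, leaves you holding None. The ToyVectorizer class below is purely illustrative, standing in for tf.keras.layers.TextVectorization.

```python
class ToyVectorizer:
    """Illustrative stand-in for tf.keras.layers.TextVectorization."""

    def __init__(self):
        self._vocab = []

    def adapt(self, sentences):
        # Like Keras adapt(): updates internal state and returns None.
        self._vocab = sorted({w for s in sentences for w in s.lower().split()})

    def vocabulary_size(self):
        return len(self._vocab)


# Buggy pattern that triggers the reported error: adapt() returns None,
# so chaining or assigning its result discards the vectorizer itself.
vectorizer = ToyVectorizer().adapt(["hello world"])
try:
    vectorizer.vocabulary_size()
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'vocabulary_size'

# Correct pattern: instantiate first, adapt in a separate step, then use
# (or return) the instance itself.
vectorizer = ToyVectorizer()
vectorizer.adapt(["hello world"])
print(vectorizer.vocabulary_size())  # 2
```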


Hi @Deepti_Prasad
thanks for your fast reply, but I am still confused. Are we talking about the vectorizer as described initially in exercise 2:

Which points to:

image

which has no num_words or oov_token parameters? Or are we talking about the Tokenizer, which does have these parameters:

If it is the Tokenizer, then the .vocabulary_size method cannot be applied, as only the vectorizer has it…

Adding to the confusion is that the graded function is prepopulated with the TextVectorization class, but the comment talks about the Tokenizer class…

It seems to me that an old assignment has been converted from Tokenizer to TextVectorization. You can see hints of this by looking at the top line:

image

The t has not been deleted and is most likely left over from fit_tokenizer…

Which brings me back to my initial question: how do I pass the standardize_func into the standardize parameter of the TextVectorization class? My current suspicion is that it cannot be done, as it was part of the old assignment involving the Tokenizer class…

Thank you for your help!


Thank you for reporting these corrections, I will notify the staff about this.

In fact, the instructions say to instantiate the tokenizer class; they probably should instead refer to the instructions before the graded cell you shared, i.e. that the vectorizer needs to be passed the value for max tokens, as you also mentioned in your previous comment.

The vocabulary learned by the vectorizer should have VOCAB_SIZE size, and truncate the output sequences to have MAX_LENGTH length.

Remember to use the custom function standardize_func to standardize each sentence in the vectorizer. You can do this by passing the function to the standardize parameter of TextVectorization.
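As a hedged illustration of that last point: per the TensorFlow documentation, the standardize argument of TextVectorization accepts a callable that takes and returns a tf.string tensor. The standardize_func below is a hypothetical stand-in (the assignment supplies its own); the point is only the pattern of passing the function object, not calling it.

```python
import tensorflow as tf

# Hypothetical standardizer; the assignment provides its own standardize_func.
# It must accept and return a tf.string tensor, so it uses tf.strings ops.
def standardize_func(text):
    text = tf.strings.lower(text)
    return tf.strings.regex_replace(text, r"[^\w\s]", "")

# Pass the function object itself (no parentheses) to `standardize`.
vectorizer = tf.keras.layers.TextVectorization(standardize=standardize_func)
vectorizer.adapt(["Hello, World!", "hello there"])

print(vectorizer.get_vocabulary())
```

Note the vocabulary will contain "hello" only once, because the custom standardizer lowercases "Hello," and strips the punctuation before tokenization.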

Yes, the difference between T and t matters when recalling previously defined functions as per the instructions, but the issue with your code isn’t about this.

Regards
DP

Now, addressing your code issue: I don’t know if you have referred to the ungraded lab, which tells you to use the same tf.keras.layers.TextVectorization, as that code was already given at the beginning.

Now, about adding other parameters to this class as per the instructions: the instructions tell you to use vocab_size, and the assignment tells you to use standardize_func, vocab_size, and to truncate the output sequences to have MAX_LENGTH length.

Also, do not forget:
Remember to use the custom function standardize_func to standardize each sentence in the vectorizer. You can do this by passing the function to the standardize parameter of TextVectorization.

Don’t add any other parameter to the vectorizer other than mentioned by the instructions.

Then this vectorizer is used to fit to the training sentences in the next code line. (Remember that the training sentences are referred to as train_sentences in the arguments of the graded function fit_vectorizer.)

An extra hint: referring to the ungraded labs might show you where you need to pay attention.

I suspect you have included a parameter=None in the vectorizer, and secondly, the step that fits the vectorizer to the training sentences might need a second look.

Let me know if you are still getting any error.

Regards
DP

A clue here is that for standardize you are to pass in the previous function (as you already know), as for max_tokens and output_sequence_length, you are to use global variables that are already defined in the notebook. Just go to the beginning of the notebook, you’ll see them.
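Putting those hints together, here is a generic sketch of the pattern being described, not the graded solution itself. The VOCAB_SIZE and MAX_LENGTH values and the standardize_func body below are placeholder assumptions; the notebook defines the real ones near the top.

```python
import tensorflow as tf

# Assumed placeholder values; the notebook defines the real globals.
VOCAB_SIZE = 1000
MAX_LENGTH = 120

# Placeholder; the notebook supplies the real standardize_func.
def standardize_func(text):
    return tf.strings.lower(text)

def fit_vectorizer(train_sentences):
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=VOCAB_SIZE,               # cap the learned vocabulary
        output_sequence_length=MAX_LENGTH,   # pad/truncate output sequences
        standardize=standardize_func,        # pass the function, don't call it
    )
    vectorizer.adapt(train_sentences)  # adapt() returns None; don't assign it
    return vectorizer  # forgetting this return yields the NoneType error


vectorizer = fit_vectorizer(["Hello world", "hello there"])
print(vectorizer.vocabulary_size())
```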

Hope this helps.


Hi Dp & Lukmanaj
Cool, thanks! I managed to get it to work and also saw that the notebook had been updated.
