C3W2_Assignment (TensorFlow course, BBC dataset) and model fitting

I hope I posted this to the correct category.

This is for the ‘TensorFlow Developer Professional Certificate’, Course 3, Week 2 assignment.

I got the functions train_val_split(), fit_tokenizer(), seq_and_pad(), and tokenize_labels() working, and all of them produce the expected output.

But I'm struggling with fitting the model.

So, I created my model like this:

Model: "sequential_19"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_19 (Embedding)    (None, 120, 16)           16000     
                                                                 
 global_average_pooling1d_10  (None, 16)               0         
  (GlobalAveragePooling1D)                                       
                                                                 
 dense_27 (Dense)            (None, 16)                272       
                                                                 
 dense_28 (Dense)            (None, 5)                 85        
                                                                 
=================================================================
Total params: 16,357
Trainable params: 16,357
Non-trainable params: 0
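
In code, the model is roughly this (activations as in the course examples; vocabulary size 1000 and max length 120 come from my earlier steps, and the parameter counts above match):

import tensorflow as tf

model = tf.keras.Sequential([
    # 1000-word vocab x 16-dim embedding = 16,000 params; sequences padded to 120
    tf.keras.layers.Embedding(1000, 16, input_length=120),
    tf.keras.layers.GlobalAveragePooling1D(),       # averages over the 120 steps -> (None, 16)
    tf.keras.layers.Dense(16, activation='relu'),   # 16*16 + 16 = 272 params
    tf.keras.layers.Dense(5, activation='softmax')  # 16*5 + 5 = 85 params, one unit per BBC category
])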

but it fails, seemingly because the layers after the embedding do not get the data in the correct dimensions:
ValueError: Shapes (None, 1) and (None, 5) are incompatible

I noticed that the embedding layer does not produce the 16-dimensional output, no matter what I do.

I verified that my variables and input arguments are the correct size and correct type:

train_padded_seq: 1780; <class 'numpy.ndarray'>
train_label_seq: 1780; <class 'numpy.ndarray'>
val_padded_seq: 445; <class 'numpy.ndarray'>
val_label_seq: 445; <class 'numpy.ndarray'>

What am I doing wrong?

The funny thing is, when I change the last Dense layer to have only one neuron, training succeeds, but of course I get garbage at the output.

Hi ha5dzs,

I had exactly the same issue with my assignment, but I was able to fix it by re-checking and re-thinking my choice of loss function. Think about the difference between ‘binary’ and ‘categorical’ and relate it to the error message you are getting 🙂

regards,
Michael

Hi,

Well, that’s what I thought at first, but no: my optimizer, loss function, and output activation all match categorical data.

For a laugh, I submitted the assignment with the non-working network and still got 80%, so I carried on with the next week.

I thought I had a similar issue with C3W3 as well, but that turned out to be something I had forgotten to do with the labels.

These error messages are very cryptic.

Mind the difference between sparse_categorical_crossentropy and categorical_crossentropy.

I think the loss function should be sparse_categorical_crossentropy and not categorical_crossentropy, since we don't use one-hot encoding here.
Please refer to the link below for where to use sparse categorical crossentropy and where to use categorical crossentropy loss.
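
To make it concrete, here is a tiny sketch of the difference (made-up numbers; five classes like the BBC dataset):

import numpy as np
import tensorflow as tf

int_labels = np.array([0, 2, 4])                                          # integer labels, shape (3,)
onehot_labels = tf.keras.utils.to_categorical(int_labels, num_classes=5)  # one-hot, shape (3, 5)

# Pretend softmax outputs for 3 samples, shape (3, 5)
preds = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
                  [0.10, 0.10, 0.60, 0.10, 0.10],
                  [0.05, 0.05, 0.10, 0.10, 0.70]])

# sparse_categorical_crossentropy takes the integer labels directly...
print(tf.keras.losses.sparse_categorical_crossentropy(int_labels, preds).numpy())
# ...while categorical_crossentropy needs the one-hot labels. Same values either way.
print(tf.keras.losses.categorical_crossentropy(onehot_labels, preds).numpy())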

Hope this helps

Thanks and Regards,

Mayank Ghogale


Yep, this did the trick. When I set my loss function to 'sparse_categorical_crossentropy', everything suddenly works.
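
For anyone finding this later, my compile call now looks like this (‘adam’ is just the optimizer I happened to use):

model.compile(
    loss='sparse_categorical_crossentropy',  # integer labels 0..4, no one-hot encoding needed
    optimizer='adam',
    metrics=['accuracy']
)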

It would be nice to have a development environment that shows what the options are. Oh well, thanks for this!


I am glad it worked for you :)

I’m having the same issue even though I’m using sparse_categorical_crossentropy. Any other suggestions?

ValueError: Data cardinality is ambiguous:
x sizes: 1780
y sizes: 445
Make sure all arrays contain the same number of samples.

Sir, your x and y arrays are not the same size; that is, you have fewer labels (445) than samples (1780). The usual cause is accidentally pairing the training sequences with the validation labels in the fit() call.
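A sketch of what the fit() call should look like, with the variable names from earlier in this thread (the epoch count is just an example):

history = model.fit(
    train_padded_seq, train_label_seq,                # both 1780 samples
    validation_data=(val_padded_seq, val_label_seq),  # both 445 samples
    epochs=30                                         # example value
)
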
Can you send me your code as a PDF via PM, by clicking on my name?
Thanks

Thank you @MayankGhogale! I ran into the same error, replaced ‘categorical_crossentropy’ with ‘sparse_categorical_crossentropy’, and suddenly it worked. 👍

Lesson learned: use ‘sparse_categorical_crossentropy’ and avoid one-hot encoding (which is simple, but breaks the flow for submission).

Glad it worked for you sir…

I understand the difference between categorical_crossentropy and sparse_categorical_crossentropy, so I knew why I got the (None, 1) and (None, 5) error when I used categorical_crossentropy. But correctly using sparse_categorical_crossentropy got me another huge and very cryptic error about other sizes not matching up. I scoured my notebook and realized that the test cases are not exhaustive enough to flag that I had implemented a previous function incorrectly: I did not use all of the given function parameters when instantiating the Tokenizer. This oversight did not manifest until I tried to fit the model.
If you're having errors in spite of using sparse_categorical_crossentropy, go through your previous function implementations in the notebook and be very careful to use all of the given parameters (see the sketch below).
Or maybe I’m the only one who made this mistake…
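
For example, fit_tokenizer() should pass every given parameter through to the Tokenizer, something like this (the parameter names are from my notebook and may differ in yours):

from tensorflow.keras.preprocessing.text import Tokenizer

def fit_tokenizer(train_sentences, num_words, oov_token):
    # Use ALL of the given parameters, not just the sentences
    tokenizer = Tokenizer(num_words=num_words, oov_token=oov_token)
    tokenizer.fit_on_texts(train_sentences)
    return tokenizer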

This is very frustrating, because this loss function, ‘SparseCategoricalCrossentropy’, was never presented to us; it was not even mentioned in the lectures.
I thought that the labels fed to the network had to be in the same format as the softmax output layer.
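
Apparently the ‘sparse’ variant is exactly what lifts that requirement: the integer labels are compared straight against the softmax probabilities. A quick check (shapes follow the model summary above):

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
y_true = tf.constant([3])                          # a single integer label, shape (1,)
y_pred = tf.constant([[0.1, 0.1, 0.1, 0.6, 0.1]])  # softmax output, shape (1, 5)
print(loss_fn(y_true, y_pred).numpy())             # ~0.51, i.e. -log(0.6)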
