C3_W1_Assignment: LayerError: Exception passing through layer WeightedCategoryCrossEntropy

Hi All,

So I’m struggling with C3_W1_Assignment. All my unit tests pass up until the training section, where I get variations of this “LayerError: Exception passing through layer WeightedCategoryCrossEntropy” issue. I’ve searched the forums for suggested approaches and tried them all; they change the nature of the error but don’t fix it.

Many suggestions concern the definition of the Mean layer and specifying the axis. The consensus seems to be that it should be:

mean_layer = tl.Mean(axis=1)

However, with this I get the following variation:

TypeError: mul got incompatible shapes for broadcasting: (16,), (4,).

If I add keepdims=True, as some have suggested, I get this error:

ValueError: Incompatible shapes for broadcasting: ((16, 16), (1, 4))
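
To sanity-check the axis argument, here’s what each option does to a (batch_size, max_len, embedding_dim) tensor in plain numpy (the max length of 15 is an assumption, just for illustration):

import numpy as np

x = np.zeros((16, 15, 256))  # (batch_size, max_len, embedding_dim); 15 is an assumed max_len

print(x.mean(axis=1).shape)                 # (16, 256)    -- the per-example vector Dense_2 should see
print(x.mean(axis=0).shape)                 # (15, 256)    -- wrong: averages across the batch
print(x.mean(axis=1, keepdims=True).shape)  # (16, 1, 256) -- keeps an extra axis Dense doesn't expect

So axis=1 does look right to me, and keepdims=True just leaves a stray axis behind.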

I understand that the model structure we have created is as follows:

Serial[
Embedding_9088_256
Mean
Dense_2
LogSoftmax
]

and that we are operating in batches, meaning that we are passing 2-D tensors of shape (batch_size, max_sentence_length) into the network.
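
Put as code, this is the stack I believe we’ve built, with the shapes I’d expect at each step written as comments (the shape comments are my own assumptions, not verified output):

import trax.layers as tl

# My reading of the model; dims are from the assignment, shape comments are assumptions
model = tl.Serial(
    tl.Embedding(vocab_size=9088, d_feature=256),  # (16, max_len) -> (16, max_len, 256)
    tl.Mean(axis=1),                               # -> (16, 256), average over the sentence
    tl.Dense(n_units=2),                           # -> (16, 2)
    tl.LogSoftmax(),                               # -> (16, 2), log-probabilities
)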

Is there perhaps a way of printing out the structure that gives more detail on the layers? By adding axis=1 to Mean I have moved the error from the Dense layer to WeightedCategoryCrossEntropy, which I first thought might be the LogSoftmax layer but actually seems to be part of the loss layer.

I’m struggling to debug this. Our batch size is 16, so I guess that’s where the 16s are coming from, but I’m not sure about the 4 …

Any help very much appreciated

I’ve made this diagram to try to better understand the shape of the tensors going through the layers, both for a single pass and for a batch setup (see the numpy sketch after this list). These are the specific numbers as I understand them from the assignment, i.e.

  • batch size is 16
  • the vocabulary size is 9088
  • the embedding dimension is 256
  • the output dimension is 2
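
As a code version of that diagram, here’s a plain-numpy sketch of the shape flow I’d expect through the whole model (again, the max length of 15 is an assumption):

import numpy as np

rng = np.random.default_rng(0)

token_ids = rng.integers(0, 9088, size=(16, 15))  # input batch: (batch_size, max_len)

emb_table = rng.normal(size=(9088, 256))          # Embedding_9088_256 weight table
embedded = emb_table[token_ids]                   # (16, 15, 256)
pooled = embedded.mean(axis=1)                    # Mean(axis=1): (16, 256)
logits = pooled @ rng.normal(size=(256, 2))       # Dense_2: (16, 2)
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))  # LogSoftmax: (16, 2)

print(embedded.shape, pooled.shape, logits.shape, log_probs.shape)

If this is right, nothing of shape (4,) should ever appear for a batch of 16.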

I’d really like to be able to set breakpoints on the flow through the network and check the shape of the tensors at each point, but it’s unclear where those breakpoints would go - maybe in the bowels of trax somewhere?
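
One approach I’ve been considering instead of breakpoints is to walk the Serial’s sublayers one at a time and print the shape after each. This is only a sketch, assuming direct calls on initialized sublayers behave the way I think they do (model is the Serial from the sketch above):

import numpy as np
from trax import shapes

dummy = np.zeros((16, 15), dtype=np.int32)  # dummy token-id batch; 15 is an assumed max_len

model.init(shapes.signature(dummy))

activations = dummy
for layer in model.sublayers:
    activations = layer(activations)
    print(layer.name, '->', activations.shape)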

Now I know we’re not supposed to modify the training setup, but just as an experiment I redefined the train/eval tasks with a batch size of 4, and training worked:

train_task, eval_task = get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, Vocab, True, batch_size = 4)

Very strange …