C3_W1_Assignment: LayerError: Exception passing through layer WeightedCategoryCrossEntropy

Hi All,

So I’m struggling with C3_W1_Assignment. All my unit tests pass up until the training section, where I get variations of this “LayerError: Exception passing through layer WeightedCategoryCrossEntropy” issue. I’ve searched the forums for suggested approaches and tried them all; they change the nature of the error but don’t fix it.

Many suggestions concern the definition of the Mean layer and specifying the axis. The consensus seems to be that it should be:

mean_layer = tl.Mean(axis=1)

However, with this I get the following variation:

TypeError: mul got incompatible shapes for broadcasting: (16,), (4,).

If I add keepdims=True, as some have suggested, I get this error:

ValueError: Incompatible shapes for broadcasting: ((16, 16), (1, 4))
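
To sanity-check the axis argument, here’s what each option does to a (batch_size, max_len, embedding_dim) tensor in plain numpy (the max length of 15 is an assumption, just for illustration):

import numpy as np

x = np.zeros((16, 15, 256))  # (batch_size, max_len, embedding_dim); 15 is an assumed max_len

print(x.mean(axis=1).shape)                 # (16, 256)    -- the per-example vector Dense_2 should see
print(x.mean(axis=0).shape)                 # (15, 256)    -- wrong: averages across the batch
print(x.mean(axis=1, keepdims=True).shape)  # (16, 1, 256) -- keeps an extra axis Dense doesn't expect

So axis=1 does look right to me, and keepdims=True just leaves a stray axis behind.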

I understand that the model structure we have created is as follows:

Serial[
Embedding_9088_256
Mean
Dense_2
LogSoftmax
]

and that we are operating in batches, meaning that we are passing 2-D tensors of shape (batch_size, max_sentence_length) into the network.
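
Put as code, this is the stack I believe we’ve built, with the shapes I’d expect at each step written as comments (the shape comments are my own assumptions, not verified output):

import trax.layers as tl

# My reading of the model; dims are from the assignment, shape comments are assumptions
model = tl.Serial(
    tl.Embedding(vocab_size=9088, d_feature=256),  # (16, max_len) -> (16, max_len, 256)
    tl.Mean(axis=1),                               # -> (16, 256), average over the sentence
    tl.Dense(n_units=2),                           # -> (16, 2)
    tl.LogSoftmax(),                               # -> (16, 2), log-probabilities
)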

Is there perhaps a way of printing out the structure that gives more detail on the layers? By adding axis=1 to Mean I have moved the error from the Dense layer to WeightedCategoryCrossEntropy, which I first thought might be the LogSoftmax layer but actually seems to be part of the loss layer.

I’m struggling to debug this. Our batch size is 16, so I guess that’s where the 16s are coming from, but I’m not sure about the 4 …

Any help very much appreciated

I’ve made this diagram to try to better understand the shape of the tensors going through the layers, both for a single pass and for a batch setup (see the numpy sketch after this list). These are the specific numbers as I understand them from the assignment, i.e.

  • batch size is 16
  • the vocabulary size is 9088
  • the embedding dimension is 256
  • the output dimension is 2
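
As a code version of that diagram, here’s a plain-numpy sketch of the shape flow I’d expect through the whole model (again, the max length of 15 is an assumption):

import numpy as np

rng = np.random.default_rng(0)

token_ids = rng.integers(0, 9088, size=(16, 15))  # input batch: (batch_size, max_len)

emb_table = rng.normal(size=(9088, 256))          # Embedding_9088_256 weight table
embedded = emb_table[token_ids]                   # (16, 15, 256)
pooled = embedded.mean(axis=1)                    # Mean(axis=1): (16, 256)
logits = pooled @ rng.normal(size=(256, 2))       # Dense_2: (16, 2)
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))  # LogSoftmax: (16, 2)

print(embedded.shape, pooled.shape, logits.shape, log_probs.shape)

If this is right, nothing of shape (4,) should ever appear for a batch of 16.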

I’d really like to be able to set breakpoints on the flow through the network and check the shape of the tensors at each point, but it’s unclear where those breakpoints would go - maybe in the bowels of trax somewhere?
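
One approach I’ve been considering instead of breakpoints is to walk the Serial’s sublayers one at a time and print the shape after each. This is only a sketch, assuming direct calls on initialized sublayers behave the way I think they do (model is the Serial from the sketch above):

import numpy as np
from trax import shapes

dummy = np.zeros((16, 15), dtype=np.int32)  # dummy token-id batch; 15 is an assumed max_len

model.init(shapes.signature(dummy))

activations = dummy
for layer in model.sublayers:
    activations = layer(activations)
    print(layer.name, '->', activations.shape)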

Now I know we’re not supposed to modify the training setup, but just as an experiment I redefined the train/eval tasks with a batch size of 4, and training worked:

train_task, eval_task = get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, Vocab, True, batch_size = 4)

Very strange …