C3_W2_lecture_nb_3_perplexity - dimension of predictions

In the notebook C3_W2_lecture_nb_3_perplexity, can someone please explain the reason for the extra dimension of “256”?

I understand that this corresponds to the vocabulary size used, but why does it show up in “predictions” and not in “targets”? I am not able to visualize this …

~ Ani

Hi @Anivader

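These 256 values are the log probabilities the model assigns to each of the 256 tokens in the vocabulary, one full set for every element (position) being predicted.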

Targets, on the other hand, hold a single value per element: the index of the token that is actually correct.

For example, if a single target element is token number 3, then the goal is for your predictions vector of 256 log probabilities to have as high a value as possible at index 3 and as low a value as possible for the remaining 255 tokens.
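
If it helps to see the shapes side by side, here is a minimal NumPy sketch. It is not the notebook's code: the batch size and sequence length are made-up values, the "predictions" are random numbers pushed through a log-softmax, and only the vocabulary size of 256 comes from this thread. It shows why predictions carry one extra axis, and how the targets are used to pick a single log probability out of the 256 at each position.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions); only vocab_size = 256 comes from the thread.
batch_size, seq_len, vocab_size = 2, 5, 256

# Stand-in for model output: one score per vocabulary entry at every position.
logits = rng.normal(size=(batch_size, seq_len, vocab_size))
# Log-softmax over the last axis turns scores into log probabilities.
log_probs = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))

# Targets: a single integer token id per position, so no vocabulary axis.
targets = rng.integers(0, vocab_size, size=(batch_size, seq_len))

print(log_probs.shape)  # (2, 5, 256)
print(targets.shape)    # (2, 5)

# Select the log probability assigned to the true token at each position.
true_log_probs = np.take_along_axis(
    log_probs, targets[..., None], axis=-1
).squeeze(-1)

# Same selection written with a one-hot encoding of the targets.
one_hot = np.eye(vocab_size)[targets]  # shape (2, 5, 256)
assert np.allclose(true_log_probs, np.sum(log_probs * one_hot, axis=-1))

# Per-sequence perplexity: exp of the negative mean log probability.
perplexity = np.exp(-np.mean(true_log_probs, axis=-1))
print(perplexity.shape)  # (2,)
```

Once the targets are one-hot encoded they have the same shape as the predictions, which is why the two can be multiplied elementwise. A real pipeline might also mask out padding positions before averaging; that step is left out of this sketch.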