In the notebook *C3_W2_lecture_nb_3_perplexity*, can someone please explain **the reason** for the extra dimension of “**256**”?

I understand that this corresponds to the vocabulary size, but why does it show up in “*predictions*” and not in “*targets*”? I'm not able to visualize this…

~ Ani

Hi @Anivader

These 256 values are the **log probabilities for each token in the vocabulary**. The model outputs one such vector for every position in the sequence.

Targets, on the other hand, have **one value** per position: the index of the token that is actually correct.
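To make the shapes concrete, here is a small NumPy sketch. The batch size of 2 and sequence length of 5 are made-up numbers for illustration, and the random logits stand in for a real model's output; only the vocabulary size of 256 comes from the notebook.

```python
import numpy as np

VOCAB_SIZE = 256        # vocabulary size discussed in the notebook
batch, seq_len = 2, 5   # hypothetical sizes, for illustration only

rng = np.random.default_rng(0)

# Predictions: for every (sequence, position) the model scores every token in
# the vocabulary; a log-softmax turns those scores into log probabilities.
logits = rng.normal(size=(batch, seq_len, VOCAB_SIZE))
predictions = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))

# Targets: a single integer token id per (sequence, position), so no vocab axis.
targets = rng.integers(0, VOCAB_SIZE, size=(batch, seq_len))

print(predictions.shape)  # (2, 5, 256)
print(targets.shape)      # (2, 5)

# Log probability the model assigned to the true token at sequence 0, position 0:
print(predictions[0, 0, targets[0, 0]])
```

So the extra axis in `predictions` is "one entry per possible token", while `targets` only needs to say *which* of those 256 entries is the right one.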

For example, if a single target value is token number 3, then your vector of 256 predictions should assign as high a (log) probability as possible to *index* 3 and as low as possible to the remaining 255 tokens.
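A minimal sketch of how the two shapes meet when computing perplexity, assuming `predictions` are log probabilities of shape (batch, seq_len, vocab) and `targets` are integer ids of shape (batch, seq_len). The `perplexity` helper is hypothetical (not the notebook's code), and it skips padding masks for simplicity.

```python
import numpy as np

def perplexity(predictions: np.ndarray, targets: np.ndarray) -> float:
    """predictions: (batch, seq_len, vocab) log probabilities.
    targets: (batch, seq_len) integer token ids.
    No padding mask is applied (simplifying assumption)."""
    vocab = predictions.shape[-1]
    # One-hot encode targets so they gain the same trailing vocab axis,
    # then keep only the log probability at the target index.
    one_hot = np.eye(vocab)[targets]                # (batch, seq_len, vocab)
    log_p = np.sum(predictions * one_hot, axis=-1)  # (batch, seq_len)
    # Perplexity = exp(mean negative log likelihood per token).
    return float(np.exp(-np.mean(log_p)))

VOCAB_SIZE = 256
targs = np.full((2, 5), 3)  # every target is token 3, as in the example above

# Uniform guess: every token gets log(1/256) at every position.
preds = np.full((2, 5, VOCAB_SIZE), -np.log(VOCAB_SIZE))
print(perplexity(preds, targs))  # about 256

# Confident guess: index 3 scores much higher than the other 255 tokens.
logits = np.zeros((2, 5, VOCAB_SIZE))
logits[..., 3] = 10.0
confident = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
print(perplexity(confident, targs))  # close to 1
```

Pushing the log probability at index 3 up (and, since probabilities sum to 1, the other 255 down) is exactly what drives perplexity from 256 toward 1.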