Week 1: Sampling in Dinosaur Language Model

Hi there,
I do not get why y is two-dimensional. Shouldn’t it be the probability distribution over the 27 characters? Where does the second dimension, in this case 100, come from? How does it affect the sampling?

Thanks in advance!

Best,
Claudia

The two dimensions are the letter sample and the one-hot encoding for the possible letters.

I am sorry, I still do not get it.

At every time step I get a matrix y with dimensions (27,100). Each column has entries between 0 and 1 that add up to 1, so I guess each column is a probability distribution over the 27 different characters. In the first time step, all columns are the same. Afterwards, the columns differ from each other. Which column do I choose for the sampling, and why? I still cannot make sense of the second dimension.

Found my mistake! It was a problem with the dimensions of x and a (for anyone who might have the same issue). Now y has dimensions (27,1).
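
For anyone who wants to check this on their own code, here is a minimal sketch of one way a (27,100) output can appear. It is not necessarily the exact mistake from this thread. It assumes a standard one-step RNN cell with a hidden size of 100 (inferred from the shape above), and the names rnn_step, Wax, Waa, Wya, b and by are placeholders, not taken from the posts. If a is passed as a 1-D array instead of a column vector, NumPy broadcasting silently turns the hidden state into a (100,100) matrix, and y comes out as (27,100) with every column still summing to 1.

```python
import numpy as np

def softmax(z):
    # Column-wise softmax: each column of the result sums to 1.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_step(x, a_prev, Wax, Waa, Wya, b, by):
    # One forward step of a basic RNN cell (hypothetical helper).
    a = np.tanh(Wax @ x + Waa @ a_prev + b)
    y = softmax(Wya @ a + by)
    return a, y

vocab_size, n_a = 27, 100  # n_a = 100 is an assumption based on the (27,100) shape
rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, vocab_size)) * 0.01
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wya = rng.standard_normal((vocab_size, n_a)) * 0.01
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

# Correct: x and a are column vectors, so y is a single distribution.
x_ok = np.zeros((vocab_size, 1))
a_ok = np.zeros((n_a, 1))
_, y_ok = rnn_step(x_ok, a_ok, Wax, Waa, Wya, b, by)
print(y_ok.shape)   # (27, 1)

# Problematic: a is 1-D, so (100, 1) + (100,) broadcasts to (100, 100).
a_bad = np.zeros(n_a)
_, y_bad = rnn_step(x_ok, a_bad, Wax, Waa, Wya, b, by)
print(y_bad.shape)  # (27, 100)

# With the correct shape, flatten y to 1-D before sampling an index.
idx = rng.choice(vocab_size, p=y_ok.ravel())
```

Printing the shapes of x, a and y at every step (or asserting y.shape == (vocab_size, 1)) usually catches this kind of error quickly.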