# RNN dimensions for hidden state and output

I am getting confused about the RNN dimensions and how perplexity is calculated in terms of dimensions.
Can anyone guide me or point me to a video I should watch? Feel free to suggest YouTube videos as well.

Could you please be more specific about what is not clear to you?

Regarding the output dimensions, there is a similar question:

Perplexity has nothing directly to do with model dimensions. Perplexity is just a method to evaluate whether your model is doing well.

For example, your model could be just taking random words from the vocabulary and joining them - that wouldn’t be a great model, but how would you “measure” it? Perplexity is one way to measure it. Check the lesson videos again and try to understand them.

There are many online explanations (just search the keywords “nlp perplexity” and you will get a myriad of results), and it’s not easy to pick one example because it depends on you personally: how well you understand the subject, what your background is, etc.
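As a minimal illustration (a sketch, not the course’s exact formula): perplexity is the exponential of the average negative log-likelihood that the model assigned to the correct tokens, so it is a single number regardless of the model’s layer dimensions. The probabilities below are made up for the example.

```python
# Sketch: perplexity from per-token probabilities (natural log).
# The five probabilities are hypothetical values a model might have
# assigned to the *correct* token at each of five positions.
import numpy as np

probs = np.array([0.5, 0.25, 0.5, 0.125, 0.5])
log_probs = np.log(probs)

# Perplexity = exp(mean negative log-likelihood).
perplexity = np.exp(-log_probs.mean())
print(perplexity)  # ≈ 3.03
```

A model that guessed uniformly over a 256-char vocabulary would score a perplexity of 256; lower is better.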

Cheers

I am confused about the 3 dimensions mentioned in the notebook ‘C3_W2_lecture_nb_3_perplexity’. I understand that we need 3 dimensions to decide which word has the max probability.
But I am not sure how the matrix conversions are done in the RNN to get a 3-dimensional output matrix.

Taking the below example,

```
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
```

Batch Size: 512
Word Vector: 256

Input layer: 512 x 256
Embedding Layer: 256 x 512
GRU: 512 x 512
GRU: 512 x 512
Dense Layer: 512 x 256
Softmax: 512 x 256 (Issue: I don’t have probabilities for the various words, so I am confused how the matrix manipulation is done to get the 3 dimensions.)

Your example would go like this:

1. Inputs shape (32, 256) - (batch size, sequence length padded/trimmed)
2. Embedding layer - shape change - (32, 256, 512) - (batch size, sentence length, embedding dim)
3. GRU1 - no shape change - (32, 256, 512)
4. GRU2 - no shape change - (32, 256, 512)
5. Dense - shape change - (32, 256, 256) - (batch size, sentence length, vocab size)
6. Softmax - no shape change - (32, 256, 256)

So the model outputs (32, 256, 256). That is 32 different sentences, each with at most 256 chars, and 256 probabilities for the next char at every position. You check which of the 256 choices has the max value: the most probable char is the model’s prediction.
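If it helps, the shape flow above can be traced with plain NumPy. This is only a sketch with random weights, not the course’s Trax layers; it assumes the GRUs preserve the feature dimension (512 in, 512 out), which is why they cause no shape change.

```python
# Sketch: tracing the shape flow with random NumPy weights.
# Assumed sizes from the reply above: batch=32, seq_len=256,
# vocab=256, d_model=512.
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, vocab, d_model = 32, 256, 256, 512

tokens = rng.integers(0, vocab, size=(batch, seq_len))    # (32, 256)

# Embedding: look up a 512-dim vector for every token id.
embedding = rng.normal(size=(vocab, d_model))
x = embedding[tokens]                                     # (32, 256, 512)

# Each GRU maps every 512-dim vector to another 512-dim vector,
# so the (32, 256, 512) shape is unchanged (not simulated here).

# Dense: project each 512-dim vector down to vocab-size scores.
dense_w = rng.normal(size=(d_model, vocab))
logits = x @ dense_w                                      # (32, 256, 256)

# LogSoftmax over the last axis keeps the shape (stable form).
m = logits.max(axis=-1, keepdims=True)
log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))

# Most probable next char at each position.
preds = log_probs.argmax(axis=-1)                         # (32, 256)
print(tokens.shape, x.shape, log_probs.shape, preds.shape)
```

The key step is the embedding lookup `embedding[tokens]`: indexing a (vocab, d_model) table with a (batch, seq_len) integer array is what introduces the third dimension.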

That was the RNN part. Now, perplexity is a proxy number for how well the model is doing. It does not depend directly on the model’s dimensions.

Cheers

P.S. I realized this was character prediction and corrected my dimensions.