# RNN dimensions for hidden state and output

I am getting confused about the RNN dimensions and how perplexity is calculated in terms of dimensions.
Can anyone guide me or point me to a video I should watch? Feel free to suggest YouTube videos as well.

Could you please be more specific about what is not clear to you?

Regarding the output dimensions, there is a similar question:

Perplexity has nothing directly to do with model dimensions. Perplexity is just a method to evaluate whether your model is doing well.

For example, your model could be just taking random words from the vocabulary and joining them - that wouldn’t be a great model, but how would you “measure” it? Perplexity is one way to measure it. Check the lesson videos again and try to understand them.

There are many online explanations (just search the keywords “nlp perplexity” and you will get a myriad of results), and it’s not easy to pick one example because it depends on you personally: how well you understand the subject, what your background is, etc.
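As a minimal illustration (a sketch, not the course’s exact formula): perplexity is the exponential of the average negative log-likelihood that the model assigned to the correct tokens, so it is a single number regardless of the model’s layer dimensions. The probabilities below are made up for the example.

```python
# Sketch: perplexity from per-token probabilities (natural log).
# The five probabilities are hypothetical values a model might have
# assigned to the *correct* token at each of five positions.
import numpy as np

probs = np.array([0.5, 0.25, 0.5, 0.125, 0.5])
log_probs = np.log(probs)

# Perplexity = exp(mean negative log-likelihood).
perplexity = np.exp(-log_probs.mean())
print(perplexity)  # ≈ 3.03
```

A model that guessed uniformly over a 256-char vocabulary would score a perplexity of 256; lower is better.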

Cheers

I am confused about the 3 dimensions mentioned in the notebook ‘C3_W2_lecture_nb_3_perplexity’. I understand that we need 3 dimensions to decide which word has the max probability.
But I am not sure how the matrix conversions are done in the RNN to get a 3-dimensional output matrix.

Taking the below example,

```
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
```

Batch Size: 512
Word Vector: 256

Input layer: 512 x 256
Embedding Layer: 256 x 512
GRU: 512 x 512
GRU: 512 x 512
Dense Layer: 512 x 256
Softmax: 512 x 256 (Issue: I don’t have probabilities for the various words, so I am confused how the matrix manipulation is done to get the 3 dimensions.)

Your example would go like this:

1. Inputs shape (32, 256) - (batch size, sequence length padded/trimmed)
2. Embedding layer - shape change - (32, 256, 512) - (batch size, sentence length, embedding dim)
3. GRU1 - no shape change - (32, 256, 512)
4. GRU2 - no shape change - (32, 256, 512)
5. Dense - shape change - (32, 256, 256) - (batch size, sentence length, vocab size)
6. Softmax - no shape change - (32, 256, 256)

So the model outputs (32, 256, 256). That is 32 different sentences, each with at most 256 chars, and 256 probabilities for the next char at every position. You check which of the 256 choices has the max value: the most probable char is the model’s prediction.
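If it helps, the shape flow above can be traced with plain NumPy. This is only a sketch with random weights, not the course’s Trax layers; it assumes the GRUs preserve the feature dimension (512 in, 512 out), which is why they cause no shape change.

```python
# Sketch: tracing the shape flow with random NumPy weights.
# Assumed sizes from the reply above: batch=32, seq_len=256,
# vocab=256, d_model=512.
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, vocab, d_model = 32, 256, 256, 512

tokens = rng.integers(0, vocab, size=(batch, seq_len))    # (32, 256)

# Embedding: look up a 512-dim vector for every token id.
embedding = rng.normal(size=(vocab, d_model))
x = embedding[tokens]                                     # (32, 256, 512)

# Each GRU maps every 512-dim vector to another 512-dim vector,
# so the (32, 256, 512) shape is unchanged (not simulated here).

# Dense: project each 512-dim vector down to vocab-size scores.
dense_w = rng.normal(size=(d_model, vocab))
logits = x @ dense_w                                      # (32, 256, 256)

# LogSoftmax over the last axis keeps the shape (stable form).
m = logits.max(axis=-1, keepdims=True)
log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))

# Most probable next char at each position.
preds = log_probs.argmax(axis=-1)                         # (32, 256)
print(tokens.shape, x.shape, log_probs.shape, preds.shape)
```

The key step is the embedding lookup `embedding[tokens]`: indexing a (vocab, d_model) table with a (batch, seq_len) integer array is what introduces the third dimension.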

That was the RNN part. Now, perplexity is a proxy number for how well the model is doing. It does not depend directly on the model’s dimensions.

Cheers

P.S. I realized this was character prediction and corrected my dimensions.