I have a question regarding the last graded exercise in C3W1 (GRU models)
There, we are supposed to compute the perplexity given a batch of predictions and a batch of targets.

My code was failing the last unit test, and when I investigated the cause, I found that the failing case had a different shape of preds than expected.
Specifically, I'd expect an input tensor of shape (batch_size, seq_length, vocab_size), but the problematic preds tensor is 4-dimensional. Here are the preds shapes for all test examples:
(1, 5, 3)
(1, 5, 3)
(1, 5, 3)
(1, 5, 3)
(1, 8, 5)
(1, 8, 5)
(1, 8, 5)
(1, 7, 3)
(2, 1, 7, 3)

I figured that if I removed the second dimension, the shape would match the expected one (w.r.t. the targets), but my log-perplexity is slightly off.

Am I doing something wrong, or are the test inputs incorrect?
Thanks

That was the only test I failed in the assignment; everything else passed.
For context, I applied tf.squeeze to the suspect predictions tensor to remove the extra dimension.
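A minimal sketch of that squeeze (the shapes mirror the failing test case above; passing an explicit axis is safer than the default, which would drop every size-1 dimension):

```python
import tensorflow as tf

preds = tf.zeros((2, 1, 7, 3))        # the unexpected 4-D predictions tensor
squeezed = tf.squeeze(preds, axis=1)  # drop only the singleton second dimension
print(squeezed.shape)                 # (2, 7, 3)
```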

You can check your intermediate values by printing them and comparing them with these; that way you can catch where your solution deviates from the intended one.

Also notice the instructions provided as a hint for preds.shape:

To convert the target into the same dimensions as the predictions tensor, use tf.one_hot with target and preds.shape[-1].
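A minimal sketch of that hint (the tensor values and sizes here are illustrative, not the grader's actual data):

```python
import tensorflow as tf

targets = tf.constant([[2, 0, 1]])    # (batch_size=1, seq_length=3), integer token ids
preds = tf.random.uniform((1, 3, 5))  # (batch_size, seq_length, vocab_size=5)

# One-hot encode the targets to the same rank as preds;
# the depth comes from the vocab axis of the predictions
one_hot_targets = tf.one_hot(targets, preds.shape[-1])
print(one_hot_targets.shape)          # (1, 3, 5)
```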

You will also need the np.equal function in order to unpad the data and properly compute perplexity.
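One way to build that non-padding mask with np.equal, assuming (as a later post in this thread states) that the pad token id in this exercise is 1:

```python
import numpy as np

targets = np.array([[2, 3, 1, 1]])    # hypothetical batch; 1 is the pad id here
non_pad = 1.0 - np.equal(targets, 1)  # 1.0 for real tokens, 0.0 for padding
print(non_pad)                        # [[1. 1. 0. 0.]]
```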

Also refer to this to choose the correct axis value for the perplexity score.

If the input indices are rank N, the output will have rank N+1. The new axis is created at dimension axis (by default, the new axis is appended at the end). So choosing axis=1 is incorrect in this scenario.
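You can check that quoted tf.one_hot behavior directly: with the default axis=-1 the new depth dimension is appended at the end, while axis=1 places it second, which no longer lines up with preds:

```python
import tensorflow as tf

targets = tf.zeros((2, 7), dtype=tf.int32)          # rank-2 integer indices
one_hot_default = tf.one_hot(targets, 3)            # depth appended at the end
one_hot_axis1 = tf.one_hot(targets, 3, axis=1)      # depth inserted at position 1
print(one_hot_default.shape, one_hot_axis1.shape)   # (2, 7, 3) (2, 3, 7)
```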

Thank you for answering. My problem is identical to the one in your post (I used the same solution as Cawnpore_Charlie and arrived at the same perplexity value; see the screenshot above).
So, is there a way to pass the last unit test (just out of curiosity, since I've already submitted my assignment)? I don't quite get that from your post.

As I already mentioned in my previous comment, your code is incorrect exactly where I suspected.

You are using an incorrect pred.shape, and you are also not using the one-hot label from tf.one_hot.
The instructions in the exercise clearly mention:
Calculate log probabilities for predictions using one-hot label

Next, in non_pad, kindly use the padding_id value of 1 instead of the padding_id variable.

Next, your axis choices in the sum of probabilities are incorrect.
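As an illustration of the axis choice (a sketch with made-up uniform predictions, not the assignment's data): multiplying the log predictions by the one-hot targets and summing over the last (vocab) axis picks out one log probability per token:

```python
import tensorflow as tf

# Uniform log-probabilities over a vocab of 3, for a sequence of length 4
preds = tf.math.log(tf.fill((1, 4, 3), 1.0 / 3.0))  # (batch, seq_length, vocab)
one_hot = tf.one_hot(tf.constant([[0, 1, 2, 0]]), 3)

# Summing over the vocab axis leaves one log-probability per token
log_p = tf.reduce_sum(preds * one_hot, axis=-1)
print(log_p.shape)                                  # (1, 4)
```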

Go back to the instructions section and work through it point by point; you will find the solution.

Also, please make sure not to edit or add any code outside of the ## Start and end code here markers.