C3W1_Assignment: log_perplexity - Expected nan

I am working on the Week 1 assignment, on log_perplexity. However, one of the unit tests is failing because it expects a nan value instead of 0. I am very confused by this, especially because we are only taking sums and never actually take logs, so I do not understand why we would get a nan value.

The unit test that is failing is

{
"name": "example 2",
"input": {
"preds": tf.constant([[[0.0, 0.0, 1.],
[0.0, 0.0, 1.],
[0.0, 0.0, 1.],
[0.0, 0.0, 1.],
[0.0, 0.0, 1.]]]),
"target": tf.constant([[1, 1, 1, 1, 1]]),
},
"expected": float("nan"),
}


Could I get some help on this one please? I also followed the C3W1 ungraded lab to make sure I was doing everything right, but I can't spot any difference.

Lab ID sirozhllqgyp

I don't have access to the assignment, but the issue is related to taking the logarithm of zero probabilities, which results in NaN. When computing the log perplexity, if the predicted probabilities for certain events are zero, taking the logarithm of zero will result in NaN.

@Alireza_Saei
That would totally make sense to me. However, in this function we are just working with np.sum instead of a log; we never actually take the log, neither in the assignment nor in the practice notebook.

I looked for a resource online to share but could not find one. You can check out the ungraded notebook as well, where we compute the perplexity and never use log; rather, we use a simplified version using the sum.
The notebook is "Calculating perplexity using numpy: Ungraded Lecture Notebook".

I am still wondering whether there is something wrong with my solution or the unit tests aren't correct.

Here is what we have to complete. I am not allowed to post my solution, so I am just including the empty function to be filled in, so as not to violate the rules.

    # Calculate log probabilities for predictions using one-hot encoding
    log_p = np.sum(None * None, axis=-1)  # HINT: tf.one_hot() should replace one of the Nones
    # Identify non-padding elements in the target
    non_pad = 1.0 - np.equal(None, 0)  # You should check if the target equals to PADDING_ID
    # Apply non-padding mask to log probabilities to exclude padding
    log_p = None * None  # Get rid of the padding
    # Calculate the log perplexity by taking the sum of log probabilities and dividing by the sum of non-padding elements
    log_ppx = np.sum(None, axis=None) / np.sum(None, axis=None)  # Remember to set the axis properly when summing up
    # Compute the mean of log perplexity
    log_ppx = np.mean(None)  # Compute the mean of the previous expression
    ### END CODE HERE ###
    return -log_ppx


@Alireza_Saei notice how we do not use log in this case. To be honest, I would have expected a log somewhere, because when we pass the log inside the product, the product becomes a sum (Product → Sum), but the log should still be there. It doesn't seem to be expected in the solution, nor is it present in the ungraded notebook for perplexity.

There is an error in this step: check whether the target equals the padding_ID. Choosing 0 is incorrect, as the padding_ID is 1.

Based on your output, I would also check whether the prediction shape is handled correctly by your choice of axis.

I hope you referred to the hints given before the grader cell:

• To convert the target into the same dimension as the predictions tensor use tf.one_hot with target and preds.shape[-1].
• You will also need the np.equal function in order to unpad the data and properly compute perplexity.
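To make the shape hint concrete, here is a small sketch of what one-hot encoding the target does to its dimensions. It uses np.eye as a NumPy stand-in for tf.one_hot (an assumption made only so the snippet runs without TensorFlow), and the toy values are taken from the failing unit test. It is not the assignment solution.

```python
import numpy as np

# Toy batch from the unit test: 1 sequence, 5 positions, vocabulary of 3 tokens.
preds = np.array([[[0.0, 0.0, 1.0]] * 5])   # shape (1, 5, 3)
target = np.array([[1, 1, 1, 1, 1]])        # shape (1, 5)

# Stand-in for tf.one_hot(target, preds.shape[-1]):
# indexing an identity matrix by token id gives one row per position.
one_hot_target = np.eye(preds.shape[-1])[target]  # shape (1, 5, 3)

print(preds.shape, target.shape, one_hot_target.shape)
print(one_hot_target[0, 0])  # row for token id 1
```

After the encoding, the target has the same number of dimensions as the predictions, so the two can be multiplied elementwise.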

If the above instructions have been followed as mentioned, then I would go back to the previous grader cell or check whether there is an issue with the dataset code.

Regards
DP

That works, thanks for the help. I just want to fully understand the float("nan") case.
Just to verify my understanding:
Is the reason for float("nan") that we are dividing by zero?

Dividing a non-zero number by zero gets you Inf or -Inf, not NaN.

Here's a chunk of sample code:

import numpy as np

v = 42. * np.ones((1,4), dtype = 'float64')
print(f"type(v) = {type(v)}")
print(f"v = {v}")
w = np.zeros((1,4), dtype = 'float64')
print(f"type(w) = {type(w)}")
print(f"w = {w}")
z = v / w
print(f"z = {z}")
a = -1. * z
print(f"a = {a}")
b = z + 42.
print(f"b = {b}")
c = z - 42.
print(f"c = {c}")
d = z + z
print(f"d = {d}")
e = z - z
print(f"e = {e}")
f = z / z
print(f"f = {f}")


Running that gives this result:

type(v) = <class 'numpy.ndarray'>
v = [[42. 42. 42. 42.]]
type(w) = <class 'numpy.ndarray'>
w = [[0. 0. 0. 0.]]
z = [[inf inf inf inf]]
a = [[-inf -inf -inf -inf]]
b = [[inf inf inf inf]]
c = [[inf inf inf inf]]
d = [[inf inf inf inf]]
e = [[nan nan nan nan]]
f = [[nan nan nan nan]]


Thank you so much. That completely clarifies for me when we get nan. My analysis was too shallow: clearly, a non-zero number divided by 0 will be infinity, while 0/0, infinity/infinity, or infinity - infinity will yield nan. So nan covers every undefined or ambiguous operation.

@paulinpaloalto thanks for the clear and prompt response.
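To tie this back to the failing unit test, here is a minimal sketch of how a 0/0 can arise when every target token is padding. It assumes the padding ID is 1, as stated in the reply above; the per-token values are made up for illustration, and this is not the graded function.

```python
import numpy as np

PADDING_ID = 1  # assumption: taken from the reply above
target = np.array([[1, 1, 1, 1, 1]])  # every position is padding

# Mask is 1.0 for real tokens and 0.0 for padding; here it is all zeros.
non_pad = 1.0 - np.equal(target, PADDING_ID)

# Hypothetical per-token values; the mask zeroes all of them out.
per_token = np.full(target.shape, -0.5)
masked = per_token * non_pad

# Both sums are zero, so this is 0/0, which is nan rather than inf.
with np.errstate(invalid="ignore"):
    ratio = np.sum(masked, axis=-1) / np.sum(non_pad, axis=-1)

print(non_pad)
print(ratio)
```

With the mask built against 0 instead of the padding ID, no position would be masked out and the denominator would be 5, which is consistent with getting a finite value instead of nan.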

That's great! Also, thank you for following the community guidelines and not sharing any grader cell code.

As Paul already mentioned, dividing by zero gets you infinity or -infinity.

NaN represents missing or undefined data in Python. It is typically encountered while performing mathematical operations that result in an undefined or nonsensical value. NaN is a floating-point value, which you can create in Python with float("nan").

Regards
DP
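One practical consequence, shown in the short sketch below: NaN never compares equal to anything, including itself, so a check against an expected float("nan") has to use an isnan function rather than ==. The variable names are only illustrative.

```python
import math
import numpy as np

x = float("nan")

# NaN is not equal to itself, so == cannot be used to detect it.
print(x == x)

# Use isnan checks instead.
print(math.isnan(x))
print(np.isnan(np.array([x, 1.0])))
```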

Actually I just realized that my previous example code did not cover the logarithm case. I added two more tests to show what happens there:

g = np.log(w)
print(f"g = {g}")
h = np.log(-1 * v)
print(f"h = {h}")


Running that gives this:

g = [[-inf -inf -inf -inf]]
h = [[nan nan nan nan]]


The input values w and v are the same as above, so you can see that np.log(0) gives you -Inf and np.log of a negative number gives you NaN. Note that in mathematics, the log of a negative number actually does exist, but it is a complex number, meaning the "imaginary part" of the number is non-zero. That is the coefficient of i = \sqrt{-1}. Note that numpy does handle complex numbers, but the definition of the np.log API is that if the input is real-valued, then the output will also be real-valued. So in the case of a negative input, they will return NaN instead of a complex answer.
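For completeness, here is a small sketch of how to get the complex answer from numpy: either pass a complex-valued input to np.log, or use np.emath.log, which switches to complex output when a real input is negative. The variable names are only illustrative.

```python
import numpy as np

# Complex-valued input: np.log returns the complex logarithm.
z = np.log(np.array([-42.0], dtype=complex))
print(f"z = {z}")

# np.emath.log accepts a real negative input and returns a complex result.
e = np.emath.log(-42.0)
print(f"e = {e}")

# The real part is log(42) and the imaginary part is pi.
print(np.isclose(z[0].real, np.log(42.0)), np.isclose(z[0].imag, np.pi))
```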
