C3W1_Assignment: log_perplexity - Expected nan

troncosomath · April 25, 2024, 5:27pm

I am working on the Week 1 assignment on log_perplexity. However, one of the unit tests is failing because it expects a nan value instead of 0. I am very confused by this, especially because we are just taking sums and never actually take logs, so I do not understand why we would get a nan value.

The unit test that is failing is

{
    "name": "example 2",
    "input": {
        "preds": tf.constant([[[0.0, 0.0, 1.],
                               [0.0, 0.0, 1.],
                               [0.0, 0.0, 1.],
                               [0.0, 0.0, 1.],
                               [0.0, 0.0, 1.]]]),
        "target": tf.constant([[1, 1, 1, 1, 1]]),
    },
    "expected": float("nan"),
}

Could I get some help on this one, please? I also followed the C3W1 ungraded lab to make sure I was doing everything right, but I can’t spot any difference.

Lab ID sirozhllqgyp

Alireza_Saei · April 25, 2024, 6:18pm

Hi @troncosomath

I don’t have access to the assignment but the issue is related to taking the logarithm of zero probabilities, which results in NaN. When computing the log perplexity, if the predicted probabilities for certain events are zero, taking the logarithm of zero will result in NaN.

troncosomath · April 25, 2024, 6:35pm

@Alireza_Saei
That would totally make sense to me. However, in this function we are just working with np.sum instead of a log; we never actually take the log, neither in the assignment nor in the practice notebook.

I looked for an online copy of the resource to share but could not find one. You can check the ungraded notebook as well, where we compute the perplexity and never use log; instead we use a simplified version based on the sum.
The notebook is “Calculating perplexity using numpy: Ungraded Lecture Notebook”.

I am still wondering whether there is something wrong with my solution or whether the unit tests are incorrect.

troncosomath · April 25, 2024, 6:43pm

Here is what we have to complete. I am not allowed to post my solution, so I am only posting the empty function to be filled in, so as not to violate the rules.

    # Calculate log probabilities for predictions using one-hot encoding
    
    log_p = np.sum(None * None, axis= -1) # HINT: tf.one_hot() should replace one of the Nones
    # Identify non-padding elements in the target
    non_pad = 1.0 - np.equal(None, 0)          # You should check if the target equals to PADDING_ID
    # Apply non-padding mask to log probabilities to exclude padding
    log_p = None* None                             # Get rid of the padding
    # Calculate the log perplexity by taking the sum of log probabilities and dividing by the sum of non-padding elements
    log_ppx = np.sum(None, axis=None) / np.sum(None, axis=None) # Remember to set the axis properly when summing up
    # Compute the mean of log perplexity
    log_ppx = np.mean(None) # Compute the mean of the previous expression
    ### END CODE HERE ###
    return -log_ppx

@Alireza_Saei notice how we do not use log in this case. To be honest, I would have expected a log somewhere: when we take the log of the product, the product becomes a sum, but the log should still be there. It does not seem to be expected in the solution, nor is it present in the ungraded notebook for perplexity.

Deepti_Prasad · April 25, 2024, 11:18pm

There is an error in the step that checks whether the target equals the padding ID: choosing 0 is incorrect, as the padding_ID is 1.

Based on your output, I would also check whether the axis is chosen correctly for the shape of the predictions.

I hope you referred to the hint given before the grader cell:

To convert the target into the same dimension as the predictions tensor use tf.one_hot with target and preds.shape[-1].
You will also need the np.equal function in order to unpad the data and properly compute perplexity.
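As a rough illustration of the two operations named in the hint, here is a minimal NumPy-only sketch. It is not the assignment solution: the `target` values and `depth` are made up, `np.eye(depth)[target]` is only a stand-in for `tf.one_hot(target, depth)`, and `PADDING_ID = 1` is assumed from the reply above.

```python
import numpy as np

# Hypothetical values for illustration only; not the assignment's data.
PADDING_ID = 1                          # assumed from the reply above
target = np.array([[2, 0, 1, 1, 1]])    # shape (batch, seq_len)
depth = 3                               # stands in for preds.shape[-1]

# NumPy stand-in for tf.one_hot(target, depth): pick rows of an identity matrix.
one_hot = np.eye(depth)[target]         # shape (batch, seq_len, depth)

# 1.0 where the token is a real token, 0.0 where it equals the padding id.
non_pad = 1.0 - np.equal(target, PADDING_ID)

print(one_hot.shape)
print(non_pad)
```

The one-hot array has the same trailing dimension as the predictions, and the mask marks which positions should count towards the average.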

If the above instructions have been followed, then I would go back to the previous grader cell and check that there is no issue with the dataset code.

Regards
DP

troncosomath · April 25, 2024, 11:41pm

@Deepti_Prasad
That works, thanks for the help. I would like to fully understand the float(“nan”) case.
Just to verify my understanding:
the reason the expected value is float(“nan”) is that we are dividing by zero, correct?

paulinpaloalto · April 26, 2024, 12:00am

Dividing a non-zero number by zero gets you Inf or -Inf, not NaN. NaN comes from indeterminate forms such as 0/0, Inf - Inf or Inf/Inf.

Here’s a chunk of sample code:

import numpy as np

v = 42. * np.ones((1,4), dtype = 'float64')
print(f"type(v) = {type(v)}")
print(f"v = {v}")
w = np.zeros((1,4), dtype = 'float64')
print(f"type(w) = {type(w)}")
print(f"w = {w}")
z = v / w
print(f"z = {z}")
a = -1. * z
print(f"a = {a}")
b = z + 42.
print(f"b = {b}")
c = z - 42.
print(f"c = {c}")
d = z + z
print(f"d = {d}")
e = z - z
print(f"e = {e}")
f = z / z
print(f"f = {f}")

Running that gives this result:

type(v) = <class 'numpy.ndarray'>
v = [[42. 42. 42. 42.]]
type(w) = <class 'numpy.ndarray'>
w = [[0. 0. 0. 0.]]
z = [[inf inf inf inf]]
a = [[-inf -inf -inf -inf]]
b = [[inf inf inf inf]]
c = [[inf inf inf inf]]
d = [[inf inf inf inf]]
e = [[nan nan nan nan]]
f = [[nan nan nan nan]]

troncosomath · April 26, 2024, 1:41pm

Thank you so much. That completely clarifies for me when we get nan. My analysis was too shallow: clearly a non-zero number divided by 0 will be infinity, while 0/0, infinity/infinity or infinity - infinity will yield nan. So nan covers every undefined or ambiguous operation.

@paulinpaloalto thanks for the clear and prompt response.
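The cases summarized above can be checked with a short sketch. The `masked_mean` helper is hypothetical and only mimics the general "masked sum divided by mask sum" pattern; presumably a test whose targets are all padding ends up in the 0/0 branch, but that is an inference from the thread, not something the assignment confirms.

```python
import numpy as np

def masked_mean(values, mask):
    """Hypothetical helper: average of `values` over positions where mask == 1."""
    # Silence the RuntimeWarning NumPy raises for 0/0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum(values * mask) / np.sum(mask)

values = np.array([0.5, 0.25, 0.25])
some_real = masked_mean(values, np.array([1.0, 1.0, 0.0]))  # 0.75 / 2
all_padding = masked_mean(values, np.zeros(3))              # 0.0 / 0.0

with np.errstate(divide="ignore", invalid="ignore"):
    inf = np.float64(1.0) / np.float64(0.0)                 # non-zero / 0
    undefined = [np.float64(0.0) / np.float64(0.0), inf / inf, inf - inf]

print(some_real, all_padding, inf, undefined)
```

When the mask is all zeros, both the numerator and the denominator are zero, so the result is nan rather than inf.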

Deepti_Prasad · April 26, 2024, 2:56pm

That’s great!!! Also, thank you for following the community guidelines and not sharing any grader cell code.

As Paul already mentioned, dividing a non-zero number by zero gets you infinity or -infinity.

NaN (“not a number”) represents missing or undefined data in Python. It is typically encountered when a mathematical operation has an undefined or nonsensical result. NaN is a floating-point value that you can create with float(‘nan’).
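One practical consequence, sketched below with standard `math` and NumPy calls: NaN never compares equal to anything, including itself, so checking for an expected nan needs `math.isnan`, `np.isnan`, or a comparison that explicitly treats NaNs as equal. How the course's unit tests do this comparison is not shown in this thread.

```python
import math
import numpy as np

nan = float("nan")

equal_to_itself = (nan == nan)                    # NaN is never equal to itself
detected = math.isnan(nan)                        # the reliable scalar check
elementwise = np.isnan(np.array([1.0, nan]))      # element-wise check on arrays
close_if_allowed = np.isclose(nan, nan, equal_nan=True)

print(equal_to_itself, detected, elementwise, close_if_allowed)
```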

Regards
DP

paulinpaloalto · April 27, 2024, 5:45pm

Actually I just realized that my previous example code did not cover the logarithm case. I added two more tests to show what happens there:

g = np.log(w)
print(f"g = {g}")
h = np.log(-1 * v)
print(f"h = {h}")

Running that gives this:

g = [[-inf -inf -inf -inf]]
h = [[nan nan nan nan]]

The input values w and v are the same as above, so you can see that np.log(0) gives you -Inf and np.log of a negative number gives you NaN. Note that in mathematics, the log of a negative number actually does exist, but it is a complex number, meaning the “imaginary part” of the number is non-zero. That is the coefficient of i = \sqrt{-1}. Note that numpy does handle complex numbers, but the definition of the np.log API is that if the input is real-valued, then the output will also be real-valued. So in the case of a negative input, they will return NaN instead of a complex answer.
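A small sketch of that last point: with a complex-typed input, `np.log` returns the complex answer, and `np.emath.log` switches to the complex domain on its own for negative real inputs. The variable names are just for illustration.

```python
import numpy as np

# Real-valued input: the output stays real, so a negative input gives nan.
with np.errstate(invalid="ignore"):
    real_in = np.log(np.float64(-1.0))

# Complex-valued input: the output is complex, log(-1) = 0 + pi*j.
complex_in = np.log(np.complex128(-1.0))

# np.emath.log returns a complex result for negative real inputs.
emath_result = np.emath.log(-1.0)

print(real_in, complex_in, emath_result)
```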
