Questions - compute_cost

Hi friends,

I am so lost on this question. Allow me to ask this step by step. Please directly answer yes or no. Thank you so much for your time!

Q1. In the picture above, I printed the input from the tester function. My understanding is that the shape (6, 2) means there are 6 classes (rows) and 2 samples (columns). Yes or no?

Q2. In the picture above, from the example for “tf.keras.metrics.categorical_crossentropy”, y_true and y_pred both have shape (2, 3). Does this mean there are 2 samples (rows) and 3 classes (columns)? Yes or no?

Q3. If Q1 and Q2 are both yes, I need to transpose the matrix in Q1 from (6, 2) to (2, 6) so that “tf.keras.metrics.categorical_crossentropy” can use it. Yes or no?

Q4. Assuming I can calculate the loss, to get the cost I still need to sum the losses and then average over the number of samples. Yes or no?

Thanks again for your time and help.

Why are you creating a duplicate thread?

Could you please take a look at my question? Thank you so much!

Hi sunson29,

Q1: Yes. Axis 0 indexes the rows and axis 1 the columns, so (6, 2) is 6 rows by 2 columns.
Q2: Yes. TensorFlow expects the input with shape (number of examples, num_classes).
Q3: Yes, since the function's convention is the shape (number of examples, num_classes).
Q4: If you mean summing over the samples and then dividing by the number of samples, then theoretically, yes. However, dividing by a constant only scales your cost function; it does not add new information. I think the exercise expects you to omit the division.

Bests,
Tamas
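The shape convention and the sum-versus-mean question from the answers above can be sketched in plain NumPy. This is only an illustration, not the assignment's solution: softmax_cross_entropy is a hypothetical helper that re-implements the per-sample loss by hand instead of calling tf.keras.metrics.categorical_crossentropy, and the data is made up.

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    """Per-sample cross-entropy; both inputs shaped (num_examples, num_classes)."""
    # Numerically stable log-softmax along the class axis (axis 1).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

# Made-up data laid out as in Q1: (6 classes, 2 samples).
rng = np.random.default_rng(0)
logits_cs = rng.normal(size=(6, 2))
labels_cs = np.zeros((6, 2))
labels_cs[1, 0] = 1.0  # sample 0 belongs to class 1
labels_cs[4, 1] = 1.0  # sample 1 belongs to class 4

# Transpose to (2 samples, 6 classes) before computing the loss (Q3).
per_sample = softmax_cross_entropy(labels_cs.T, logits_cs.T)

total_cost = per_sample.sum()                 # sum over the samples
mean_cost = total_cost / per_sample.shape[0]  # same value scaled by 1/m (Q4)
print(per_sample.shape, total_cost, mean_cost)
```

The two costs differ only by the constant factor 1/m, which is why either one leads to the same minimizer.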

Oh, thank you!
For Q4, I also had a question. I have already passed the assignment, but I actually don’t see the division there.

Paul is busy right now. Could you take a look as well?

As Paul commented, we accumulate the cost for the whole epoch and we divide by the number of samples in model(), line 78.

epoch_cost /= m
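To make the accumulate-then-divide pattern concrete, here is a minimal sketch. It assumes each minibatch contributes the sum of its per-sample losses; the loss values, minibatch_size and the loop itself are made up for illustration and are not the notebook's model() code.

```python
import numpy as np

# Made-up per-sample losses for m = 6 training examples.
per_sample_losses = np.array([0.2, 0.9, 0.4, 1.1, 0.3, 0.7])
m = per_sample_losses.shape[0]
minibatch_size = 2  # illustrative value

epoch_cost = 0.0
for start in range(0, m, minibatch_size):
    minibatch = per_sample_losses[start:start + minibatch_size]
    # Each minibatch adds the SUM of its per-sample losses.
    epoch_cost += minibatch.sum()

# One division at the end turns the accumulated sum into the mean over all samples.
epoch_cost /= m
print(epoch_cost)
```

Under that assumption, dividing inside the cost function as well would scale the reported epoch cost a second time.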

For the test:

assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7)

I am pretty sure your calculations are correct, and I don’t know why the test sums the two costs and then divides by 2. Maybe there are historical reasons from the development of the notebook?
By the way, this was a little confusing for me as well: the notebook did not accept my solution because of rounding errors when, instead of using “from_logits=True”, I computed “tf.keras.activations.softmax(logits)” explicitly.
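The rounding issue can be illustrated with a small NumPy sketch. It does not reproduce TensorFlow's internals; loss_from_logits and loss_from_probs are hypothetical helpers that only contrast a fused log-softmax with a separate softmax followed by a log, both in float32, on made-up inputs.

```python
import numpy as np

def loss_from_logits(labels, logits):
    # Fused route: log-softmax computed directly, probabilities never formed.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def loss_from_probs(labels, logits):
    # Two-step route: softmax first, then the log of the probabilities.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return -(labels * np.log(probs)).sum(axis=1)

# Made-up float32 inputs shaped (num_examples, num_classes).
logits = np.array([[2.0, -1.0, 0.5], [0.1, 3.0, -2.0]], dtype=np.float32)
labels = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.float32)

fused = loss_from_logits(labels, logits)
two_step = loss_from_probs(labels, logits)
max_gap = np.abs(fused - two_step).max()

print(fused, two_step, max_gap)
print(np.finfo(np.float32).eps)  # float32 machine epsilon, about 1.19e-7
```

The two routes are mathematically equal, but in float32 they need not agree to the last bit. Since float32 machine epsilon is already about 1.19e-7, a tolerance of 1e-7 like the one in the assert leaves very little room for such differences.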