C2W3 Tensorflow Intro: 3.3 Train the Model

The exercise explains that softmax is applied internally when the loss is computed with tf.keras.losses.categorical_crossentropy().
So forward_propagation() only computes up to Z3, the output of the third layer before any activation function is applied.

However, the following block, which evaluates the test set every 10 epochs, compares the ground truths with Z3. Z3 hasn’t been through the softmax activation, which is what would indicate which label a sample belongs to.
So what are we actually comparing? I don’t quite follow.

for (minibatch_X, minibatch_Y) in test_minibatches:
    Z3 = forward_propagation(tf.transpose(minibatch_X), parameters)
    test_accuracy.update_state(minibatch_Y, tf.transpose(Z3))
print("Test_accuracy:", test_accuracy.result())

I’m sorry if this has been asked before, and thanks in advance for your time.

Hello @jorgeencinas,

Welcome here!!!

To understand it, we trace back to test_accuracy, which traces back to how we defined it, namely as tf.keras.metrics.CategoricalAccuracy, and that traces back to the TensorFlow docs, which say:

You can provide logits of classes as y_pred, since argmax of logits and probabilities are same.

Because “logits ---- softmax ----> probability”, softmax doesn’t matter here. Think about it further :wink:

Cheers,
Raymond

PS: It’s a healthy habit to read the docs every time we see a TF thing :wink:

Right! Just to provide one more level of explanation: the key point is that softmax preserves the ordering of its inputs. Each output is exp(z_i) divided by the same sum over all classes, and exp is strictly increasing, so the largest logit always maps to the largest probability. That is why the statement about “argmax” is true.
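
To make the argmax point concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The logits and one-hot labels are made-up values for illustration, not taken from the assignment, and the accuracy() helper is a hypothetical stand-in that only mimics the argmax comparison the docs describe for CategoricalAccuracy:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the max does not change the result
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(labels, predictions):
    """Fraction of rows where argmax(label) == argmax(prediction).

    Hypothetical helper: it imitates an argmax-based accuracy metric,
    it is not the TensorFlow implementation.
    """
    hits = sum(argmax(y) == argmax(p) for y, p in zip(labels, predictions))
    return hits / len(labels)

# Made-up logits (think of rows of the transposed Z3) and one-hot labels.
logits_batch = [
    [2.0, -1.0, 0.5],
    [0.1, 0.2, 3.0],
    [1.5, 1.4, -2.0],
]
labels_batch = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]

probs_batch = [softmax(row) for row in logits_batch]

acc_from_logits = accuracy(labels_batch, logits_batch)
acc_from_probs = accuracy(labels_batch, probs_batch)
print("accuracy from logits:       ", acc_from_logits)
print("accuracy from probabilities:", acc_from_probs)
```

Both numbers come out the same (2 of the 3 rows are correct), because applying softmax never changes which entry in a row is the largest.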

Thank you very much for welcoming me into the community! And of course, for your responses. They were both very precise, and helped guide me in the right direction. I did a little bit more reading to really understand them, and now I get it!

Will definitely follow your advice about checking on TF’s documentation carefully from here on, thank you for your patience :smile: