It’s an interesting point! Have a look at the documentation for the different “cross entropy” loss functions in TF and read the description of the from_logits parameter. Here’s the docpage for binary cross entropy and here’s the categorical case for multiclass networks. What you will find is that the TF loss functions give you the option to pass in either the linear activation outputs (Z3 in this example) or the full outputs after the activation function has been applied (A3). The reason they give this option is that it turns out to be both simpler (less code for you to write) and more numerically stable to implement the activation function and the loss calculation together. One clear example of why that helps is the case of “saturated” sigmoid or softmax output values. In floating point, the output of sigmoid can round to exactly 0 or 1, even though mathematically it never actually reaches 0 or 1. If you look at the loss formula, you can see why that is a problem: it contains log(a) and log(1 - a), so a saturated output produces log(0) and the cost ends up as Inf or NaN.
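Here is a minimal sketch of that saturation problem, using one hypothetical, very confident logit value (the names y, z and a below are just placeholders for illustration, not variables from the assignment):

```python
import tensorflow as tf

# Hypothetical example: one very confident (and correct) prediction.
y = tf.constant([[1.0]])      # true label
z = tf.constant([[100.0]])    # logit (the "Z3" value)
a = tf.sigmoid(z)             # rounds to exactly 1.0 in float32 (the "A3" value)

# Naive loss formula applied to the saturated activation:
# the (1 - y) * log(1 - a) term becomes 0 * log(0) = NaN.
naive = -(y * tf.math.log(a) + (1.0 - y) * tf.math.log(1.0 - a))
print(naive.numpy())          # [[nan]]

# Passing the raw logit with from_logits=True lets TF combine the sigmoid
# and the log into one numerically stable computation.
stable = tf.keras.losses.BinaryCrossentropy(from_logits=True)(y, z)
print(stable.numpy())         # ~0.0, the mathematically correct loss
```

With from_logits=True, TF never has to take the log of a rounded probability: it can work directly with the logit, so the saturated case stays finite.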
I forget whether Prof Ng ever explains this anywhere in the lectures, but what you will find is that we always use the “from_logits = True” mode in these courses. It’s less code for us to write and it works better, so what’s not to like about that?
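For concreteness, here is a sketch of what that pattern typically looks like (the layer sizes and names here are hypothetical, not taken from any particular exercise): the output layer uses a linear activation so the model emits logits, and the loss is told it is receiving logits.

```python
import tensorflow as tf

# Hypothetical 3-layer classifier: the output layer has no activation,
# so the model produces logits (the Z3 values), not probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation="relu"),
    tf.keras.layers.Dense(15, activation="relu"),
    tf.keras.layers.Dense(10),  # linear output layer -> logits
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# If you need actual probabilities at prediction time, apply the
# activation yourself to the logits the model outputs, e.g.:
# probs = tf.nn.softmax(model(X_new))
```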