Why do we just compute A1 and A2 with the ReLU function, then return Z3 and compute its cost? Why not compute A3 with a softmax activation, since we are predicting multi-class labels? Thanks in advance.
That’s because they want you to use the BinaryCrossentropy loss function with the from_logits=True argument, which causes the sigmoid calculation to be incorporated into the loss computation. That is preferred because TF can manage numerical stability better when it does the two together, e.g. when dealing with “saturated” sigmoid output values. Here’s the doc page for BinaryCrossentropy.
They don’t really explain what the from_logits=True argument does in the assignment, but they do literally write out the code for you in the instructions. Maybe it would help if they explained it a bit more. I’ll suggest that.
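In the meantime, here’s a quick toy sketch of my own (not the notebook’s code) that shows what from_logits=True buys you, comparing raw logits handed straight to the loss against applying the sigmoid yourself first:

```python
import tensorflow as tf

# Toy example: raw logits from the last linear layer, no activation applied.
# The 30.0 with a true label of 0 is a badly "saturated" case.
logits = tf.constant([[2.5], [-1.0], [30.0]])
labels = tf.constant([[1.0], [0.0], [0.0]])

# from_logits=True folds the sigmoid into the loss using a numerically
# stable fused formula, so even extreme logits are handled exactly.
stable_loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
print(stable_loss(labels, logits).numpy())

# The alternative: apply sigmoid yourself and pass probabilities. Here
# sigmoid(30.0) rounds to 1.0 in float32, so Keras has to clip before
# taking the log, and the loss for that example gets distorted.
probs = tf.sigmoid(logits)
naive_loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)
print(naive_loss(labels, probs).numpy())
```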
Although now that you mention it, this is a multi-class problem, so maybe the loss function should actually be categorical_crossentropy, in which case it would be the softmax calculation instead of sigmoid that is done internally. I can’t tell from the documentation whether the binary version is smart enough to handle either case. Of course you can think of softmax as the multi-class generalization of sigmoid, and the math is very similar. I will investigate further.
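For comparison, the multi-class version of the same pattern would look something like this (again just a toy sketch of mine, with 6 classes picked arbitrarily, not code from the assignment):

```python
import tensorflow as tf

# Toy multi-class example: one sample, 6 classes, raw logits (i.e. Z3
# straight from the linear layer, no softmax applied).
logits = tf.constant([[1.2, 0.3, -0.5, 2.1, 0.0, -1.0]])
labels = tf.constant([[0., 0., 0., 1., 0., 0.]])  # one-hot true label

# Same idea as the binary case: from_logits=True folds the softmax into
# the loss computation for numerical stability.
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(loss_fn(labels, logits).numpy())
```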
I was wondering about this too, @paulinpaloalto. categorical_crossentropy seemed like the obvious choice. binary_crossentropy works, but it would’ve made more sense had this been a multi-label classification problem.
I guess what’s important is that @nikolafuse’s intuition is correct, and that the reason the last activation (be it softmax or sigmoid) doesn’t have to be explicitly computed has been clearly explained.
Update on this issue: in the notebook, they don’t give you any logic to assess the results of the training, but I went ahead and cooked up my own version of that. It turns out that the training works fine and you get good prediction accuracy on both the train and test sets, but you really need to use Adam optimization (as they say in the instructions) as opposed to the SGD optimizer that they actually gave us in the code as written. I conclude from this that the binary_crossentropy logic in TF is smart enough to handle the fact that this is really not a binary classification problem, and everything just works as expected.
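In case it helps anyone trying the same experiment, here’s a minimal sketch of the kind of accuracy check I mean, using made-up logits rather than the actual notebook variables:

```python
import numpy as np
import tensorflow as tf

# Made-up logits for 3 examples over 6 classes, as if returned by forward
# prop, with true integer labels [3, 0, 5].
Z3 = tf.constant([[0.1, -0.2, 0.0, 2.3, 0.4, -1.0],
                  [1.9,  0.2, 0.1, 0.0, 0.3,  0.2],
                  [0.0,  0.5, 0.3, 0.1, 0.2,  2.8]])
labels = np.array([3, 0, 5])

# No need to apply softmax before predicting: softmax is monotonic, so the
# argmax of the logits is already the argmax of the probabilities.
preds = tf.argmax(Z3, axis=1).numpy()
print("accuracy:", np.mean(preds == labels))  # 1.0 for this toy data
```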