C2_W3_multiclassification

Hello everyone,

There is a point I don't understand about multiclass classification with softmax. We know mathematically that the computation steps of softmax end up assigning the highest probability to the highest z. So why don't we simply assign 1 to the highest value of z^[L], i.e. apply a hard max, instead of doing all the calculations of the softmax function?

Yes, it is a good point that softmax is a monotonic function, so the maximum input will produce the maximum output. But a hard max is a step function whose derivative is zero almost everywhere, so no useful gradients can flow back through it during training. And what will you use as your loss function if you eliminate the softmax activation? There are a number of advantages that come from converting the predictions of the network into something that looks like a probability distribution. One big such advantage is that you have the cross entropy loss function as the ideal vehicle to drive the training.
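Here is a minimal NumPy sketch (my own illustration, not from the course materials) contrasting the two. Both softmax and hard max agree on which class wins, but hard max throws away the relative confidences and has zero gradient almost everywhere, so a loss computed on its output gives the optimizer nothing to follow:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; result is a probability distribution.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def hardmax(z):
    # One-hot vector with a 1 at the position of the largest logit.
    out = np.zeros_like(z, dtype=float)
    out[np.argmax(z)] = 1.0
    return out

z = np.array([2.0, 1.0, 0.1])          # logits z^[L] from the last layer
print(softmax(z))                       # [0.659 0.242 0.099]
print(hardmax(z))                       # [1. 0. 0.] -- same winner, no confidence info

# Cross entropy against the true label y = class 0:
y = np.array([1.0, 0.0, 0.0])
print(-np.sum(y * np.log(softmax(z))))  # finite, shrinks smoothly as z[0] grows
# With hardmax the loss would be exactly 0 or +inf, and its gradient w.r.t. z
# is zero almost everywhere, so gradient descent has nothing to work with.
```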

Can't we use something like ReLU or the SVM cost function?

What is the “ReLU cost function”? ReLU is an activation function, not a cost function. You can try other cost functions, but for classification problems everyone uses cross entropy loss. There are very similar formulations for the binary and multiclass classification cases. Of course the standard is to use sigmoid as the activation in the binary case and softmax in the multiclass case. Both of those are exactly paired with cross entropy from a mathematical properties perspective: in each case the gradient of the loss with respect to the logits simplifies to y_hat - y. It’s not an accident that they use those pairings …
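To make “exactly paired” concrete, here is a small NumPy check (my own sketch, not from the course): for softmax with cross entropy, the gradient of the loss with respect to the logits collapses to y_hat - y, which we can verify against a finite-difference estimate. The same expression falls out of sigmoid with binary cross entropy.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(z, y):
    # Multiclass cross entropy computed on top of softmax.
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=5)                 # arbitrary logits
y = np.eye(5)[2]                       # true class is index 2 (one-hot)

analytic = softmax(z) - y              # the claimed gradient: y_hat - y

# Centered finite-difference estimate of dL/dz for comparison.
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(5)[i], y)
     - cross_entropy(z - eps * np.eye(5)[i], y)) / (2 * eps)
    for i in range(5)
])

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```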

Of course this is an experimental science. If you think you have a better idea for a different cost function, or perhaps just want to understand why people don’t use, say, MSE as the cost function for classification, you are welcome to run the experiments. Try your alternative method and see what happens. If you find something that works better, publish a paper and tell the world about your new discovery!
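As one concrete starting point for such an experiment, here is a hedged sketch (my own, not from the course) comparing the gradients of MSE and binary cross entropy on a sigmoid output that is confidently wrong. With MSE, the gradient picks up a factor of sigma'(z) = a(1 - a), which vanishes when the sigmoid saturates, so learning stalls exactly when the model is most wrong; cross entropy's gradient is just y_hat - y and stays large:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# True label is 1, but the model's logit is very negative (confidently wrong).
y, z = 1.0, -8.0
a = sigmoid(z)                    # y_hat ~ 0.000335

# Gradients of each loss w.r.t. the logit z (standard results):
#   MSE: L = (a - y)^2 / 2                 -> dL/dz = (a - y) * a * (1 - a)
#   BCE: L = -(y log a + (1-y) log(1-a))   -> dL/dz = a - y
grad_mse = (a - y) * a * (1 - a)
grad_bce = a - y

print(grad_mse)   # ~ -3.3e-4 : sigmoid saturation kills the learning signal
print(grad_bce)   # ~ -1.0    : cross entropy still pushes hard to fix the error
```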