Hello,
I was trying to train the model given in the assignment using both softmax and sigmoid activation functions.
The sigmoid function gave better accuracy than softmax.
I learned in the previous course that softmax is suited to multi-class problems and sigmoid to multi-label problems.
Since the assignment deals with a multi-class problem, I would have expected softmax to perform better.
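To make sure I have the distinction right, here is a minimal sketch of how I understand the two functions. It is plain Python, the logits are made-up values rather than anything from the assignment, and `softmax` and `sigmoid` are my own helper names, not the framework's:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    # The outputs form one distribution over the classes and sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    # Each output is squashed into (0, 1) independently of the others,
    # so the outputs are not constrained to sum to 1.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

logits = [2.0, 1.0, 0.1]  # hypothetical values for illustration only
soft = softmax(logits)
sig = sigmoid(logits)

print("softmax:", soft, "sum =", sum(soft))
print("sigmoid:", sig, "sum =", sum(sig))
```

As far as I understand, this is why softmax is recommended when exactly one class is correct, and sigmoid when several labels can be true at once.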
Why is this happening?
What am I missing?
Thanks.