I’m a bit surprised to see that, for softmax regression, some authors don’t add a bias (parameters b_1, …, b_N).
Here is an example: Unsupervised Feature Learning and Deep Learning Tutorial
And what do TensorFlow/Keras do? Bias: yes or no?
Thank you in advance for your answers.
Pierre BEJIAN (France)
Sometimes people fold the bias term into the \theta vector as an extra weight whose corresponding feature value is always 1 for every sample. The author of that tutorial may have done that, since the article doesn’t explicitly say the bias terms are excluded.
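To make the "bias as an extra weight" idea concrete, here is a minimal sketch in plain Python. The names (`W`, `b`, `theta`, `scores_with_bias`, `scores_absorbed`) and the numbers are made up for illustration and do not come from the tutorial. It only checks that appending a constant 1 to the input and the bias to each weight row gives the same softmax probabilities as keeping the bias separate.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_with_bias(W, b, x):
    """Class scores with an explicit bias: W[k] . x + b[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def scores_absorbed(theta, x):
    """Class scores with the bias folded into theta: theta[k] . [x, 1]."""
    x_aug = list(x) + [1.0]  # constant feature that multiplies the bias weight
    return [sum(t * xi for t, xi in zip(row, x_aug)) for row in theta]

# Hypothetical parameters: 3 classes, 2 input features.
W = [[0.5, -1.0], [0.2, 0.3], [-0.7, 0.1]]
b = [0.1, -0.2, 0.05]
x = [1.5, 2.0]

# Fold each bias into its weight row as one extra column.
theta = [row + [bk] for row, bk in zip(W, b)]

p_explicit = softmax(scores_with_bias(W, b, x))
p_absorbed = softmax(scores_absorbed(theta, x))

print(p_explicit)
print(p_absorbed)
```

Both printed lists should agree up to floating-point rounding, which is why a formula written only in terms of \theta does not necessarily mean the model has no bias.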
TensorFlow lets you choose whether to use a bias in the Dense layer that is followed by a softmax activation: the layer’s use_bias argument defaults to True, so the bias is included unless you turn it off.
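As a rough illustration of that choice, the sketch below builds two small Keras models that differ only in the `use_bias` argument of `Dense`. The layer sizes (4 inputs, 3 classes) are arbitrary values picked for the example, and it assumes TensorFlow is installed. Counting the trainable parameters shows whether the bias vector is present.

```python
from tensorflow import keras

# Softmax regression with the default setting (use_bias=True).
with_bias = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation="softmax"),
])

# Same model, but with the bias vector explicitly disabled.
without_bias = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(3, activation="softmax", use_bias=False),
])

# 4 * 3 weights + 3 biases versus 4 * 3 weights only.
print(with_bias.count_params())
print(without_bias.count_params())
```

The difference between the two counts is the number of classes, i.e. one bias per output unit.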
I believe that I have understood. Are these calculations of the following form?
Yes! And you are welcome!