Could you please tell me what mathematical method is used to estimate the parameters (w_k, b_k), k = 1, …, K of a softmax multiclass classifier? It was very clearly explained how this is done for binary logistic regression with gradient descent, but it remained unclear to me how it is done for multiclass classification.

I would be grateful for an answer.
Best regards,
Vasyl.

Thank you for the answer.
Could you please clarify a little… If I have K classes, do I have to run the estimation procedure K times to estimate (w_k, b_k), k = 1, …, K? Aren’t these values interdependent?

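Essentially there is a ‘w’ vector (and a bias b) for each output. They’re grouped into a matrix W (as either the rows or the columns, depending on the model implementation) for mathematical convenience. You don’t run the estimation K times: the values are interdependent through the softmax, so the whole of W and b is updated together in a single gradient descent run.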
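
To make the interdependence concrete, here is a sketch of the math, assuming the usual cross-entropy loss. The notation is my own assumption, not taken from the course: m training examples (x^(i), y^(i)), and 1{·} is the indicator function.

```latex
% Softmax probability of class k: the denominator sums over ALL classes,
% which is what couples the K parameter pairs together.
p_k(x) = \frac{\exp(w_k^\top x + b_k)}{\sum_{j=1}^{K} \exp(w_j^\top x + b_j)}

% Mean cross-entropy loss over the training set.
J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
          \mathbf{1}\{y^{(i)} = k\} \, \log p_k\big(x^{(i)}\big)

% Gradients: the same form as in binary logistic regression, one per class.
\frac{\partial J}{\partial w_k} = \frac{1}{m} \sum_{i=1}^{m}
    \Big( p_k\big(x^{(i)}\big) - \mathbf{1}\{y^{(i)} = k\} \Big) \, x^{(i)}
\qquad
\frac{\partial J}{\partial b_k} = \frac{1}{m} \sum_{i=1}^{m}
    \Big( p_k\big(x^{(i)}\big) - \mathbf{1}\{y^{(i)} = k\} \Big)
```

Since p_k depends on every (w_j, b_j) through the denominator, each gradient step has to use the current values of all K pairs, which is why they are updated together.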

It’s like logistic regression with K outputs instead of one. But if you implement it as a neural network, you can also add hidden layers, which let the model represent more complex decision boundaries.
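
If it helps, here is a minimal NumPy sketch of the version without hidden layers, i.e. plain softmax regression trained by batch gradient descent. The function names (`softmax`, `fit_softmax`), the learning rate, the step count and the toy data are all my own assumptions for illustration, not the course's code. I store w_k as the columns of W, but rows would work just as well.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, K, lr=0.1, steps=500):
    """Batch gradient descent on the mean cross-entropy loss.

    X: (m, n) feature matrix, y: (m,) integer labels in 0..K-1.
    lr and steps are illustrative defaults, not tuned values.
    """
    m, n = X.shape
    W = np.zeros((n, K))   # column k holds w_k (a layout choice)
    b = np.zeros(K)        # entry k holds b_k
    Y = np.eye(K)[y]       # one-hot labels, shape (m, K)
    for _ in range(steps):
        P = softmax(X @ W + b)    # class probabilities, shape (m, K)
        G = (P - Y) / m           # gradient of the loss w.r.t. the logits
        W -= lr * (X.T @ G)       # all K weight vectors updated in one step
        b -= lr * G.sum(axis=0)   # all K biases updated in one step
    return W, b

# Toy data (assumed for illustration): three Gaussian blobs in 2D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 2))

W, b = fit_softmax(X, y, K=3)
pred = np.argmax(X @ W + b, axis=1)
accuracy = (pred == y).mean()
```

Note there is only one training loop, and it touches the whole of W and b on every step. A network with hidden layers would use the same output layer and loss, with backpropagation carrying the gradient further back.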