How to calculate parameters (w,b) for Softmax?

Could you please tell me what mathematical method is used to estimate the parameters (w_k, b_k), k=1,…,K of a softmax multiclass classifier? It was explained very clearly how this is done for binary logistic regression with gradient descent, but it remained unclear to me how we do it in the case of multiclass classification.

Will be grateful for the answer,
My best regards,

It’s the same method: gradient descent, applied to the parameters of each output class.
Softmax is then used to re-scale the outputs so they sum to 1.
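
To make the re-scaling step concrete, here is a minimal NumPy sketch of a softmax function. The `scores` values are made up for illustration, and subtracting the row maximum is a common numerical-stability trick, not something stated in this thread.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax: turns each row of raw scores into probabilities."""
    # Subtracting the row maximum avoids overflow in exp and leaves the result unchanged.
    shifted = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# Hypothetical raw scores for 2 examples and K = 3 classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)

# Each row of probs is non-negative and sums to 1.
print(probs.sum(axis=1))
```

Because every output is divided by the sum over all K classes, changing one class score changes all K probabilities.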

Thank you for the answer.
Could you please clarify a little bit… If I have K classes, do I have to run the estimation procedure K times to estimate (w_k, b_k), k=1,…,K? Aren’t these values interdependent?

Essentially there is a ‘w’ vector for each output. They’re grouped into a matrix W (as either the rows or the columns, depending on the model implementation) for mathematical convenience.
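
As a small illustration of that grouping, the sketch below stacks K weight vectors as the columns of W and checks that one matrix product gives the same scores as K separate dot products. The sizes, the random values, and the choice of columns rather than rows are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, K = 4, 3

# One hypothetical weight vector w_k and one bias b_k per class.
w = [rng.normal(size=n_features) for _ in range(K)]
b = np.array([0.1, -0.2, 0.3])

# Stack the vectors as columns, so W has shape (n_features, K).
W = np.column_stack(w)

x = rng.normal(size=n_features)

# A single matrix product computes all K scores at once...
z_matrix = x @ W + b
# ...and agrees with computing each class score separately.
z_loop = np.array([w[k] @ x + b[k] for k in range(K)])

print(np.allclose(z_matrix, z_loop))
```

Had the vectors been stored as rows instead, the same scores would come from `W @ x + b` with W of shape (K, n_features).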

It’s like K-dimensional logistic regression. But if you implement it as a neural network, you also have hidden layers that can create a model with more complexity.
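
On the question of running the procedure K times: a minimal sketch, assuming a cross-entropy loss with one-hot labels and plain batch gradient descent (neither is spelled out in this thread), updates all K weight vectors in the same loop. The function name `fit_softmax_regression`, the learning rate, the step count, and the toy data are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-subtraction for stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, y, K, lr=0.1, n_steps=500):
    """Fit all (w_k, b_k) together by batch gradient descent on cross-entropy."""
    m, n = X.shape
    W = np.zeros((n, K))   # column k holds w_k
    b = np.zeros(K)
    Y = np.eye(K)[y]       # one-hot labels, shape (m, K)
    for _ in range(n_steps):
        P = softmax(X @ W + b)   # probabilities, shape (m, K)
        G = (P - Y) / m          # gradient of the mean loss w.r.t. the scores
        W -= lr * (X.T @ G)      # updates every w_k in one step
        b -= lr * G.sum(axis=0)  # updates every b_k in one step
    return W, b

# Toy data: three well-separated 2-D clusters, made up for illustration.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, -2.0], [-4.0, -2.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

W, b = fit_softmax_regression(X, y, K=3)
pred = np.argmax(X @ W + b, axis=1)
accuracy = (pred == y).mean()
print(accuracy)
```

In this sketch the procedure runs once, not K times. The update for each w_k uses column k of `P - Y`, and `P` depends on the scores of every class through the softmax denominator, so the parameters are coupled during training.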

Thank you!
Could you please give a reference on how this K-dimensional logistic regression looks from a mathematical point of view?

Sorry, I don’t have one.
I think it’s covered later in the course.
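
For readers looking for the mathematical view of the "K-dimensional logistic regression" mentioned earlier, here is a sketch of the usual formulation. It assumes a one-hot label vector y and a cross-entropy loss, which this thread does not state explicitly; the symbols z_k, p_k and \mathcal{L} are introduced here for illustration.

```latex
% Score and probability for class k, given input x
z_k = w_k^{\top} x + b_k, \qquad
p_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad k = 1, \dots, K

% Cross-entropy loss for one example with one-hot label y
\mathcal{L} = -\sum_{k=1}^{K} y_k \log p_k

% Gradients used by gradient descent
\frac{\partial \mathcal{L}}{\partial z_k} = p_k - y_k, \qquad
\frac{\partial \mathcal{L}}{\partial w_k} = (p_k - y_k)\, x, \qquad
\frac{\partial \mathcal{L}}{\partial b_k} = p_k - y_k
```

The gradient for class k has the same "prediction minus label" form as in binary logistic regression, but p_k contains every z_j in its denominator, so the K sets of parameters are not estimated independently.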