Calculating gradient of softmax function

I decided to post a more complete derivation in case anyone needs extra clarification in the future. By the way, @nramon, thanks for the solution, it helped me a lot!

Derivative of softmax activation:
Case 1:
image
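Since the image is not available here, this is a sketch of what this case usually looks like, assuming Case 1 is the diagonal case (the output index equals the input index). The symbols `z` (logits) and `p` (softmax outputs) are my own notation, not necessarily what the original image used.

```latex
% Assumed notation: p_i = softmax(z)_i, with logits z_1, ..., z_n
p_i = \frac{e^{z_i}}{\sum_{k} e^{z_k}}

% Quotient rule, differentiating p_i with respect to its own logit z_i
\frac{\partial p_i}{\partial z_i}
  = \frac{e^{z_i} \sum_{k} e^{z_k} - e^{z_i} e^{z_i}}{\left( \sum_{k} e^{z_k} \right)^2}
  = p_i - p_i^2
  = p_i \, (1 - p_i)
```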

Case 2:
image
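Again as a sketch only, assuming Case 2 is the off-diagonal case (output index `i` differs from input index `j`), and using my own notation `p_i = e^{z_i} / \sum_k e^{z_k}` for the softmax output:

```latex
% For i \neq j the numerator e^{z_i} does not depend on z_j,
% so only the denominator contributes in the quotient rule
\frac{\partial p_i}{\partial z_j}
  = \frac{0 \cdot \sum_{k} e^{z_k} - e^{z_i} e^{z_j}}{\left( \sum_{k} e^{z_k} \right)^2}
  = - \frac{e^{z_i}}{\sum_{k} e^{z_k}} \cdot \frac{e^{z_j}}{\sum_{k} e^{z_k}}
  = - p_i \, p_j
```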

Conclusion on derivative of Softmax:
image
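The two cases are commonly written as one formula, `dp_i/dz_j = p_i * (delta_ij - p_j)`, where `delta_ij` is 1 when `i == j` and 0 otherwise. Below is a minimal pure-Python sketch that builds this Jacobian and compares it with a finite-difference estimate. The function names (`softmax`, `softmax_jacobian`, `numerical_jacobian`) are hypothetical helpers I made up for illustration, not from any library.

```python
import math


def softmax(z):
    """Softmax of a list of logits, shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Analytic Jacobian: J[i][j] = p_i * (delta_ij - p_j)."""
    p = softmax(z)
    n = len(p)
    return [
        [p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
        for i in range(n)
    ]


def numerical_jacobian(z, eps=1e-6):
    """Central finite-difference estimate of the same Jacobian."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        p_plus = softmax(z_plus)
        p_minus = softmax(z_minus)
        for i in range(n):
            jac[i][j] = (p_plus[i] - p_minus[i]) / (2 * eps)
    return jac


if __name__ == "__main__":
    logits = [1.0, 2.0, 0.5]  # arbitrary example values
    analytic = softmax_jacobian(logits)
    numeric = numerical_jacobian(logits)
    max_diff = max(
        abs(analytic[i][j] - numeric[i][j])
        for i in range(len(logits))
        for j in range(len(logits))
    )
    print("largest difference between analytic and numeric:", max_diff)
```

The diagonal entries correspond to Case 1 and the off-diagonal entries to Case 2. Each column sums to zero because the softmax outputs always sum to 1.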

Finally, we use the derivative of the softmax activation in the derivative of the loss function:
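The final formula is missing from the post, so here is a sketch of how this step usually goes. It assumes the loss is cross-entropy with a one-hot target `y` (my assumption, the post does not name the loss), with `p_i = e^{z_i} / \sum_k e^{z_k}` and the two softmax results `\partial p_j / \partial z_j = p_j (1 - p_j)` and `\partial p_i / \partial z_j = -p_i p_j` for `i \neq j`.

```latex
% Assumed loss: cross-entropy with one-hot target y, so \sum_i y_i = 1
L = - \sum_{i} y_i \log p_i

% Chain rule, then split the sum into the i = j term and the i \neq j terms
\frac{\partial L}{\partial z_j}
  = - \sum_{i} \frac{y_i}{p_i} \frac{\partial p_i}{\partial z_j}
  = - \frac{y_j}{p_j} \, p_j (1 - p_j) - \sum_{i \neq j} \frac{y_i}{p_i} \, (- p_i p_j)
  = - y_j + y_j p_j + \sum_{i \neq j} y_i p_j
  = p_j \sum_{i} y_i - y_j
  = p_j - y_j
```

Under that assumption the gradient with respect to the logits is just the predicted probabilities minus the target.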
