I decided to post a more complete derivation in case anyone needs extra clarification in the future. By the way, @nramon, thanks for the solution; it helped me a lot!
Derivative of softmax activation:
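As a sketch of the setup, assume the usual notation, which is my choice here and not necessarily the one used elsewhere in the thread: $z$ is the vector of inputs (logits) and $s_i$ is the $i$-th softmax output. Applying the quotient rule to the definition gives the general expression that the two cases start from:

$$
s_i = \frac{e^{z_i}}{\sum_k e^{z_k}},
\qquad
\frac{\partial s_i}{\partial z_j}
= \frac{\dfrac{\partial e^{z_i}}{\partial z_j}\,\sum_k e^{z_k} \;-\; e^{z_i}\, e^{z_j}}{\left(\sum_k e^{z_k}\right)^2}
$$

The term $\partial e^{z_i} / \partial z_j$ equals $e^{z_i}$ when $i = j$ and $0$ otherwise, which is why the derivation splits into two cases.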
Case 1:
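Assuming Case 1 is the diagonal case $i = j$ (the label alone does not say which case comes first), and writing $s_i = e^{z_i} / \sum_k e^{z_k}$ for the softmax output, the quotient rule works out roughly as follows:

$$
\frac{\partial s_i}{\partial z_i}
= \frac{e^{z_i} \sum_k e^{z_k} - e^{z_i} e^{z_i}}{\left(\sum_k e^{z_k}\right)^2}
= \frac{e^{z_i}}{\sum_k e^{z_k}} \cdot \frac{\sum_k e^{z_k} - e^{z_i}}{\sum_k e^{z_k}}
= s_i \,(1 - s_i)
$$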
Case 2:
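Assuming Case 2 is the off-diagonal case $i \neq j$, and again writing $s_i = e^{z_i} / \sum_k e^{z_k}$, the numerator $e^{z_i}$ does not depend on $z_j$, so its derivative is zero and only the denominator contributes:

$$
\frac{\partial s_i}{\partial z_j}
= \frac{0 \cdot \sum_k e^{z_k} - e^{z_i} e^{z_j}}{\left(\sum_k e^{z_k}\right)^2}
= -\,\frac{e^{z_i}}{\sum_k e^{z_k}} \cdot \frac{e^{z_j}}{\sum_k e^{z_k}}
= -\,s_i \, s_j
$$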
Conclusion on derivative of Softmax:
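With $s_i$ the $i$-th softmax output and $z_j$ the $j$-th input, the results for $i = j$ and $i \neq j$ can be written as one expression using the Kronecker delta $\delta_{ij}$ (1 if $i = j$, 0 otherwise). The symbols are my assumed notation:

$$
\frac{\partial s_i}{\partial z_j}
= \begin{cases}
s_i \,(1 - s_i) & \text{if } i = j \\
-\,s_i \, s_j & \text{if } i \neq j
\end{cases}
\;=\; s_i \,(\delta_{ij} - s_j)
$$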
Finally, we apply the softmax derivative to the derivative of the loss function:
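The loss function is not named at this point, so the sketch below assumes the common pairing of softmax with the cross-entropy loss $L = -\sum_i y_i \log s_i$, where $y$ is a one-hot (or otherwise sum-to-one) target vector and $s_i = e^{z_i} / \sum_k e^{z_k}$. Using $\partial s_i / \partial z_j = s_i(\delta_{ij} - s_j)$ and the chain rule:

$$
\frac{\partial L}{\partial z_j}
= -\sum_i \frac{y_i}{s_i} \frac{\partial s_i}{\partial z_j}
= -\sum_i y_i \,(\delta_{ij} - s_j)
= -\,y_j + s_j \sum_i y_i
= s_j - y_j
$$

The last step uses $\sum_i y_i = 1$. If it helps, here is a small finite-difference check of that result in plain Python. The function names are mine, chosen for illustration only, and the cross-entropy loss is the same assumption as above:

```python
import math


def softmax(z):
    """Softmax of a list of floats, shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, y):
    """Assumed loss: L = -sum_i y_i * log(softmax(z)_i)."""
    s = softmax(z)
    return -sum(y_i * math.log(s_i) for y_i, s_i in zip(y, s))


def analytic_grad(z, y):
    """Gradient from the derivation: dL/dz_j = s_j - y_j."""
    return [s_j - y_j for s_j, y_j in zip(softmax(z), y)]


def numeric_grad(z, y, eps=1e-6):
    """Central finite-difference approximation of dL/dz_j."""
    grad = []
    for j in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        diff = cross_entropy(z_plus, y) - cross_entropy(z_minus, y)
        grad.append(diff / (2 * eps))
    return grad


if __name__ == "__main__":
    z = [1.0, 2.0, 0.5]   # arbitrary example logits
    y = [0.0, 1.0, 0.0]   # one-hot target
    for a, n in zip(analytic_grad(z, y), numeric_grad(z, y)):
        print(f"analytic={a:.6f}  numeric={n:.6f}")
```

The two columns should agree to several decimal places. A mismatch usually means the loss or the target vector differs from the assumptions stated here.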