I have learned backpropagation. Suppose I want to solve a simple problem using softmax. I have a small dataset of 3 data points with 3 classes a, b, c, and I initialize weights w1, w2, w3 and biases b1, b2, b3. After computing the loss for each data point, I compute the cost function over the whole dataset. Now I want to use gradient descent to update w1. What steps do I have to take to find the new w1 using backpropagation?

Note: I am not using a neural network architecture.

Sorry, but what do you mean you are not using NN architecture?

Yes, I am asking a general question about calculating the derivative of the cost function with respect to w1.

I'd say you first define your cost function and choose a learning rate.

Then you calculate the partial derivative of the cost function with respect to w1.

Finally you subtract:

new w1 = w1 - learning rate * (partial derivative of the cost function with respect to w1)
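For the softmax setup in the question, that partial derivative can be written out explicitly. This is only a sketch under assumptions the question does not state: each data point has a single scalar input x_i, class k has logit z_{ik} = w_k x_i + b_k, the per-point loss is cross-entropy with one-hot labels y_{ik}, and the cost J is the average over the N = 3 points. The learning rate is written as \eta.

```latex
\begin{align}
p_{ik} &= \frac{e^{z_{ik}}}{\sum_{j=1}^{3} e^{z_{ij}}},
  \qquad z_{ik} = w_k x_i + b_k \\
J &= -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{3} y_{ik} \log p_{ik} \\
\frac{\partial J}{\partial z_{ik}} &= \frac{1}{N}\,(p_{ik} - y_{ik}) \\
\frac{\partial J}{\partial w_1}
  &= \sum_{i=1}^{N} \frac{\partial J}{\partial z_{i1}}\,
     \frac{\partial z_{i1}}{\partial w_1}
   = \frac{1}{N} \sum_{i=1}^{N} (p_{i1} - y_{i1})\, x_i \\
w_1 &\leftarrow w_1 - \eta\, \frac{\partial J}{\partial w_1}
\end{align}
```

The third line is the usual softmax plus cross-entropy result; the fourth is the chain rule step, which is all that "backpropagation" amounts to when there are no hidden layers.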

Remember that you have to update the weights simultaneously, which means you update w2 and w3 too at the same time, using gradients that were all computed from the old values.

You repeat this until the cost converges, and then you have the weights that best fit your data.
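Here is a minimal sketch of the whole loop in Python with NumPy. Everything specific in it is an assumption made for illustration, not something from the question: the three scalar inputs in `x`, the one-hot labels in `y`, the cross-entropy cost, the learning rate of 0.1, and the fixed 500 steps used in place of a real convergence check. The function names `softmax`, `cost`, `gradients` and `train` are hypothetical helpers.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cost(w, b, x, y):
    # Logits z[i, k] = w[k] * x[i] + b[k]; cost is the mean cross-entropy.
    p = softmax(x[:, None] * w + b)
    return -np.mean(np.sum(y * np.log(p), axis=1))

def gradients(w, b, x, y):
    # dJ/dz[i, k] is (p - y) / N; the chain rule then gives dJ/dw and dJ/db.
    p = softmax(x[:, None] * w + b)
    err = p - y
    grad_w = (err * x[:, None]).mean(axis=0)
    grad_b = err.mean(axis=0)
    return grad_w, grad_b

def train(x, y, lr=0.1, steps=500):
    w = np.zeros(3)  # w1, w2, w3
    b = np.zeros(3)  # b1, b2, b3
    history = []
    for _ in range(steps):
        history.append(cost(w, b, x, y))
        # Compute every gradient from the old parameters first ...
        grad_w, grad_b = gradients(w, b, x, y)
        # ... then update all weights and biases at the same time.
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b, history

# Assumed toy data: 3 points, one per class (a, b, c).
x = np.array([-1.0, 0.0, 1.0])
y = np.eye(3)

w, b, history = train(x, y)
print("cost before:", history[0], "cost after:", history[-1])
print("updated w1, w2, w3:", w)
```

Because `gradients` returns the whole gradient before any parameter changes, the simultaneous update comes for free; updating w1 first and then recomputing the gradient for w2 would be a different procedure.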

If you want to see how back propagation works, check this out