Weight update check in an ANN over several epochs

Using numpy, I am implementing ReLU in the hidden layer and sigmoid in the output layer, with the cost function computed and the weights and biases updated once per epoch.
From the resulting dj/dw at the end of each epoch, I will calculate:
Wnew = Wprevious - (learning rate * dj/dw)
for all the weights in each layer of my ANN model.
This will help my understanding of the weight update concept.
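A minimal numpy sketch of the update step I have in mind (layer sizes and names such as `lr` and `dj_dW1` are just illustrative placeholders):

```python
import numpy as np

# Illustrative sizes only: 2 inputs -> 3 hidden units (ReLU) -> 1 output unit (sigmoid)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))
lr = 0.01

# Suppose dj_dW1, dj_db1, dj_dW2, dj_db2 came out of backprop for this epoch
# (zeros here, purely as placeholders to show the update step).
dj_dW1, dj_db1 = np.zeros_like(W1), np.zeros_like(b1)
dj_dW2, dj_db2 = np.zeros_like(W2), np.zeros_like(b2)

# The per-epoch update, Wnew = Wprevious - learning rate * dj/dw, for every layer:
W1 -= lr * dj_dW1
b1 -= lr * dj_db1
W2 -= lr * dj_dW2
b2 -= lr * dj_db2
```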
Please validate
Thanks

They sound OK to me!

Raymond


Hello @tennis_geek ! Su here! :smiley:
Yep I’m with @rmwkwok on this stance, it looks alright to me!

Thanks @rmwkwok @subagopa
Finished coding it using numpy; now in the debugging process.

For every epoch…
For any given non-input layer, say 'X'…
The dimensions (shape in Python) of the weights (Wnew or Wold in the equation below) should be 'dimw' → (number of nodes in layer X-1) × (number of nodes in layer X).

Then, in the formula to update the weights for that layer X (where I am getting a shape broadcast error):
Wnew = Wold - (learningrate * dj/dw)
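One quick way to narrow down the mismatch would be to print the shapes right before the update, something like this sketch (made-up layer sizes, just to show the check):

```python
import numpy as np

# Hypothetical layer X: 4 nodes in layer X-1, 3 nodes in layer X
W = np.random.randn(4, 3)
dj_dw = np.random.randn(4, 3)   # if this came out as (4,), the subtraction below would raise a
                                # broadcast error; if it were (3,), it would broadcast silently
                                # and give a wrong result
learningrate = 0.01

print("W shape     :", W.shape)
print("dj_dw shape :", dj_dw.shape)
assert dj_dw.shape == W.shape, "gradient shape must match the weight shape"
W_new = W - learningrate * dj_dw
```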

My questions:

1. From this weight update, should the dimensions of dj/dw be the same as dimw or not?
2. But dj/dw after each epoch is calculated as dj/dw[each col] += error[each row] * inputdata[each row, each col] (accumulated over the rows), which intuitively makes dj_dw a vector?
3. Will dj/db always be a scalar for any non-input layer?

Thanks

Hello @tennis_geek,

The shape of \frac{\partial{J}}{\partial{whatever}} has to be the same as whatever, because we have one derivative value for each element in whatever.

So,

Q1. Yes.
Q3. Only as long as b is a scalar. For a layer of n neurons, b is a vector of n elements, so b isn't always a scalar; when n > 1 it is a vector, and dj/db then has the same shape as b.
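For example (a tiny sketch with a made-up layer size of 3 neurons):

```python
import numpy as np

b = np.zeros(3)              # a layer of 3 neurons has a bias vector of 3 elements
dj_db = np.zeros_like(b)     # one derivative per bias element, so dj_db is also shape (3,)
print(b.shape, dj_db.shape)  # (3,) (3,)
```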

Q2. For linear regression (the case of 1 neuron), \frac{\partial{J}}{\partial{w_j}} = \sum_{i=1}^{m} error_i \times x_{ij}, and in general, \frac{\partial{J}}{\partial{w}} = \sum_{i=1}^{m} error_i \times x_{i} (note that j is removed from the subscripts).

Now, w is a vector meaning the weights, error_i is a scalar meaning the error of the sample i, and x_i is a vector meaning the feature vector of the sample i.

Therefore, for the case of 1 neuron, w is a vector, and \frac{\partial{J}}{\partial{w}} is a vector.

By induction, for the case of k neurons, w is a matrix (a stack of k vectors), and \frac{\partial{J}}{\partial{w}} is a matrix.
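A quick numpy illustration of that point (all sizes here are made up just for the demo):

```python
import numpy as np

m, n_features, k = 5, 4, 3
X = np.random.randn(m, n_features)   # one feature vector x_i per row
error = np.random.randn(m, k)        # one error value per sample and per neuron

# dJ/dW[j, c] = sum over samples i of error[i, c] * X[i, j]
dj_dW = X.T @ error
print(dj_dW.shape)                   # (4, 3): a matrix, same shape as W for k neurons

# 1-neuron case: error is (m,) and the gradient collapses to a vector, same shape as w
error_1 = np.random.randn(m)
dj_dw_1 = X.T @ error_1
print(dj_dw_1.shape)                 # (4,)
```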

Cheers,
Raymond

Great and succinct summary!
Coding the whole thing from scratch sometimes makes the head spin, but it is definitely worth pursuing, as a self-built ANN could go a long way!
Will update once I debug this part of the code.
Thanks again

I agree with you! Keeping track of the shapes is a lifetime exercise. We all do that whenever we customize something in a NN.

Good luck!

Raymond