Weight update check in an ANN over several epochs

Using numpy, I am implementing ReLU in the hidden layer and sigmoid in the output layer, with the cost function computed and the weights and biases updated once per epoch.
From the resulting dj/dw at the end of each epoch, I will calculate:
Wnew = Wprevious - (learning rate * dj/dw)
for all the weights in each layer of my ANN model.
This will help my understanding of the weight update concept.
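A minimal numpy sketch of the update step I have in mind (layer sizes and names such as `lr` and `dj_dW1` are just illustrative placeholders):

```python
import numpy as np

# Illustrative sizes only: 2 inputs -> 3 hidden units (ReLU) -> 1 output unit (sigmoid)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))
lr = 0.01

# Suppose dj_dW1, dj_db1, dj_dW2, dj_db2 came out of backprop for this epoch
# (zeros here, purely as placeholders to show the update step).
dj_dW1, dj_db1 = np.zeros_like(W1), np.zeros_like(b1)
dj_dW2, dj_db2 = np.zeros_like(W2), np.zeros_like(b2)

# The per-epoch update, Wnew = Wprevious - learning rate * dj/dw, for every layer:
W1 -= lr * dj_dW1
b1 -= lr * dj_db1
W2 -= lr * dj_dW2
b2 -= lr * dj_db2
```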
Please validate
Thanks

They sound OK to me!

Raymond


Hello @tennis_geek ! Su here! :smiley:
Yep I’m with @rmwkwok on this stance, it looks alright to me!

Thanks @rmwkwok @subagopa
Finished coding it using numpy; now in the debugging process.

For every epoch…
For any given non-input layer, say 'X'…
The dimensions (shape in Python) of the weights (Wnew or Wold in the equation below) should be 'dimw' → (number of nodes in layer X-1) × (number of nodes in layer X).

Then, in the formula to update the weights for that layer X (where I am getting a shape broadcast error):
Wnew = Wold - (learningrate * dj/dw)
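One quick way to narrow down the mismatch would be to print the shapes right before the update, something like this sketch (made-up layer sizes, just to show the check):

```python
import numpy as np

# Hypothetical layer X: 4 nodes in layer X-1, 3 nodes in layer X
W = np.random.randn(4, 3)
dj_dw = np.random.randn(4, 3)   # if this came out as (4,), the subtraction below would raise a
                                # broadcast error; if it were (3,), it would broadcast silently
                                # and give a wrong result
learningrate = 0.01

print("W shape     :", W.shape)
print("dj_dw shape :", dj_dw.shape)
assert dj_dw.shape == W.shape, "gradient shape must match the weight shape"
W_new = W - learningrate * dj_dw
```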

My questions:

1. From this weight update, should the dimensions of dj/dw be the same as dimw or not?
2. But dj/dw after each epoch is calculated as dj/dw[each col] += error[each row] * inputdata[each row, each col] (accumulated over the rows), which intuitively makes dj_dw a vector?
3. Will dj/db always be a scalar for any non-input layer?

Thanks

Hello @tennis_geek,

The shape of \frac{\partial{J}}{\partial{whatever}} has to be the same as whatever, because we have one derivative value for each element in whatever.

So,

Q1. Yes.
Q3. Only as long as b is a scalar. For a layer of n neurons, b is a vector of n elements, so b isn't always a scalar; when n > 1 it is a vector, and dj/db then has the same shape as b.
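For example (a tiny sketch with a made-up layer size of 3 neurons):

```python
import numpy as np

b = np.zeros(3)              # a layer of 3 neurons has a bias vector of 3 elements
dj_db = np.zeros_like(b)     # one derivative per bias element, so dj_db is also shape (3,)
print(b.shape, dj_db.shape)  # (3,) (3,)
```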

Q2. For linear regression (the case of 1 neuron), \frac{\partial{J}}{\partial{w_j}} = \sum_{i=1}^{m} error_i \times x_{ij}, and in general, \frac{\partial{J}}{\partial{w}} = \sum_{i=1}^{m} error_i \times x_{i} (note that j is removed from the subscripts).

Now, w is a vector meaning the weights, error_i is a scalar meaning the error of the sample i, and x_i is a vector meaning the feature vector of the sample i.

Therefore, for the case of 1 neuron, w is a vector, and \frac{\partial{J}}{\partial{w}} is a vector.

By induction, for the case of k neurons, w is a matrix (a stack of k vectors), and \frac{\partial{J}}{\partial{w}} is a matrix.
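A quick numpy illustration of that point (all sizes here are made up just for the demo):

```python
import numpy as np

m, n_features, k = 5, 4, 3
X = np.random.randn(m, n_features)   # one feature vector x_i per row
error = np.random.randn(m, k)        # one error value per sample and per neuron

# dJ/dW[j, c] = sum over samples i of error[i, c] * X[i, j]
dj_dW = X.T @ error
print(dj_dW.shape)                   # (4, 3): a matrix, same shape as W for k neurons

# 1-neuron case: error is (m,) and the gradient collapses to a vector, same shape as w
error_1 = np.random.randn(m)
dj_dw_1 = X.T @ error_1
print(dj_dw_1.shape)                 # (4,)
```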

Cheers,
Raymond

Great and succinct summary!
Coding the whole thing from scratch sometimes makes the head spin, but it is definitely worth pursuing, as a self-built ANN could go a long way!
Will update once I debug this part of the code.
Thanks again

I agree with you! Keeping track of the shapes is a lifetime exercise. We all do that whenever we customize something in a NN.

Good luck!

Raymond