Using NumPy, I am implementing ReLU in the hidden layers and sigmoid in the output layer, along with computing the cost function and updating the weights and biases each epoch.

From the resulting dJ/dW at, say, the end of each epoch, I will calculate:

W_new = W_previous - (learning_rate * dJ/dW)

for all the weights in each layer of my ANN model.
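The per-layer update described above can be sketched in NumPy roughly like this (layer sizes and variable names here are assumptions for illustration, not the original code):

```python
import numpy as np

# Hypothetical 2-layer network: a (4->3) hidden layer and a (3->1) output layer.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((3, 1))]
grads = [rng.standard_normal((4, 3)), rng.standard_normal((3, 1))]  # dJ/dW per layer
learning_rate = 0.01

# W_new = W_previous - learning_rate * dJ/dW, applied layer by layer
weights = [W - learning_rate * dW for W, dW in zip(weights, grads)]
```

Note that the update works elementwise, so each gradient array must have exactly the same shape as the weight array it updates.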

This will help my understanding of the weight-update concept.

Please validate

Thanks

They sound OK to me!

Raymond

Thanks @rmwkwok @subagopa

Finished coding it with NumPy; now in the debugging process.

For every epoch…

For any given non-input layer, say 'X'…

The dimensions (shape in Python) of the weights (W_new or W_old in the equation below) shall be 'dimw' → (number of nodes in layer X-1) × (number of nodes in layer X).

Then, in the formula to update the weights for that layer X (where I am getting a shape-broadcast error):

W_new = W_old - (learning_rate * dJ/dW)

My questions:

1. From this weight update, should the dimensions of dJ/dW be the same as dimw or not?
2. But dJ/dW after each epoch is accumulated as dJ/dW[each col] += error[each row] * input_data[each row, each col], which intuitively makes dJ/dW a vector?
3. Will dJ/db always be a scalar for any non-input layer?

Thanks

Hello @tennis_geek,

The shape of \frac{\partial{J}}{\partial{whatever}} has to be the same as whatever, because we have one derivative value for each element in whatever.

So,

Q1. Yes

Q3. Only as long as b is a scalar. For a layer of n neurons, b is a vector of n elements, so b isn't always a scalar; it is a vector whenever n > 1.

Q2. For linear regression (the case of 1 neuron), \frac{\partial{J}}{\partial{w_j}} = \sum_{i=1}^{m} error_i \times x_{ij}, and in general, \frac{\partial{J}}{\partial{w}} = \sum_{i=1}^{m} error_i \times x_{i} (note that j is removed from the subscripts).

Now, w is a vector meaning the weights, error_i is a scalar meaning the error of the sample i, and x_i is a vector meaning the feature vector of the sample i.

Therefore, for the case of 1 neuron, w is a vector, and \frac{\partial{J}}{\partial{w}} is a vector.

By induction, for the case of k neurons, w is a matrix (a stack of k vectors), and \frac{\partial{J}}{\partial{w}} is a matrix.
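The argument above can be checked numerically. This is a sketch with assumed sizes (m samples, 4 inputs, k = 3 neurons): summing error_i × x_i over the samples is exactly the matrix product X.T @ error, and the result has the same shape as W.

```python
import numpy as np

m, n_features, k = 5, 4, 3                 # assumed: samples, inputs, neurons
rng = np.random.default_rng(1)
X = rng.standard_normal((m, n_features))   # inputs to the layer, one row per sample
error = rng.standard_normal((m, k))        # per-sample error at each neuron
W = rng.standard_normal((n_features, k))   # weights: (n_features, k)

# dJ/dW[j, q] = sum_i error[i, q] * X[i, j], i.e. X.T @ error
dJ_dW = X.T @ error                        # shape (n_features, k), same as W

# dJ/db is the per-neuron sum of errors: a vector of k elements, not a scalar
dJ_db = error.sum(axis=0)                  # shape (k,)
```

With k = 1 this collapses to the vector case of Q2; for k > 1 both W and dJ/dW are matrices, matching the induction step.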

Cheers,

Raymond

Great & succinct summary!

Coding the whole thing from scratch sometimes makes my head spin, but it is definitely worth pursuing, as a self-built ANN can go a long way!

Will update once I debug this part of the code.

Thanks again

I agree with you! Keeping track of the shapes is a lifetime exercise. We all do that as long as we customize something in a NN.

Good luck!

Raymond