How do weights get updated?

In linear regression, the weights get updated after every iteration (using all the training examples).
How do weights get updated in neural networks? If m = 50, do the weights get updated

optA: after 1 epoch, i.e. after sending all 50 examples through the neural network once? or
optB: after each training example is passed?

If it’s optA, how do we calculate the error? Is it the sum of all errors? Do we have 50 forward props and then one backward prop?
If it’s optB, does that mean that in 1 epoch the weights get updated 50 times?

Hello @karra1729,

In the course, we do batch gradient descent, which is option A.

There is another approach called stochastic gradient descent, which is option B.

If your question is which approach the course introduces, the answer is option A.

In batch gradient descent, we average over all samples. If you implement forward propagation using a for-loop, there will be 50 forward propagations, one for each sample. If you implement it with vectorization, there will be only one forward propagation. After all samples are forward propagated, the errors are averaged, and then we use the result to update the weights once, so there will be one backward propagation.
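As a rough illustration of option A, here is a minimal sketch of batch gradient descent on a one-feature linear model with m = 50. The function name `batch_gradient_descent`, the toy data, and the learning rate are assumptions made for this example and are not taken from the course. Note that what gets averaged in this sketch is each sample's gradient contribution (the signed error times the input), not the absolute error.

```python
import random

def batch_gradient_descent(xs, ys, epochs=200, lr=0.5):
    """Hypothetical helper: one weight update per epoch (option A)."""
    w, b = 0.0, 0.0
    m = len(xs)
    updates = 0
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        # For-loop version: one forward pass per sample, m passes in total.
        for x, y in zip(xs, ys):
            y_hat = w * x + b      # forward pass for a single sample
            error = y_hat - y      # signed error, no absolute value
            grad_w += error * x
            grad_b += error
        # Average over all m samples, then update the weights exactly once.
        w -= lr * grad_w / m
        b -= lr * grad_b / m
        updates += 1
    return w, b, updates

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(50)]  # m = 50 (toy data)
ys = [2.0 * x + 1.0 for x in xs]                     # assumed target: y = 2x + 1

w, b, updates = batch_gradient_descent(xs, ys)
print(w, b, updates)  # w and b approach 2 and 1; updates equals the number of epochs
```

A vectorized version would replace the inner loop with a single array operation, but the number of weight updates per epoch would still be one.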

Updating the weights 50 times in one epoch (option B) is what happens in stochastic gradient descent, and not in batch gradient descent.
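For contrast, a minimal sketch of option B on the same kind of toy problem. The helper name `sgd_one_epoch`, the data, and the learning rate are assumptions for illustration only. Practical implementations usually also shuffle the samples every epoch, which is left out here to keep the sketch short.

```python
import random

def sgd_one_epoch(xs, ys, w=0.0, b=0.0, lr=0.1):
    """Hypothetical helper: update the weights after every single sample (option B)."""
    updates = 0
    for x, y in zip(xs, ys):
        y_hat = w * x + b      # forward pass for one sample
        error = y_hat - y      # signed error for that sample
        w -= lr * error * x    # immediate update, no averaging over samples
        b -= lr * error
        updates += 1
    return w, b, updates

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(50)]  # m = 50 (toy data)
ys = [2.0 * x + 1.0 for x in xs]                     # assumed target: y = 2x + 1

w, b, updates = sgd_one_epoch(xs, ys)
print(updates)  # 50: one update per training example in a single epoch
```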


Sir, when we average the errors, do we take the absolute value of each error? If not, the errors can theoretically average to zero. Suppose my batch size is 4 for a regression problem, with e1=1, e2=2, e3=-1, e4=-2. These will average out to zero.

Hello @karra1729,

We don’t take the absolute value (you may go back to the videos for the formula), and yes, the average can become zero, which is not necessarily bad.
Consider 4 data points placed at the 4 corners of a square, and suppose you fit them with a straight line. What would the best line be? And what would the errors look like?
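One way to explore the square example numerically is sketched below. It assumes the corners are at (0,0), (0,1), (1,0) and (1,1) and uses the closed-form least-squares line for a single feature; the helper name `fit_line` is made up for this illustration. The last lines reuse the four errors from the question above to show that signed errors can average to zero while the mean of the squared errors stays positive.

```python
def fit_line(xs, ys):
    """Hypothetical helper: closed-form least-squares line y = w*x + b."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / sxx
    b = y_mean - w * x_mean
    return w, b

# Assumed coordinates for the four corners of a unit square.
xs = [0.0, 0.0, 1.0, 1.0]
ys = [0.0, 1.0, 0.0, 1.0]

w, b = fit_line(xs, ys)
errors = [(w * x + b) - y for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)
mean_squared_error = sum(e ** 2 for e in errors) / len(errors)

print(w, b)                            # 0.0 0.5 (a horizontal line through the middle)
print(errors)                          # [0.5, -0.5, 0.5, -0.5]
print(mean_error, mean_squared_error)  # 0.0 0.25

# The errors from the question: the signed average is zero, the squared average is not.
question_errors = [1, 2, -1, -2]
print(sum(question_errors) / 4)                 # 0.0
print(sum(e ** 2 for e in question_errors) / 4) # 2.5
```

In this sketch the best line has signed errors that cancel exactly, yet the squared errors do not, which is consistent with a zero average error not necessarily being a problem.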


Sir, I am totally confused. Is the error that gets backpropagated the same as the loss function? And what does Yhat - Y(original) refer to? I saw a - y somewhere (maybe in CNNs or RNNs, I don’t remember).

Edit: I just remembered where I saw it. It was in the old ML course. This error is just an intuition, right? It is not the cost function, right? And if we were to add up this error while using batch gradient descent, do we take the plain sum or the sum of absolute values?

Sorry if my question seems confusing. I myself am super confused. There are a lot of things clashing in my brain at once.


If you would like to know the meaning of symbols from the old ML course, or any other course, I suggest you go through that course’s videos again and write the definitions down.

For example, in the new MLS, in course 1, $f_{(\vec{w},b)}(\vec{x})$ is used for the predicted value of the linear/logistic regression model for sample $\vec{x}$, and $y$ for the true value for sample $\vec{x}$.

In course 2 week 1, when we introduce neural networks, we use $\vec{a}$ for a layer’s activation output, and $\hat{y}$ for the prediction.

Please do the same for other courses, and things will become clearer.

Please be reminded that all of my previous answers are based on the syllabus of the new MLS.
