How do weights get updated?

In linear regression, after every iteration (using all the training examples), the weights get updated.
How do weights get updated in neural networks? If m = 50, do the weights get updated:

Option A: after 1 epoch, i.e. after sending all 50 examples through the neural network once? Or
Option B: after each training example is passed?

If it’s option A, how do we calculate the error? The sum of all errors? Do we have 50 forward props and then one backward prop?
If it’s option B, does that mean that in 1 epoch, the weights get updated 50 times?

Hello @karra1729,

In the course, we do batch gradient descent, which is option A.

There is another approach called stochastic gradient descent, which is option B.

If your question is which approach the course introduces, the answer is option A.

In batch gradient descent, we average all the errors. If you implement forward propagation using a for-loop, there will be 50 forward propagations, one for each sample. If you implement it with vectorization, there will be only one forward propagation. After all samples are forward propagated, the errors are averaged, and then we use that average to update the weights once, so there will be one backward propagation.
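To make that concrete, here is a minimal NumPy sketch of one batch-gradient-descent epoch for a linear model (the data, variable names, and learning rate are my own illustrative choices, not the course’s starter code):

```python
import numpy as np

# Hypothetical data: m = 50 samples, 3 features each (illustrative only).
m, n = 50, 3
rng = np.random.default_rng(0)
X, y = rng.normal(size=(m, n)), rng.normal(size=m)

w, b = np.zeros(n), 0.0
alpha = 0.01  # learning rate

# One epoch of batch gradient descent:
y_hat = X @ w + b          # one vectorized forward propagation for all 50 samples
error = y_hat - y          # signed per-sample errors
grad_w = X.T @ error / m   # gradients average the errors over the whole batch
grad_b = error.mean()
w -= alpha * grad_w        # a single weight update per epoch,
b -= alpha * grad_b        # i.e. one backward propagation
```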

As for your option B question: updating the weights 50 times in one epoch is what happens in stochastic gradient descent, not in batch gradient descent.
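For contrast, a sketch of one stochastic-gradient-descent epoch over the same kind of hypothetical data, where the weights really are updated 50 times:

```python
import numpy as np

m, n = 50, 3
rng = np.random.default_rng(0)
X, y = rng.normal(size=(m, n)), rng.normal(size=m)  # illustrative data, as above
w, b, alpha = np.zeros(n), 0.0, 0.01

# One epoch of stochastic gradient descent:
for i in range(m):
    error_i = X[i] @ w + b - y[i]  # forward propagation for one sample
    w -= alpha * error_i * X[i]    # weights updated once per sample,
    b -= alpha * error_i           # i.e. 50 times per epoch when m = 50
```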

Cheers,
Raymond

Sir, when we average the errors, do we take the absolute value of each error? If not, the errors can theoretically average to zero. Suppose my batch size is 4 for a regression problem, and e1 = 1, e2 = 2, e3 = -1, e4 = -2. These will average out to zero.

Hello @karra1729,

We don’t take the absolute value (you may go back to the videos for the formula), and yes, the average can become zero, and that is not necessarily bad.
Consider 4 data points sitting at the 4 corners of a square, and you are fitting those 4 data points with a straight line. What would the best line be? And what would the errors look like?
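To spell out where that question leads, here is a small numeric sketch (the coordinates are my own illustrative choice): for the four corners of a unit square, the least-squares line is flat, so the signed errors average to zero even though the squared-error cost does not.

```python
import numpy as np

# Four corners of a unit square (illustrative coordinates).
x = np.array([0.0, 0.0, 1.0, 1.0])
y = np.array([0.0, 1.0, 0.0, 1.0])

# For these points the least-squares fit is the flat line y = 0.5.
y_hat = np.full_like(y, 0.5)
errors = y_hat - y

print(errors)                # [ 0.5 -0.5  0.5 -0.5]
print(errors.mean())         # 0.0   -> the signed errors average to zero
print((errors ** 2).mean())  # 0.25  -> but the mean squared error is not zero
```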

Cheers,
Raymond

Sir, I am totally confused. Is the error that gets backpropagated the same as the loss function? And also, what does \hat{y} - y (where y is the original value) refer to? I saw a - y somewhere (maybe in CNNs or RNNs, I don’t remember). I’m totally confused.

Edit: I just remembered where I saw it. That was in the old ML course. This error is just an intuition, right? It is not the cost function, right? And if we were to add up this error while using batch gradient descent, do we just take the sum or the absolute sum?

Sorry if my question seems confusing. I myself am super confused. There are a lot of things simultaneously clashing in my brain.

@karra1729,

If you would like to know the meaning of symbols from the old ML course, or any other courses, I suggest you go through the courses’ videos again and write the definitions down.

For example, in the new MLS, in course 1, f_{\vec{w},b}(\vec{x}) is used for the predicted value of the linear/logistic regression model for sample \vec{x}, and y for the true y value for sample \vec{x}.

In course 2 week 1, when we introduce neural networks, we use \vec{a} for a layer’s activation output, and \hat{y} as the prediction.
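If it helps, here is how I would write those definitions down in one place (my own summary from memory, so please verify it against the videos):

```latex
% Course 1: linear regression prediction for sample \vec{x}, with y as the true value
f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b

% Course 2 week 1: a layer's activation output, and the network's prediction
\vec{a}^{[l]} = g\left( W^{[l]} \vec{a}^{[l-1]} + \vec{b}^{[l]} \right), \qquad \hat{y} = \vec{a}^{[L]}
```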

Please do the same for other courses, and things will become clearer.

Please be reminded that all of my previous answers are based on the syllabus of the new MLS.
