From the Week #2 assignment, we learned about step 3 of SGD:
- Loop over the layers (to update all parameters, from (W^{[1]}, b^{[1]}) to (W^{[L]}, b^{[L]}))
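For concreteness, here's a minimal sketch of what I understand that step to be (the dict layout with keys like "W1" / "dW1" is just my assumption, not necessarily the assignment's exact code):

```python
import numpy as np

def sgd_step(parameters, grads, learning_rate):
    """Step 3 of SGD: loop over ALL layers l = 1..L and update every W^[l], b^[l]."""
    L = len(parameters) // 2  # each layer contributes one "W" and one "b" entry
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters

# Toy usage: a 2-layer net, so the loop touches (W1, b1) AND (W2, b2)
params = {"W1": np.ones((3, 2)), "b1": np.zeros((3, 1)),
          "W2": np.ones((1, 3)), "b2": np.zeros((1, 1))}
grads = {"dW1": np.ones((3, 2)), "db1": np.ones((3, 1)),
         "dW2": np.ones((1, 3)), "db2": np.ones((1, 1))}
params = sgd_step(params, grads, learning_rate=0.01)
```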
I'm trying to understand why exactly we are updating all of the weights and biases for each layer. It doesn't seem intuitive to me, because wouldn't we normally just care about W^{[1]}, b^{[1]} (i.e. just like batch GD)? My only guess is that it lets us introduce some kind of optimization (weighted average, "memory") at each unit/layer, in order to add stability across training.
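For what it's worth, by weighted average / "memory" I mean something like a momentum-style update. A rough sketch of my guess (the beta = 0.9 decay rate and helper names are hypothetical, not from the assignment):

```python
import numpy as np

def init_velocity(parameters):
    """Zero 'velocity' array for every W^[l] and b^[l]."""
    return {key: np.zeros_like(val) for key, val in parameters.items()}

def momentum_step(parameters, grads, v, learning_rate, beta=0.9):
    """Keep an exponentially weighted average of past gradients per layer,
    so each update carries some 'memory' of earlier steps."""
    L = len(parameters) // 2
    for l in range(1, L + 1):
        for p in ("W", "b"):
            key = p + str(l)
            v[key] = beta * v[key] + (1 - beta) * grads["d" + key]
            parameters[key] -= learning_rate * v[key]
    return parameters, v
```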
What am I missing?