Optimization algorithms on a neural network's parameters

When running an optimization algorithm (gradient descent or the Adam algorithm) on a neural network, does every single parameter in the neural network get updated? If so, does the update of a parameter in a hidden layer affect the update of parameters in succeeding layers? If so, how?

If I understand correctly, you could technically express a given parameter in a neural network in terms of the parameters that come before it, because those preceding parameters are used to calculate the activation values that feed into the hidden layer containing that parameter.

E.g., suppose a neural network has 2 dense layers, each containing 1 unit with a linear activation function, and all parameters are scalars for simplicity. Then $a^{[1]} = w^{[1]}x + b^{[1]}$ and $a^{[2]} = w^{[2]}a^{[1]} + b^{[2]}$, so $w^{[2]} = \frac{a^{[2]} - b^{[2]}}{a^{[1]}} = \frac{a^{[2]} - b^{[2]}}{w^{[1]}x + b^{[1]}}$. Thus, a change in $w^{[1]}$ changes $w^{[2]}$.
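To make that concrete, here is a minimal numeric sketch of the same two-layer forward pass (all values are arbitrary, chosen only for illustration):

```python
# Scalar two-layer network with linear activations:
# a1 = w1*x + b1, a2 = w2*a1 + b2
def forward(x, w1, b1, w2, b2):
    a1 = w1 * x + b1
    a2 = w2 * a1 + b2
    return a1, a2

x = 2.0
_, a2_before = forward(x, w1=0.5, b1=0.1, w2=1.5, b2=-0.2)
_, a2_after = forward(x, w1=0.6, b1=0.1, w2=1.5, b2=-0.2)  # only w1 changed
print(a2_before, a2_after)  # ~1.45 vs ~1.75: changing w1 alone changes a2
```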

Considering that gradient descent updates all parameters simultaneously, how does the interdependence among parameters described above affect the optimization algorithm?

Hello, @benmore,

All parameters are updated at the same time, which means their update order does not matter. Once forward propagation is done and the cost (the error) has been calculated, we can immediately compute how much each weight should be updated (without having to update any of them first), based on (1) the errors of the predictions and (2) all the activations $a^{[l]}$ calculated during the forward prop (not during the update).
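Here is a minimal sketch of that for the toy two-layer example from your question, assuming a squared-error cost $L = \frac{1}{2}(a^{[2]} - y)^2$ (the helper names and values are made up for illustration):

```python
# Scalar two-layer network with linear activations.
def forward(x, w1, b1, w2, b2):
    a1 = w1 * x + b1
    a2 = w2 * a1 + b2
    return a1, a2

def grads(x, y, w1, b1, w2, b2):
    a1, a2 = forward(x, w1, b1, w2, b2)   # cache the activations
    err = a2 - y                          # prediction error
    # Every gradient below uses only the cached a1, a2 and the OLD weights;
    # no parameter has been updated yet at this point.
    dw2, db2 = err * a1, err
    dw1, db1 = err * w2 * x, err * w2
    return dw1, db1, dw2, db2

# Compute all gradients first, then apply all updates simultaneously.
lr = 0.1
w1, b1, w2, b2 = 0.5, 0.1, 1.5, -0.2
dw1, db1, dw2, db2 = grads(2.0, 1.0, w1, b1, w2, b2)
w1, b1, w2, b2 = w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2
```

Note how `dw1` is computed with the old $w^{[2]}$: that, plus the cached activations, is the only way one layer's update depends on another layer's parameters.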

So, through the cached activations and the errors, every $w^{[l]}$, with its value from the forward prop stage, affects how each weight gets updated in the back prop stage. However, the updated $w^{[l]}$'s do not affect anything in the same back prop stage; they only take effect, through the next forward prop stage, in the next back prop stage.

Cheers,
Raymond


[Image: simplified workflow.]

PS: My response above is only meant to give you an impression of what is going on; I did not touch on what the optimization steps actually look like. Your equations are mathematically correct, but they are not how optimization is carried out.
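What is actually carried out is an iterative loop of forward prop, back prop, and a simultaneous gradient step; nothing is ever solved for $w^{[2]}$ algebraically. A sketch for the same toy setup (arbitrary learning rate and data, squared-error cost assumed as before):

```python
# Gradient descent loop for the scalar two-layer example; L = (a2 - y)**2 / 2.
lr, x, y = 0.1, 2.0, 1.0
w1, b1, w2, b2 = 0.5, 0.1, 1.5, -0.2
for step in range(100):
    a1 = w1 * x + b1                     # forward prop with old parameters
    a2 = w2 * a1 + b2
    err = a2 - y                         # error from this forward prop
    dw1, db1 = err * w2 * x, err * w2    # back prop from cached a1, a2
    dw2, db2 = err * a1, err
    # Simultaneous update; the new values only matter in the next iteration.
    w1, b1, w2, b2 = w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2
print(w2 * (w1 * x + b1) + b2)  # prediction approaches y = 1.0
```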
