Optimization algorithms on a neural network's parameters

When running an optimization algorithm (gradient descent or the Adam algorithm) on a neural network, does every single parameter in the neural network get updated? If so, does the update of a parameter in a hidden layer affect the update of parameters in succeeding layers? If so, how?

If I understand correctly, you could technically express a given parameter in a neural network in terms of the parameters that come before it, because those preceding parameters are used to calculate the activation values that feed into the hidden layer containing that parameter.

E.g., suppose a neural network has 2 dense layers, each containing 1 unit with a linear activation function, and all parameters are scalars for simplicity. Then $a^{[1]} = w^{[1]}x + b^{[1]}$ and $a^{[2]} = w^{[2]}a^{[1]} + b^{[2]}$, so $w^{[2]} = \frac{a^{[2]} - b^{[2]}}{a^{[1]}} = \frac{a^{[2]} - b^{[2]}}{w^{[1]}x + b^{[1]}}$. Thus, a change in $w^{[1]}$ changes $w^{[2]}$.
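To make that concrete, here is a minimal numeric sketch of the same two-layer forward pass (all values are arbitrary, chosen only for illustration):

```python
# Scalar two-layer network with linear activations:
# a1 = w1*x + b1, a2 = w2*a1 + b2
def forward(x, w1, b1, w2, b2):
    a1 = w1 * x + b1
    a2 = w2 * a1 + b2
    return a1, a2

x = 2.0
_, a2_before = forward(x, w1=0.5, b1=0.1, w2=1.5, b2=-0.2)
_, a2_after = forward(x, w1=0.6, b1=0.1, w2=1.5, b2=-0.2)  # only w1 changed
print(a2_before, a2_after)  # ~1.45 vs ~1.75: changing w1 alone changes a2
```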

Considering that gradient descent updates all parameters simultaneously, how does the interdependence among parameters described above affect the optimization algorithm?

Hello, @benmore,

All parameters are updated at the same time, which means their update order does not matter. Once forward propagation is done and the cost (the error) has been calculated, we can immediately compute how much each weight should be updated (without having to update any of them first), based on (1) the errors of the predictions and (2) all the activations $a^{[l]}$ calculated during the forward prop (not during the update).
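Here is a minimal sketch of that for the toy two-layer example from your question, assuming a squared-error cost $L = \frac{1}{2}(a^{[2]} - y)^2$ (the helper names and values are made up for illustration):

```python
# Scalar two-layer network with linear activations.
def forward(x, w1, b1, w2, b2):
    a1 = w1 * x + b1
    a2 = w2 * a1 + b2
    return a1, a2

def grads(x, y, w1, b1, w2, b2):
    a1, a2 = forward(x, w1, b1, w2, b2)   # cache the activations
    err = a2 - y                          # prediction error
    # Every gradient below uses only the cached a1, a2 and the OLD weights;
    # no parameter has been updated yet at this point.
    dw2, db2 = err * a1, err
    dw1, db1 = err * w2 * x, err * w2
    return dw1, db1, dw2, db2

# Compute all gradients first, then apply all updates simultaneously.
lr = 0.1
w1, b1, w2, b2 = 0.5, 0.1, 1.5, -0.2
dw1, db1, dw2, db2 = grads(2.0, 1.0, w1, b1, w2, b2)
w1, b1, w2, b2 = w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2
```

Note how `dw1` is computed with the old $w^{[2]}$: that, plus the cached activations, is the only way one layer's update depends on another layer's parameters.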

So, through the cached activations and the errors, every $w^{[l]}$, with its value from the forward prop stage, affects how each weight gets updated in the back prop stage. However, the updated $w^{[l]}$'s do not affect anything in the same back prop stage; they only take effect, through the next forward prop stage, in the next back prop stage.

Cheers,
Raymond


[Image: simplified workflow.]

PS: My response above is only meant to give you an impression of what is going on; I did not touch on what the optimization steps actually look like. Your equations are mathematically correct, but they are not how optimization is carried out.
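What is actually carried out is an iterative loop of forward prop, back prop, and a simultaneous gradient step; nothing is ever solved for $w^{[2]}$ algebraically. A sketch for the same toy setup (arbitrary learning rate and data, squared-error cost assumed as before):

```python
# Gradient descent loop for the scalar two-layer example; L = (a2 - y)**2 / 2.
lr, x, y = 0.1, 2.0, 1.0
w1, b1, w2, b2 = 0.5, 0.1, 1.5, -0.2
for step in range(100):
    a1 = w1 * x + b1                     # forward prop with old parameters
    a2 = w2 * a1 + b2
    err = a2 - y                         # error from this forward prop
    dw1, db1 = err * w2 * x, err * w2    # back prop from cached a1, a2
    dw2, db2 = err * a1, err
    # Simultaneous update; the new values only matter in the next iteration.
    w1, b1, w2, b2 = w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2
print(w2 * (w1 * x + b1) + b2)  # prediction approaches y = 1.0
```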
