When running an optimization algorithm such as gradient descent or Adam on a neural network, does every single parameter get updated? If so, does the update of a parameter in a hidden layer affect the updates of parameters in succeeding layers, and if so, how?
If I understand correctly, a given parameter in a neural network can technically be expressed in terms of the parameters in earlier layers, because those earlier parameters determine the activation values that are fed into the layer containing the given parameter.
For example, suppose a neural network has two dense layers, each containing one unit with a linear activation function, and all parameters are scalars for simplicity. Then $a^{[1]} = w^{[1]}x + b^{[1]}$ and $a^{[2]} = w^{[2]}a^{[1]} + b^{[2]}$, which gives $w^{[2]} = \frac{a^{[2]} - b^{[2]}}{a^{[1]}} = \frac{a^{[2]} - b^{[2]}}{w^{[1]}x + b^{[1]}}$. Thus, holding $a^{[2]}$ fixed, a change in $w^{[1]}$ changes $w^{[2]}$.
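To make this concrete for myself, here is a minimal sketch in Python of the two-layer example above. The single training example $(x, y)$, the squared-error loss, and all numeric values are just illustrative assumptions on my part; the point is only that the gradient of each layer's parameters involves quantities computed from the other layer's current parameters.

```python
# Minimal sketch of the two-layer scalar network above, assuming one training
# example (x, y) and a squared-error loss L = (a2 - y)^2. All numbers are
# made up for illustration.
x, y = 2.0, 1.0          # single training example
w1, b1 = 0.5, 0.1        # layer 1 parameters
w2, b2 = -0.3, 0.2       # layer 2 parameters

# Forward pass: a1 = w1*x + b1, a2 = w2*a1 + b2
a1 = w1 * x + b1
a2 = w2 * a1 + b2
loss = (a2 - y) ** 2

# Backward pass (chain rule), written out by hand:
dL_da2 = 2 * (a2 - y)
dL_dw2 = dL_da2 * a1     # depends on a1, i.e. on the current w1 and b1
dL_db2 = dL_da2
dL_da1 = dL_da2 * w2     # gradient flowing back through layer 2
dL_dw1 = dL_da1 * x      # depends on the current value of w2
dL_db1 = dL_da1
```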
Given that gradient descent updates all parameters simultaneously, how does the interdependence among parameters described above affect the optimization algorithm?
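For reference, this is what I understand a single "simultaneous update" step to look like, continuing the sketch above with an illustrative learning rate:

```python
# One gradient-descent step: all gradients above were evaluated at the current
# (pre-update) parameter values; only then are all parameters changed at once.
lr = 0.1                 # illustrative learning rate
w1, b1 = w1 - lr * dL_dw1, b1 - lr * dL_db1
w2, b2 = w2 - lr * dL_dw2, b2 - lr * dL_db2
# The next iteration recomputes a1, a2, and all gradients from these new values.
```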