Gradient descent in a multi-layer neural network

In a multi-layer neural network, is each layer optimized separately and in sequence, i.e. is the cost function minimized one layer at a time, or is it all done at once across all layers?

Hello @Paddy
The optimization is a simultaneous, iterative process across all layers of the neural network; the layers are not optimized separately and sequentially.
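As a rough sketch of what "simultaneous" means here (plain NumPy, with a hypothetical two-layer network and made-up shapes and learning rate), a single gradient-descent step computes the gradient of one cost with respect to every layer's parameters and applies all of the updates in the same iteration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: 2 inputs -> 3 sigmoid units -> 1 linear output.
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

X = rng.normal(size=(5, 2))   # 5 made-up examples
y = rng.normal(size=(5, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, b1, W2, b2):
    a1 = sigmoid(X @ W1.T + b1)       # layer 1
    a2 = a1 @ W2.T + b2               # layer 2
    return np.mean((a2 - y) ** 2)     # ONE cost for the whole network

# Gradients for ALL layers come from that same cost...
a1 = sigmoid(X @ W1.T + b1)
a2 = a1 @ W2.T + b2
d2 = 2 * (a2 - y) / len(X)            # dJ/d(a2)
gW2, gb2 = d2.T @ a1, d2.sum(0)
d1 = (d2 @ W2) * a1 * (1 - a1)        # backprop through the sigmoid
gW1, gb1 = d1.T @ X, d1.sum(0)

# ...and every layer's parameters are updated in the same step.
lr = 0.01
before = cost(W1, b1, W2, b2)
W1, b1 = W1 - lr * gW1, b1 - lr * gb1
W2, b2 = W2 - lr * gW2, b2 - lr * gb2
after = cost(W1, b1, W2, b2)
```

Note that there is no inner loop that finishes optimizing layer 1 before moving on to layer 2: one step touches everything.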


I replayed the lecture 'forward propagation'. It looks to me like it's sequential, because each layer takes its input from the previous layer. If I look at the coffee roasting example, the layers are set up via the Sequential model in TensorFlow: X goes into the 1st layer, its activation functions run, and it produces the output to be consumed by the next layer, i.e. the X input is now replaced by the output from the 1st layer. BUT when I look at the Week 2 lecture on training details, loss, and cost functions, it says:

'The cost function is a function of all the parameters in the neural network. You can think of capital W as including W1, W2, and W3, all the W parameters in the entire neural network, and b as including b1, b2, and b3. When you optimize the cost function with respect to W and b, you are optimizing it with respect to all of the parameters in the neural network.'

This sounds like it's all done at once.
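Both observations are consistent, and a small sketch may help reconcile them (plain NumPy standing in for the TensorFlow Sequential model; the three layers, their shapes, and the data are all hypothetical). The forward pass is computed layer by layer, yet the resulting cost is a single number that depends on every layer's parameters, which you can see by nudging a first-layer weight and watching the final cost change:

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical 3-layer stack, in the spirit of the coffee-roasting example:
# each layer's output is the next layer's input.
def dense(a_in, W, b):
    return 1.0 / (1.0 + np.exp(-(a_in @ W + b)))   # sigmoid units

W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)
W3, b3 = rng.normal(size=(2, 1)), np.zeros(1)

X = rng.normal(size=(4, 2))
y = np.array([[0.], [1.], [1.], [0.]])

def J(W1):
    a1 = dense(X, W1, b1)     # forward prop IS sequential...
    a2 = dense(a1, W2, b2)
    a3 = dense(a2, W3, b3)
    return np.mean((a3 - y) ** 2)   # ...but the cost is ONE scalar

# Nudging a single first-layer weight changes the final cost:
# J really is a function of W1 as well as W2 and W3.
W1_nudged = W1.copy()
W1_nudged[0, 0] += 0.5
changed = J(W1_nudged) != J(W1)
```

So "sequential" describes how the computation flows, while "all at once" describes which parameters the one cost function, and therefore the one optimization, depends on.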

The data flow is sequential in both directions (forward and backward), and the process is iterative.

But the training that learns the best weights applies to all layers together. There is no separate optimization for each layer.

What we're minimizing is the cost measured at the output layer; driving it down adjusts the weights of every layer.
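Putting those three points together, a minimal training loop might look like the following (a NumPy sketch with a hypothetical two-layer regression network; the target function, shapes, learning rate, and iteration count are all illustrative). The forward pass runs layer by layer, the one cost is measured only at the output, the backward pass runs layer by layer in the other direction, and then all layers are updated in the same iteration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-layer regression net.
W1, b1 = rng.normal(size=(4, 2)) * 0.5, np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)) * 0.5, np.zeros(1)
X = rng.normal(size=(8, 2))
y = (X[:, :1] + X[:, 1:]) ** 2      # an arbitrary target to fit

losses = []
lr = 0.02
for _ in range(300):
    # Forward: sequential, layer by layer.
    a1 = np.tanh(X @ W1.T + b1)
    yhat = a1 @ W2.T + b2
    # The only cost is measured at the output layer.
    losses.append(np.mean((yhat - y) ** 2))
    # Backward: sequential in the opposite direction.
    d2 = 2 * (yhat - y) / len(X)
    d1 = (d2 @ W2) * (1 - a1 ** 2)  # tanh derivative
    # Update: every layer, in the same iteration.
    W2 -= lr * (d2.T @ a1); b2 -= lr * d2.sum(0)
    W1 -= lr * (d1.T @ X);  b1 -= lr * d1.sum(0)
```

Over the iterations the output-layer cost falls, even though no layer is ever optimized in isolation.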