Summation means summing all the terms. You highlighted the addition sign (+) in the second figure. All terms are added, starting from i = 1 up to i = m.
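For concreteness (a sketch based on the standard multivariable chain rule, since I can't see the exact figure here): with intermediate variables u_1, ..., u_m, the sum being highlighted is

\frac{\partial y_k}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial f_k}{\partial u_i} \frac{\partial u_i}{\partial x_j}

Each term accounts for one path through which x_j influences y_k, and all m paths have to be added together.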
Dear Mr Saif,
May I know why we should add all the terms together in this case?
Thank you.
First, give us the background. What is the relationship between u_{i} and g_{i}(x_{1}, ..., x_{j}, ..., x_{n})? Addition? And between y_{k} and f_{k}(u_{1}, ..., u_{i}, ..., u_{m})?
Thank you for sharing. Your notes were very organized and useful. As an MSc student, I understood most of it.
However, for calculating the backward propagation equations, I prefer writing the chain rule out in full, all the way from the cost function to the target. For example, instead of writing 4 chain rules of length 1, I find it simpler to write a single chain rule of length 4. It’s basically the same thing, but I find the second formulation easier to understand.
Just trying to throw a suggestion out there. Thank you again for the amazing notes.
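To make the suggestion concrete (a schematic sketch with generic layer symbols, ignoring matrix-shape bookkeeping; not necessarily the exact symbols used in the notes): instead of several separate one-step rules, the whole path can be written in one line, e.g.

\frac{\partial J}{\partial W^{[l]}} = \frac{\partial J}{\partial A^{[L]}} \cdot \frac{\partial A^{[L]}}{\partial Z^{[L]}} \cdots \frac{\partial A^{[l]}}{\partial Z^{[l]}} \cdot \frac{\partial Z^{[l]}}{\partial W^{[l]}}

and then each factor can be substituted at the end.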
You must calculate dZ/dW from eq. (3), Z[l] = W[l]A[l-1] + B[l], where dZ/dW = A[l-1].
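Written out element-wise (a quick sketch in the same spirit as eq. (3)), for unit j in layer l and unit k in layer l-1:

z_j^{[l]} = \sum_k w_{jk}^{[l]} a_k^{[l-1]} + b_j^{[l]} \quad\Rightarrow\quad \frac{\partial z_j^{[l]}}{\partial w_{jk}^{[l]}} = a_k^{[l-1]}

which is why A[l-1] is the factor that appears when the chain rule is applied to dJ/dW.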
Hi, could you clarify the notation for f in (9) in part 1? I was expecting f to be f: \mathbb{R}^{n^{[L]} \times m} \times \mathbb{R}^{n^{[L]} \times m} \to \mathbb{R}, but it is f: \mathbb{R}^{2n^{[L]}} \to \mathbb{R}.
Any idea why the MathML embedded in the neural net SVGs here would fail to render? I get the same issue in Chrome, Firefox, and MS Edge.
I was confused by this when I first saw it as well. I found this article that might be helpful:
(see the section titled "The Generalized Chain Rule")
Thanks for the wonderful work.
Thank you for sharing, @peppy
Thanks for the derivation. After reading the part 1 article, I initially thought that equation #17 was overkill, given that for any layer l, any unit j in that layer, and any training example i, the activation a_j^[l](i) would be a function of only z_j^[l](i) and not of any other z_p^[l](i). However, after reading the part 2 article, and in particular the softmax function, I understood your motivation behind equation 17. Thanks much!
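In case it helps anyone else: with softmax, a_j^{[l]} = e^{z_j^{[l]}} / \sum_p e^{z_p^{[l]}}, so every a_j^{[l]} depends on every z_p^{[l]} in that layer, and the off-diagonal partials are non-zero:

\frac{\partial a_j^{[l]}}{\partial z_p^{[l]}} = a_j^{[l]} \left( \delta_{jp} - a_p^{[l]} \right)

(the standard softmax Jacobian, quoted from memory rather than from the notes), which is exactly why the full sum in equation 17 is needed.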
Coolest thing I have seen today
Hi,
I asked myself the same question and even sent a personal e-mail to @jonaslalin about it before I read this conversation.
Thanks for the explanation, @rmwkwok.
Apologies for cluttering his mailbox.
Francis
No, but you can save it as HTML. There are tools to convert HTML to PDF, but it is touchy.
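One possible route (just a sketch I haven't tested on these particular pages; the file names below are placeholders) is to convert the saved HTML with a library such as WeasyPrint:

# Minimal sketch: HTML -> PDF with WeasyPrint (pip install weasyprint).
# "saved_page.html" and "notes.pdf" are placeholder file names.
from weasyprint import HTML

HTML("saved_page.html").write_pdf("notes.pdf")

Pages that lean heavily on MathML/SVG may still come out imperfect, which is probably where the touchy part comes in.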
Hi, I have a doubt regarding the cost function for multi-class classification. Doesn’t this J give higher priority to classes with higher values, since it is multiplied by y and has log(a[L])? Does it work only because of the softmax activation function, since if we use any other function, its derivative with respect to Z[L] may not come out to be 1/m (A[L] - Y)?
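For reference, the J I am asking about (writing out my understanding of it, not quoting the notes verbatim) is the cross-entropy cost

J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n^{[L]}} y_j^{(i)} \log a_j^{[L](i)}

and it is the softmax + cross-entropy combination that makes dJ/dZ[L] collapse to 1/m (A[L] - Y).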
The author Jonas Lalin is the real expert here, but I’m not sure whether he is still answering questions here. If you don’t get an answer from him, you could check his website (where this material is posted) and see if he has a link there for posting questions.
Are you asking why the domain of function f is \mathbb{R}^{2n^{[L]}} as opposed to \mathbb{R}^{n^{[L]}}, or are you asking what n^{[L]} means there?
If the latter, it is the number of neurons in the output layer of the network.
For the former question, my reading would be that the point is that both A^{[L]} and Y have dimension n^{[L]} x m. So I would have thought the m would need to figure in there. I haven’t read the rest of Jonas’s definitions in a couple of years, but it looks like he is using the same convention that Prof Andrew Ng does that J is the scalar cost function which is the average of the vector loss values L across the samples. So I would have thought that you needed to incorporate the m into the dimension of the domain, but maybe since taking the average is basically trivial we can ignore that. Then we’d be down to just the discretionary part of the function being the vector loss which has 2n^{[L]} inputs.
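Concretely, under that convention (my reading, not a quote from the notes), the per-sample loss takes two vectors in \mathbb{R}^{n^{[L]}}:

L: \mathbb{R}^{n^{[L]}} \times \mathbb{R}^{n^{[L]}} \to \mathbb{R}, \qquad J = \frac{1}{m} \sum_{i=1}^{m} L(a^{[L](i)}, y^{(i)})

So the 2n^{[L]} counts the inputs of the loss, and the averaging over the m samples is the part that is treated as trivial.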
But all this is just notation in any case. When you define the actual functions in question, this should all be clear and will just “come out in the wash”.
His profile indicates that his last post was in May 2022.