Feedforward Neural Networks in Depth, Part 1: Forward and Backward Propagations

In the Deep Learning Specialization, Course 1 (Neural Networks and Deep Learning), Week 4 (Deep Neural Networks), there are some optional readings (external articles) titled “Feedforward Neural Networks in Depth”.

In Part 1 (“Feedforward Neural Networks in Depth, Part 1: Forward and Backward Propagations” on I, Deep Learning), couldn’t the author have taken a clearer approach when writing the article (see 2 examples below)?

My math is rusty, and I am not a theoretical mathematician, but because the topic is difficult to understand, I was hoping for a clearer explanation. I have a headache just thinking about the dimensions of the vectors and corresponding matrices.

Well, that link is “optional” for a reason. It either helps you or it doesn’t. Note that it’s provided by Jonas Lalin, who was a mentor for DLS a few years back, but it is not part of the actual course here. You can try contacting Jonas through his website (where that link points).

But I took a look and will try to at least give some thoughts on your questions above:

Did you check the “Notation” section above that figure? k is not really a “layer” number. It’s the index of the “inner” dimension of the dot product that is happening there. In Professor Ng’s fully vectorized notation, that dot product is W^{[l]} \cdot A^{[l-1]}. So k is a column index of W^{[l]} (the current layer’s weights) and a row index of A^{[l-1]} (the previous layer’s activations); in other words, k runs over the units of layer l-1.
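To make the role of k concrete, here is a minimal NumPy sketch. It assumes the element-wise form z_{j,i}^{[l]} = \sum_k w_{j,k}^{[l]} a_{k,i}^{[l-1]} + b_j^{[l]} (my reading of the notation, with the bias left out so the focus stays on the dot product). The layer sizes and the helper name dot_elementwise are made up for illustration. It writes the product out as explicit loops over j, i and k and checks that the result matches np.dot:

```python
import numpy as np

# Hypothetical layer sizes, for illustration only.
n_prev, n_curr, m = 3, 2, 4  # units in layer l-1, units in layer l, samples

rng = np.random.default_rng(0)
W = rng.standard_normal((n_curr, n_prev))   # W^[l], shape (n^[l], n^[l-1])
A_prev = rng.standard_normal((n_prev, m))   # A^[l-1], shape (n^[l-1], m)


def dot_elementwise(W, A_prev):
    """Hypothetical helper: W^[l] . A^[l-1] written as explicit sums over k."""
    rows, inner = W.shape          # inner = n^[l-1], the dimension summed over
    cols = A_prev.shape[1]         # cols = m, the number of samples
    out = np.zeros((rows, cols))
    for j in range(rows):          # j: unit in the current layer l (row of W)
        for i in range(cols):      # i: training sample (column of A_prev)
            for k in range(inner): # k: unit in layer l-1 (column of W, row of A_prev)
                out[j, i] += W[j, k] * A_prev[k, i]
    return out


print(np.allclose(dot_elementwise(W, A_prev), np.dot(W, A_prev)))  # True
```

The loop over k only ever pairs column k of W^{[l]} with row k of A^{[l-1]}, which is why those inner dimensions must agree: (n^{[l]}, n^{[l-1]}) times (n^{[l-1]}, m) gives (n^{[l]}, m).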

On the second point, that z should not be vectorized over i, I must be missing your point. Jonas’s expression of that computation is not vectorized over the samples dimension: all the expressions you show there are for a single value of i. It would be more efficient to vectorize it, which is why Professor Ng ends up expressing everything in fully vectorized form:

Z^{[l]} = W^{[l]} \cdot A^{[l-1]} + b^{[l]}
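For the vectorization point, here is a similar sketch (again NumPy, with made-up sizes) that computes Z^{[l]} one sample i at a time and then for all m samples at once. It assumes b^{[l]} is stored as an (n^{[l]}, 1) column vector that NumPy broadcasts across the m columns of W^{[l]} \cdot A^{[l-1]}; both versions should give the same matrix:

```python
import numpy as np

# Hypothetical sizes, for illustration only.
n_prev, n_curr, m = 3, 2, 5

rng = np.random.default_rng(1)
W = rng.standard_normal((n_curr, n_prev))   # W^[l], shape (n^[l], n^[l-1])
b = rng.standard_normal((n_curr, 1))        # b^[l], assumed column vector (n^[l], 1)
A_prev = rng.standard_normal((n_prev, m))   # A^[l-1], one column per sample

# Per-sample form: one value of i at a time.
Z_loop = np.zeros((n_curr, m))
for i in range(m):
    a_i = A_prev[:, i:i + 1]                # column i as shape (n^[l-1], 1)
    Z_loop[:, i:i + 1] = np.dot(W, a_i) + b # z^[l](i), shape (n^[l], 1)

# Fully vectorized form: all m samples at once; b broadcasts across the columns.
Z_vec = np.dot(W, A_prev) + b

print(Z_vec.shape)                 # (2, 5)
print(np.allclose(Z_loop, Z_vec))  # True
```

The loop version mirrors the single-i expressions; the vectorized line does the same arithmetic in one matrix product, which is what makes it faster in practice.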

Jonas is just trying to express everything in full detail at the lowest level. Maybe that’s not the easiest way to get a clear picture. You can end up sort of drowning in the notation. :nerd_face:

Paul, thank you so much for your input. Always appreciated!

Hi, Carlos.

I’m glad it was useful. Maybe it’s also worth saying that I am in no way criticizing Jonas’s work there. I’m totally in awe of the quality of the work he did and the amount of effort it must have taken to express all the math involved in these computations. I remember talking with him while he was working on it, and he recognized that Professor Ng had chosen, for his own perfectly valid pedagogical reasons, to design the courses so that students would not need at least undergraduate-level math proficiency to learn the material. As a result, in many cases the course material just presents the math formulas we need, with some intuitive explanations, but without really deriving them. Jonas wanted to provide the full mathematics for people who did have the math background to understand it and wanted to “dig deeper”. He did a great job at that, but the results take some serious effort to absorb.

There is plenty of really interesting material ahead in DLS, so I hope you will find it as worthwhile as I did.

Best regards,
Paul

Paul, thanks again. Yes, I am in the process of absorbing it, and it’s not so easy. I expect to see the light in a few weeks.