Week 4 - Forward Propagation in a Deep Network - Why do we need the for loop to calculate Zs and As

This might be a silly question but why do we need a for loop for each layer? Why don’t we write down all the equations for each layer like in the slide?


How many layers are there in your network? That’s the point. If you knew that you always had a fixed number, then you could just write them out explicitly. Of course if the number is > 3, that’s going to be more code than just writing a loop. But here are a couple of points to keep in mind:

  1. Networks typically have way more than 3 or 4 layers. What we are seeing here in Course 1 are essentially simple “toy” examples. You’ll see more realistic models when you get further along in Course 2 and Course 4.

  2. The number of layers is not fixed: different problems require different network architectures. Our goal is to write general code that works in all cases: we can just tell it the number of layers and the number of neurons in each layer as arguments to the function and we don’t have to rewrite the core forward and backward propagation code each time.
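To make point 2 concrete, here is a minimal sketch of what "general code" could look like. It is not the course's notebook code: the function names, the `W1`/`b1` dictionary keys, and the choice of ReLU for hidden layers with a sigmoid output are all assumptions made for illustration. The point is only that the same loop handles any depth.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize_parameters(layer_dims, seed=0):
    """layer_dims = [n_x, n_1, ..., n_L]; the names here are illustrative."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

def forward(X, params):
    """Run forward propagation through however many layers params holds."""
    L = len(params) // 2          # one W and one b per layer
    A = X
    for l in range(1, L + 1):
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        # Assumed activations: ReLU in hidden layers, sigmoid at the output
        A = sigmoid(Z) if l == L else relu(Z)
    return A

# The same code runs a 2-layer and a 5-layer network; only the list changes.
X = np.ones((5, 10))              # 5 features, 10 examples
for layer_dims in ([5, 4, 1], [5, 8, 8, 8, 8, 1]):
    A = forward(X, initialize_parameters(layer_dims))
    print(len(layer_dims) - 1, "layers ->", A.shape)
```

Writing the layers out by hand would mean editing `forward` every time the list of layer sizes changes.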

Hi, @Marios_Constantinou. Not silly at all – fundamental, really! If I understand your question correctly, you are asking: why not combine all of the individual layers into one mega-function by repeated substitution? Grab a pencil and a pad of paper and go ahead. At the end of the exercise you’ll have an expression for that function; let’s call it \Lambda(X) (for large!). Then you can just minimize the cost function over all of the parameters in this function using some numerical optimizer (based on gradient descent).
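As a rough illustration of the substitution exercise, here is what the combined function would look like for a network with just three layers. The activation symbols g^{[l]} are assumed notation (following the usual course convention), not something taken from the slide:

$$
\Lambda(X) = g^{[3]}\Big( W^{[3]}\, g^{[2]}\big( W^{[2]}\, g^{[1]}( W^{[1]} X + b^{[1]} ) + b^{[2]} \big) + b^{[3]} \Big)
$$

Each extra layer adds another level of nesting, so the written-out expression grows with L.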

Now make up some numbers for the dimensions of X, the number of layers L, the number of nodes in each layer, etc., and compute the number of parameters in this function, i.e. the number of elements in all of the combined W^{[l]}'s and b^{[l]}'s (e.g. 4 layers, 7 nodes per hidden layer, with X a 12,288-dimensional vector). Huge number, right? (For a realistically-sized network at least.) Now find a computer that can do the required optimization before 2050. Expensive! In the lingo of mathematics and computer science, you have encountered the “curse of dimensionality.” There be dragons there!
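If you would rather let the computer do the counting, here is a small sketch. It assumes that "4 layers" means three hidden layers of 7 nodes plus a single output unit, which is my reading of the example and not something stated above; `count_parameters` is a made-up helper name.

```python
def count_parameters(layer_dims):
    """Total elements in all W^{[l]} and b^{[l]} for layer_dims = [n_x, n_1, ..., n_L]."""
    # W^{[l]} has n_l * n_{l-1} entries, b^{[l]} has n_l entries
    return sum(n_out * n_in + n_out
               for n_in, n_out in zip(layer_dims, layer_dims[1:]))

# Assumed architecture: 12,288 inputs, three hidden layers of 7, one output unit
layer_dims = [12288, 7, 7, 7, 1]
print(count_parameters(layer_dims))  # 86143
```

Almost all of that total comes from the first layer, because W^{[1]} alone has 7 × 12,288 entries.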

And now (for the second time today) I get to say that forward/backward propagation is the fundamental technology that makes deep learning possible. :nerd_face:

And if I misunderstood your question, I just noticed that @paulinpaloalto weighed in in the meantime!

Oh yeah, I took the video example too literally. What you said makes sense, thanks!

Yeah @paulinpaloalto covered my question perfectly but your explanation was also insightful because I never thought of it that way, thank you for the response!