Slight confusion in building the Cost function of a DNN


Below is an image of the Cost function in the programming assignment of Course 1, Week 4 of the Deep Learning specialisation.

I was confused about the AL and Y notation. Don’t the capital letters denote all the layers in the Neural Network? If that is the case, why are we summing in the cost when we are already calculating it for the final activation (AL or yhat)?

{moderator edit - solution code removed}


The convention that Prof Ng uses is that capital letters represent matrices with multiple samples. When he means a single sample or a result from a single sample, he uses lower case letters. So AL is the activation output of the final layer of the network, which makes it a 1 x m vector, where m is the number of input “samples”. Then Y is the corresponding vector of label values for those same samples, so Y is also 1 x m.

In other words, the sum in the cost formula runs over the m samples, not over the layers. This is independent of how many layers there are in the network: the cost is computed only from the output of the final layer. Of course those values depend on what happens in the previous layers, but that is history by the time you get to the point of computing the cost.
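To make that concrete, here is a minimal sketch of a cross-entropy cost over m samples, assuming binary classification so that AL and Y are both 1 x m NumPy arrays (the function name `compute_cost` and the sample values are just illustrative, not the assignment's solution code):

```python
import numpy as np

def compute_cost(AL, Y):
    """Cross-entropy cost, where AL and Y are both 1 x m arrays."""
    m = Y.shape[1]
    # The sum here runs over the m samples, not over the layers:
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(np.squeeze(cost))

AL = np.array([[0.8, 0.9, 0.4]])  # final-layer activations for m = 3 samples
Y = np.array([[1, 1, 0]])         # the corresponding labels
print(compute_cost(AL, Y))
```

Note that only AL, the final layer's activation, ever appears here; the activations of the earlier layers have already done their work by this point.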

BTW, you filed this under General Discussion. You do say in the post that it’s about Week 4 of DLS Course 1, but in general you’ll have better luck getting responses if you file things in the relevant categories. I’ll move the thread to the right category for you (using the little “edit pencil” on the title). These forums cover a lot of different courses and specializations at this point. :nerd_face: