Also, shouldn’t Week 4 come before the exercise of Week 3, to give the student additional perspective and answer questions that might arise in Week 3? But then the programming assignment of Week 4 would have to come directly after the programming assignment of Week 3.
Here comes perspective again. Yes, you are right: if we generalize to an arbitrary loss function, we don’t know the exact form of dZ^{[L]}. However, if we look at just the binary classification problem covered in the first course (sigmoid activation + cross-entropy loss), and even the regression problem (linear activation + squared loss), we get the same form for dZ^{[L]}.
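To make that concrete, here is a small numerical check, offered as a sketch rather than anything taken from the course materials. It assumes the shared form is dZ^{[L]} = A^{[L]} - Y per example (without the 1/m averaging), and that the squared loss carries a factor of 1/2; without that factor the regression case would give 2(A^{[L]} - Y) instead. The function names are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(z, y):
    # Binary cross-entropy on top of a sigmoid output, per example.
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def half_squared_loss(z, y):
    # Linear activation (a = z) with squared loss.
    # Assumption: the loss is 0.5 * (a - y)^2, so the factor of 2 cancels.
    return 0.5 * (z - y) ** 2

def numerical_dz(loss, z, y, eps=1e-6):
    # Central finite difference of the per-example loss with respect to z.
    return (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)

rng = np.random.default_rng(0)
z = rng.normal(size=(1, 5))                             # pre-activations Z^{[L]}
y_cls = rng.integers(0, 2, size=(1, 5)).astype(float)   # binary labels
y_reg = rng.normal(size=(1, 5))                         # regression targets

# Analytic form in both cases: dZ = A - Y
dz_cls_analytic = sigmoid(z) - y_cls
dz_reg_analytic = z - y_reg

cls_matches = np.allclose(numerical_dz(bce_loss, z, y_cls), dz_cls_analytic, atol=1e-6)
reg_matches = np.allclose(numerical_dz(half_squared_loss, z, y_reg), dz_reg_analytic, atol=1e-6)
print(cls_matches, reg_matches)
```

For any other pairing of output activation and loss, the same finite-difference helper can be used to see whether the A - Y form still holds; in general it will not.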
I think that when you were considering this question, your scope may already have been larger than what the course covers. IMHO, that is good for you: you have the big picture in mind, and you will be careful about what to use when you build your own network. However, it is like the context of a conversation: if we are discussing shopping for apples and I say $5, that implies the price of apples and not something else. I mean, there are two parallel contexts here: the course’s and yours (or mine as a learner).
Is there anything being tested/asked in week 3 that’s covered only in week 4? Because, you know, we are building up knowledge, and I don’t see any problem with week 4 asking about anything taught in weeks 1 through 3. In fact, in courses 2 to 5, we might use things from course 1 from time to time.
But aren’t the questions shuffled? They definitely change between attempts.
Also, concerning the question
“Is there anything being tested/asked in week 3 that’s covered only in week 4?”
Not directly, but the “box structure” of the backward propagation is explained in the week 4 lecture, and it would be useful to know about it when doing the programming assignment of week 3. Just my opinion.