Course 4, week 2 quiz, #7

In one of the versions of Q7 of Week 2’s quiz, the question asks us to check all the options that are true about the Inception network. One of the options says “Making an inception network deeper won’t hurt the training set performance”. I selected it, but the grader marks it wrong with this explanation: “Incorrect. As seen in the lectures , in practice when stacking more layer, the training performance might start increasing instead of decreasing.”

I’m confused. Isn’t increasing training performance good for the network? The option says depth “won’t hurt” the training set performance. Isn’t that exactly what the explanation says, namely that “training performance might start increasing”? The grader seems to contradict its own explanation. Shouldn’t this option be among the correct answers?

No, “increased training performance” often means overfitting, which gives worse performance on new data.

But the question here is only talking about training set performance; it isn’t worrying about low cross-validation accuracy and therefore the issue of overfitting. It’s not asking about overfitting, or whether a much better training performance is necessarily good for the network overall. It’s only talking about the training set. That’s all.

A similar argument appears in the lecture videos, where Andrew explains how a deep ResNet can keep reducing the training set error without suffering the problem plain networks have, where going deeper may result in worse training performance. Also, the instructions in assignment 1 say “This means that you can stack on additional ResNet blocks with little risk of harming training set performance.” Essentially they’re all saying that the deeper the network, the better the training performance, and therefore that depth doesn’t hurt the training set performance, as the quiz option suggests.
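The lecture argument is that a residual block can always fall back to the identity function, so adding it can’t easily make the training error worse. Here is a minimal NumPy sketch of that idea (illustrative only — not the course’s actual Keras implementation; `residual_block` and its weight shapes are hypothetical):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Two-layer residual block with an identity shortcut:
    output = relu(W2 @ relu(W1 @ x) + x)."""
    return relu(W2 @ relu(W1 @ x) + x)

# If the block's weights are driven to zero, the block reduces to the
# identity on the (non-negative) incoming activations, so stacking it
# preserves whatever the shallower network already computed.
x = relu(np.random.randn(4))            # activations are non-negative after ReLU
W_zero = np.zeros((4, 4))
out = residual_block(x, W_zero, W_zero)
assert np.allclose(out, x)              # identity is recovered
```

A plain (non-residual) block has no such easy fallback, which is why “little risk of harming training set performance” is specific to residual architectures — and why it is “little risk”, not “no risk”.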

I believe the grader needs to be corrected.

Hello @zhexingli!

Kindly check this conversation on a similar topic.


Yes, the point Saif makes on that thread is the same as what that quote from the lecture says: note that “little risk” of something doesn’t mean it never happens, right? The quiz option asserts that going deeper always helps and can’t possibly hurt. There’s essentially nothing in ML that always works in every case, right?

Right, I understand that nothing in DL is always guaranteed. But the grader’s explanation makes this a whole lot more confusing, because it appears to contradict the very option it’s marking as wrong. You know what I mean? If there are cases where stacking more blocks would decrease the performance, then why does the grader explanation say it might increase the performance, rather than saying that in some cases it might decrease?

One way or another, I believe either the question or the grader explanation needs to be reworded to avoid confusion.