Week 2 Assignment 1: Contradiction with the ResNet Paper

Hi all,

The first paragraphs of the assignment contradict the ResNet paper:

“We argue that this optimization difficulty is unlikely to be caused by vanishing gradients. These plain networks are trained with BN (Batch Normalization), which ensures forward propagated signals to have non-zero variances. We also verify that the backward propagated gradients exhibit healthy norms with BN. So neither forward nor backward signals vanish.”

So if the problem were vanishing/exploding gradients, it would be more logical to use better initialization methods (as Prof. Andrew mentions) together with batch normalization.
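For concreteness, “better initialization” here usually refers to something like He initialization, which scales the weight variance by 2/fan_in so ReLU activations keep roughly constant variance from layer to layer. A minimal numpy sketch (not from the assignment, just an illustration):

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    """He initialization: draw weights with variance 2 / fan_in.

    This keeps the variance of ReLU activations roughly constant
    across layers, which mitigates vanishing/exploding signals
    in deep plain networks.
    """
    rng = rng or np.random.default_rng(0)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W = he_init(512, 512)
print(W.std())  # close to sqrt(2/512) ≈ 0.0625
```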

But the problem that ResNet solves is something else: the degradation problem, where adding more layers to a plain network increases training error even though the paper rules out vanishing gradients as the cause.

So, in this view, the first paragraphs of the assignment are somewhat in contradiction with the source paper.
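For reference, the mechanism the paper proposes for that optimization difficulty is the skip connection: the block learns a residual F(x) and outputs F(x) + x, so it can fall back to the identity when the extra layers aren’t helping. A minimal numpy sketch of an identity residual block (an illustration, not the assignment’s implementation):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, W1, W2):
    """Identity residual block: output = ReLU(F(x) + x).

    F is a small two-layer transform; the "+ x" skip connection
    means the block reduces to the identity when W1 and W2 are
    near zero, which is what makes very deep networks easier to
    optimize than their plain counterparts.
    """
    out = relu(x @ W1)    # first layer of the residual function F
    out = out @ W2        # second layer (no activation before the add)
    return relu(out + x)  # skip connection: add the input back

# With all-zero weights the block passes its (non-negative) input through:
x = np.array([1.0, 2.0, 3.0])
print(residual_block(x, np.zeros((3, 3)), np.zeros((3, 3))))  # → [1. 2. 3.]
```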

Thanks a lot

Hey @saiman,
Apologies for the delayed response. It is indeed a great question. I have solved the assignment and read the paper myself, but hadn’t connected the points you raised in your query. It made me think for quite a while before I stumbled upon a great thread on StackExchange. The question in that thread is different from what you asked, but the answer aligns very well with your query. I hope this helps.