Week_2 Assignment_1

When creating a post, please add:

  • #Week_2
  • #Coding_Assignment_1
  • I have a question about deploying the second type of ResNet block, “the convolutional block”.
  • Why is the conv layer a learned layer, and not just an identity matrix that resizes the input to match the output layer’s dimensions?
  • And why do they add the batch norm step after it?

Hello @Yasmeen_Asaad_Azazi! It’s been a while since your last post, and welcome back!

Let’s go to discussion mode and I will be brief.

An identity matrix can’t change anything, can it? Because AI = A, where I is the identity matrix (and since an identity matrix is square, it cannot change the dimensions at all). If we go a step further from an “identity matrix” to a “constant matrix”, then we need to decide which constant matrix to use. In contrast, by making it trainable, we leave that decision to the learning algorithm. I think, in neural networks, unless we have a good reason not to, we make each parameter trainable.
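To see the dimension point concretely, here is a minimal NumPy sketch (the shapes and channel counts are made up for illustration). A 1×1 conv on the shortcut is just a per-pixel matrix multiply over the channel axis, so a learned rectangular matrix can resize the channels, while an identity matrix must be square and leaves the input unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shortcut-branch input: height 4, width 4, 64 channels.
x = rng.standard_normal((4, 4, 64))

# A 1x1 convolution acts as a per-pixel matrix multiply over channels.
# A learned 64x128 weight matrix W can change the channel count;
# an identity matrix cannot, since it must be square and AI = A.
W = rng.standard_normal((64, 128)) * 0.01  # trainable in a real network

shortcut = x @ W           # shape (4, 4, 128): now matches the main path
identity = x @ np.eye(64)  # shape (4, 4, 64): unchanged

assert shortcut.shape == (4, 4, 128)
assert np.allclose(identity, x)  # the identity really changes nothing
```

In a real ResNet the 1×1 conv also uses a stride to shrink the spatial dimensions, but the channel mismatch above is already enough to rule out a plain identity shortcut.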

In case I didn’t get you: what is your reason for not using a trainable conv layer :wink:?

As for batch normalization: if the conv layer is trainable, then we use batch norm to safeguard the learning, which, to me, is the same reason for having batch norm in other places. As for why batch norm at all, I would recommend the lecture!
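To picture what batch norm does there, here is a minimal NumPy sketch of its training-time computation (the batch size and channel count are made up, and the spatial dimensions are flattened away to keep it short):

```python
import numpy as np

rng = np.random.default_rng(0)

# Activations from a conv layer: batch of 32 examples, 128 channels,
# deliberately poorly scaled.
a = rng.standard_normal((32, 128)) * 5.0 + 3.0

# Batch norm: standardize each channel over the batch, then apply a
# learnable scale (gamma) and shift (beta). eps avoids division by zero.
eps = 1e-5
mu = a.mean(axis=0)
var = a.var(axis=0)
a_hat = (a - mu) / np.sqrt(var + eps)
gamma, beta = np.ones(128), np.zeros(128)  # trainable in a real network
out = gamma * a_hat + beta

# The output has roughly zero mean and unit variance per channel, which
# keeps the scale of the shortcut branch in check during training.
assert np.allclose(out.mean(axis=0), 0.0, atol=1e-10)
assert np.allclose(out.std(axis=0), 1.0, atol=1e-3)
```

Because gamma and beta are trainable too, the network can undo the normalization where that helps, so batch norm restricts nothing; it only stabilizes the scale of what the trainable conv produces.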

Cheers,
Raymond


Thank you so much! Yes, I got confused between an identity matrix and a constant matrix. Thank you again for understanding my question; I was thinking it might help make the computation less complex.


No problem! Sometimes, even though I can guess the real meaning from your words, I still need to stick to the formal, universal definition (of the identity matrix) first, before moving on to my guess :wink: I think this is good for everyone :wink: Thanks for your understanding, too!

Indeed, keeping some parameters untrainable can save computational time if we have prior knowledge of what good values to fix those parameters to. In fact, this happens all the time when we do “transfer learning” (which is a DLS topic, too). Your idea is definitely a well-used technique; it’s just that, here, when we train a model from scratch, it is better to give more freedom to the learning algorithm, so that it can tell us what the good values are.
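To make the freeze-vs-train idea concrete, here is a tiny NumPy sketch (the layer names, shapes, and numbers are all made up): a gradient step simply skips the frozen parameters, which is both the compute saving and the transfer-learning trick in miniature.

```python
import numpy as np

# Two "layers" of a toy model: one fixed by prior knowledge (as in
# transfer learning), one trained from scratch.
params = {
    "pretrained_W": np.ones((3, 3)),  # frozen: we trust its values
    "new_W": np.zeros((3, 3)),        # trainable: let learning decide
}
trainable = {"pretrained_W": False, "new_W": True}

# Pretend gradients from backprop; one SGD step updates only the
# trainable parameters and skips the frozen ones entirely.
grads = {name: np.full((3, 3), 0.5) for name in params}
lr = 0.1
for name, g in grads.items():
    if trainable[name]:
        params[name] -= lr * g

assert np.allclose(params["pretrained_W"], 1.0)  # untouched
assert np.allclose(params["new_W"], -0.05)       # 0 - 0.1 * 0.5
```

When training from scratch, as in this assignment, every entry of `trainable` is True, which is exactly the "give more freedom to the learning algorithm" choice above.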

Cheers,
Raymond
