DLS4-Week2-quiz

Shuwen1 · June 21, 2024, 2:14pm

When creating a post, please add:

Week # must be added in the tags option of the post.
Not sure why I can’t add the week tag, but this is for the Week 2 quiz.
Link to the classroom item you are referring to:
Coursera | Online Courses & Credentials From Top Educators. Join for Free | Coursera
Description (include relevant info but please do not post solution code or your entire notebook)

I don’t know how to calculate the parameter count of a MobileNet v2 bottleneck block. Did I miss a formula in the lectures? Here is the example from the slide.

We have an n * n * 3 input volume and 18 filters for the expansion; the depthwise conv uses 3 * 3 filters, then there are 18 filters for the pointwise conv.

Thanks in advance

tarunsaxena1000 · June 24, 2024, 9:53pm

You need to remove this part when you post next time.

tarunsaxena1000 · June 24, 2024, 10:58pm

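There is no well-defined formula taught in the lectures, but you can work it out intuitively, layer by layer.
In expansion layer - (1 x 1 x 3) x 18 (expansion filters) + 2 x 18 (Batch Normalization’s 2 learnable parameters per channel: gamma for scaling and beta for shifting)
We do the same for the next layers-
In depthwise conv - (3 x 3) x 18 + 2 x 18. Each of the 18 channels gets its own single 3 x 3 filter, so the weight count is 3 x 3 x 18, not (3 x 3 x 18) x 18.
In pointwise - (1 x 1 x 18) x output channels + 2 x output channels.
These counts assume the convolutions have no bias terms; a bias would add one parameter per output channel of that layer.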

(The ReLU layers have no learnable parameters.)
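The layer-by-layer count above can be sketched as a small helper. This is only an illustration, not something from the lectures: the function name `bottleneck_param_count` and its arguments are hypothetical, the output depth of 3 in the usage lines is an assumption, and the depthwise layer is counted as one 3 x 3 filter per channel.

```python
def bottleneck_param_count(c_in, expand, c_out, kernel=3,
                           use_bias=False, bn=(True, True, True)):
    """Count learnable parameters in a MobileNetV2-style bottleneck block.

    bn says whether (expansion, depthwise, pointwise) is followed by
    Batch Normalization; each BN adds gamma and beta per channel.
    """
    # (weight count, output channels) for each layer, in order
    layers = [
        (1 * 1 * c_in * expand, expand),     # 1x1 expansion conv
        (kernel * kernel * expand, expand),  # depthwise: one filter per channel
        (1 * 1 * expand * c_out, c_out),     # 1x1 pointwise (projection) conv
    ]
    total = 0
    for (weights, channels), has_bn in zip(layers, bn):
        total += weights
        if use_bias:
            total += channels      # one bias per output channel
        if has_bn:
            total += 2 * channels  # gamma and beta per channel
    return total


# Assumed example: n x n x 3 input, 18 expansion filters, 3 output channels.
print(bottleneck_param_count(3, 18, 3, bn=(False, False, False)))  # 270
print(bottleneck_param_count(3, 18, 3))                            # 348
```

Switching entries of `bn` off reproduces the variants with only one or two BN layers, so you can match the count to the model you actually use.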

Sometimes only two BN layers are used, i.e. after the depthwise conv and after the pointwise conv.
In some transfer learning models only one BN layer is used, i.e. after the depthwise conv.
I think the research paper uses all three BN layers, so calculate according to the model you are actually working with.

Take the layer order from the code below:

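from tensorflow.keras.layers import (Add, BatchNormalization, Conv2D,
                                     DepthwiseConv2D, ReLU)

def bottleneck_block(x, expand=18, squeeze=3):
  # squeeze must equal the channel count of x (3 here is an assumed default), or Add() fails
  m = Conv2D(expand, (1,1), use_bias=False)(x)    # 1x1 expansion, no bias to match the counts
  m = BatchNormalization()(m)
  m = ReLU(max_value=6.0)(m)                      # ReLU6
  # padding='same' keeps the spatial size so the residual Add() works
  m = DepthwiseConv2D((3,3), padding='same', use_bias=False)(m)
  m = BatchNormalization()(m)
  m = ReLU(max_value=6.0)(m)
  m = Conv2D(squeeze, (1,1), use_bias=False)(m)   # 1x1 pointwise projection
  m = BatchNormalization()(m)
  return Add()([m, x])                            # residual connection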

This is my first answer in the community; let me know if it was helpful.

Shuwen1 · June 26, 2024, 2:13pm

Thanks @tarunsaxena1000 for replying. I have some questions. Why does BatchNorm have 2 parameters? (I have somewhat forgotten what BatchNorm does.) Why is the second term of the depthwise conv multiplied by 28? In the last step, why do we have 2 x output channels?

To be honest, I am still confused about how the whole thing works. Thanks for the code; I didn’t know the “expand” step is a Conv2D layer. I guess I will rewatch the lecture videos and some other videos on YouTube to get a better understanding of MobileNet v2.

tarunsaxena1000 · June 30, 2024, 4:07pm

Batch Normalization provides several benefits:

Faster Convergence: By reducing internal covariate shift, Batch Normalization allows for faster training by stabilizing and accelerating the learning process.
Regularization: Acts as a regularizer by adding noise to the activations of each layer, which reduces the need for Dropout and helps prevent overfitting.
Improved Gradient Flow: Normalizing the activations keeps the gradients propagated during backpropagation more stable, which leads to faster convergence.

Remember what we studied about gradient descent convergence: compare the path on unnormalized data (image not preserved) with the path on normalized data, which is a smooth descent (image not preserved).

For a detailed understanding of batch normalization, you can refer to https://towardsdatascience.com/batch-norm-explained-visually-how-it-works-and-why-neural-networks-need-it-b18919692739

Without γ (gamma) and β (beta), the normalized output will always have zero mean and unit variance. This constraint could limit the ability of the network to represent complex functions, as it forces all outputs to follow this fixed distribution.
By introducing γ (gamma) and β (beta), the network can learn to adjust the mean and variance of the normalized activations, thus restoring the capacity to represent a wide range of functions.
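To make the role of γ and β concrete, here is a minimal sketch in plain Python. The `batch_norm` helper, the sample batch, and the chosen values γ = 3 and β = 10 are all illustrative assumptions; a real layer learns γ and β during training and works on tensors rather than lists.

```python
import statistics


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars, then scale by gamma and shift by beta."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # population variance of the batch
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]


batch = [2.0, 4.0, 6.0, 8.0]

# With the default gamma=1, beta=0 the output has mean ~0 and std ~1.
plain = batch_norm(batch)
print(statistics.fmean(plain), statistics.pstdev(plain))

# With gamma=3, beta=10 the output has mean ~10 and std ~3.
scaled = batch_norm(batch, gamma=3.0, beta=10.0)
print(statistics.fmean(scaled), statistics.pstdev(scaled))
```

The normalization step always produces the same fixed distribution; only γ and β let the layer move away from it, which is why they are the two learnable parameters.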

Regarding the 28: it should be 18. Sorry, that was a typing error on my part. Thanks for pointing it out.

Channel-wise Normalization

Batch Normalization normalizes the activations of each channel independently across the mini-batch. Here’s how it works:

For each channel 𝑐, BN computes the mean 𝜇𝑐 and standard deviation 𝜎𝑐 across the mini-batch (and, for convolutional feature maps, across all spatial positions). These statistics are not trainable parameters.
It then scales and shifts the normalized values using learned parameters (𝛾𝑐 and 𝛽𝑐), where 𝛾𝑐 scales the normalized value and 𝛽𝑐 shifts it. These are trainable parameters.

Parameters for Batch Normalization

The parameters for Batch Normalization are:

Scale Parameter (γc): One per channel. It’s a learnable parameter that scales the normalized value.
Shift Parameter (βc): One per channel. It’s a learnable parameter that shifts the normalized value.

That’s why we count 2 x output channels.
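As a rough illustration of the per-channel bookkeeping, the sketch below computes the statistics for a tiny made-up batch in plain Python. The `channel_stats` helper and the data are hypothetical; the point is only that each channel gets its own mean and variance, and its own γ and β.

```python
def channel_stats(batch):
    """batch: list of images; each image is a list of pixels;
    each pixel is a list with one value per channel."""
    n_channels = len(batch[0][0])
    means, variances = [], []
    for c in range(n_channels):
        # Pool channel c over every image and every pixel position.
        vals = [pixel[c] for image in batch for pixel in image]
        mean = sum(vals) / len(vals)
        means.append(mean)
        variances.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return means, variances


# Two images, two pixels each, three channels (made-up numbers).
batch = [
    [[1.0, 10.0, 0.0], [3.0, 20.0, 0.0]],
    [[5.0, 30.0, 0.0], [7.0, 40.0, 0.0]],
]
means, variances = channel_stats(batch)

gamma = [1.0] * len(means)  # one learnable scale per channel
beta = [0.0] * len(means)   # one learnable shift per channel
trainable = len(gamma) + len(beta)

print(means)      # [4.0, 25.0, 0.0]
print(variances)  # [5.0, 125.0, 0.0]
print(trainable)  # 6, i.e. 2 x 3 channels
```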

Kindly mark it as the ‘solution’ if it answers your questions. Let me know if you have any other queries.
