A question about inception_blocks.py in Face Recognition


Why is the stride of the max-pooling layer 2? It looks like the output size of X_pool does not match that of X_3x3, X_5x5 and X_1x1.

That’s why there’s a zero-padding (line 38) to match the H-W dimensions.

Thanks for your answer. However, it still mismatches: the dimension of the input is (m, nC, nH, nW) = (m, 3, 96, 96). If we calculate carefully, we find that the dimensions of the outputs are not equal. Here is an example:
[image]



Besides, I think the padding should be applied before the max-pooling.

Actually, if you go to faceRecoModel at the bottom, you can see that X is “preprocessed” before going into the inception blocks.

So you have X_input.shape = (m, 3, 96, 96). What you can calculate is that the shape of X entering the 1st inception block (inception_block_1a) is actually (m, 192, 12, 12).
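To make the 96 → 12 step concrete, here is a rough sketch that traces the spatial size through the layers before the first inception block. The layer list (kernel, stride, zero-padding) is an assumption recalled from the assignment's faceRecoModel and is not quoted anywhere in this thread, so check it against your own copy of the file; `out_size` is a helper made up for this illustration.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Size along one axis after zero-padding each side by `pad`,
    followed by a 'valid' convolution or pooling."""
    return (n + 2 * pad - kernel) // stride + 1

# ASSUMED stem of faceRecoModel: (name, kernel, stride, zero-padding before it).
# Recalled from the assignment, not shown in this thread.
stem = [
    ("conv1 7x7", 7, 2, 3),
    ("maxpool 3x3", 3, 2, 1),
    ("conv2 1x1", 1, 1, 0),
    ("conv3 3x3", 3, 1, 1),
    ("maxpool 3x3", 3, 2, 1),
]

n = 96
sizes = [n]
for name, kernel, stride, pad in stem:
    n = out_size(n, kernel, stride, pad)
    sizes.append(n)
    print(f"{name:12s} -> {n} x {n}")
```

Under that assumed layer list, the last printed size is the H-W of X entering inception_block_1a.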

For simplicity we’ll ignore the first two dimensions, which are m and the number of filters (the axis subject to concatenation), so we can simply say “X of shape (12, 12)” when entering the 1st inception block.

Now for the output: X_1x1 will clearly have the same shape, 12x12. As for X_pool, after max-pooling with stride 2 and filter size 3 it shrinks to 5x5 (= int((12 - 3)/2 + 1)); then you pad a total of 7 along each of the height and width axes to get back to 12x12.
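The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: `pool_output_size` is a helper made up for the illustration, and how the 7 is split between the two sides of each axis (for example 3 and 4) is an assumption, since the thread only gives the total.

```python
def pool_output_size(n, pool_size, stride):
    """Output length along one axis for pooling without padding ('valid')."""
    return (n - pool_size) // stride + 1

n_in = 12                                               # H (and W) entering the block
n_pool = pool_output_size(n_in, pool_size=3, stride=2)  # size after max-pooling
total_pad = n_in - n_pool                               # zeros needed per axis
pad_before = total_pad // 2                             # assumed split, e.g. 3
pad_after = total_pad - pad_before                      # and 4
n_out = n_pool + pad_before + pad_after

print(n_pool, total_pad, (pad_before, pad_after), n_out)
```

After the padding, X_pool has the same H-W as the other branches, so the concatenation along the filter axis is well defined.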

Thanks for your brilliant answers :grinning:

I guess the imbalance between the LHS and the RHS is due to the absence of alpha. If you add the value of alpha to the LHS of the equation, I guess you will find that the equation balances.

Thank you, but alpha is small: alpha is 0.2. Would you like to try it?

It really confused me :thinking:

Oops, I didn’t actually see the values in the 3rd image, and now I am a little confused too. Looks like I am next in line wanting to know the answer to this question.
Though, I have a small hunch. All 3 variables, pos_dist, neg_dist and basic_loss, represent the respective values for a batch of examples, i.e., all 3 of them are vectors. But the eval function is producing these values as scalars, so I guess the way the eval function converts these values from a vector to a scalar might give us the answer.

Thanks for your brilliant answer. However, tf.reduce_sum seems to return a scalar unless we specify keep_dims=True. :thinking:

Thanks for your help again! I got the answer; the mentor answered me. My code was indeed incorrect: we should specify axis for the variables p_dist and n_dist to make sure that each one is a vector instead of a scalar :sweat_smile:
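For anyone who lands here with the same issue, here is a minimal sketch of why the axis matters. It is written in plain Python rather than TensorFlow so the arithmetic is visible. The toy encodings and the helper `sq_dist` are made up for illustration; in the assignment the per-example sums would presumably come from tf.reduce_sum with an axis argument.

```python
ALPHA = 0.2  # margin value quoted earlier in the thread

def sq_dist(u, v):
    """Squared Euclidean distance between two encodings (sum over the last axis)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical batch of two triplets with 2-D encodings.
anchor   = [[0.0, 0.0], [0.0, 0.0]]
positive = [[1.0, 0.0], [0.0, 0.0]]
negative = [[0.0, 0.0], [2.0, 0.0]]

# Per-example distances: one value per triplet (what specifying axis gives you).
pos_dist = [sq_dist(a, p) for a, p in zip(anchor, positive)]  # [1.0, 0.0]
neg_dist = [sq_dist(a, n) for a, n in zip(anchor, negative)]  # [0.0, 4.0]
basic_loss = [p - n + ALPHA for p, n in zip(pos_dist, neg_dist)]
loss_per_example = sum(max(b, 0.0) for b in basic_loss)

# Summing over everything first collapses each distance to a single scalar,
# so max(., 0) is applied once to the whole batch instead of once per triplet.
loss_collapsed = max(sum(pos_dist) - sum(neg_dist) + ALPHA, 0.0)

print(loss_per_example, loss_collapsed)
```

In this toy batch the first triplet violates the margin and the second satisfies it comfortably. The per-example version still reports the violation, while the collapsed version lets the second triplet cancel it out, so the two losses differ.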

Well, that answers our question. I am glad you found your answer. Make sure to remove the pics if your query is resolved.