C4 week4 : Face_Recognition

I get this error

I can’t figure out why, and this is my code
image

The error message is telling you that you can’t just use the “-” operator to do the subtraction in this case. Try using tf.subtract. That worked for me …

Note that if anchor, positive and negative are actual tf tensors, then the “-” will work, but the test cases here are structured to use lists in some cases. Apparently tf.subtract is smart enough to cope with this.
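
To illustrate the point about lists versus tensors, here is a minimal sketch. The values in `anchor` and `positive` are made up for illustration and are not the notebook's test data; the only claim is that "-" is undefined for plain Python lists while tf.subtract converts its arguments to tensors first.

```python
import tensorflow as tf

# Hypothetical stand-ins for encodings that a test case passes as plain lists.
anchor = [[0.1, 0.2, 0.3]]
positive = [[0.1, 0.1, 0.1]]

# Plain Python lists do not define "-", so this raises TypeError.
try:
    anchor - positive
    list_minus_ok = True
except TypeError:
    list_minus_ok = False

# tf.subtract converts its arguments to tensors, so lists are accepted.
diff = tf.subtract(anchor, positive)

# With real tensors, "-" and tf.subtract give the same result.
diff_op = tf.constant(anchor) - tf.constant(positive)
```

Using tf.subtract everywhere is the safer habit when you do not control what type the caller passes in.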


Excuse me! Why is the stride in X_pool 2? Does the output size of X_pool match that of X_3x3 and X_5x5?

It is a good question, but I had not looked at the internals of the FaceNet model before. They just give it to us. Have you looked at the paper to see if they explain this? Note that by default on pooling layers, the stride is the same as the pool size, but that is only the default: there’s no rule that says you have to use the same stride.

If the stride is greater than the pool size, then you’re literally ignoring some inputs. If the stride is less, then the max pooling regions overlap, which would attenuate the effect and shrink the output less.

If you think about it for a second, you’ll realize that the formula for computing the size of the output of a pooling layer is the same as for a conv layer (gosh, I sure wish we had LaTeX here hint hint course staff :nerd_face:), it’s just that we normally see the case in which s == f and p == 0:

n_out = floor((n_in + 2p - f)/s) + 1
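
A quick sketch of that formula in Python, in case it helps to play with the numbers. The function name `pool_output_size` and the input size 96 are just illustrative choices, not anything taken from the FaceNet code.

```python
def pool_output_size(n_in, f, s=None, p=0):
    """Output size of a conv or pooling layer: floor((n_in + 2p - f) / s) + 1.

    If s is not given, it defaults to f, mirroring the pooling-layer default.
    """
    if s is None:
        s = f
    return (n_in + 2 * p - f) // s + 1


default_case = pool_output_size(96, f=3)       # s == f == 3, p == 0
overlapping = pool_output_size(96, f=3, s=2)   # stride smaller than pool size
skipping = pool_output_size(96, f=3, s=4)      # stride larger than pool size

print(default_case, overlapping, skipping)     # 32 47 24
```

With the same pool size, the smaller stride shrinks the output less and the larger stride shrinks it more.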

I’m a little confused about what exactly it means to calculate the distance between the image_path and identity. Is it just the L2 norm of encoding?
Here is my code:
# Step 2: Compute distance with identity’s image (≈ 1 line)
dist = np.linalg.norm(encoding,ord=2)

They give you the function img_to_encoding, right? Take a look at the output of that function: the result is a (normalized) 1 x 128 vector of the encoding of that image (the output of the model for that image). Now the question is: you’ve got two of those encodings and you want some metric for how far apart they are. So you take the vector difference and then you compute the square of the L2 norm of that difference vector, which is a representation of the length of that vector.

This is all explained in the notebook. I suggest you go get some fresh air to clear your mind and then come back and just read the instructions in the notebook again. They do a good job of explaining everything.

I thank the mentor for the valuable comments on my question. The FaceNet model uses an inception network. An inception block has to make sure that the outputs of its different filters have the same size so that they can be concatenated. I show an example here: as you can see, I input a random tensor (1*3*96*96) to the function inception_block and it gives a shape-mismatch error!

If we check the internals of inception_block, we can see that the output size of the 3*3 filters is the same as that of the 5*5 filters, because the former use padding 1 and the latter use padding 2, so the dimensions match. However, the output size of x_pool does not match them, as my example illustrates. In the course, Andrew also points out that the output of the max pooling branch in an inception block should match the size of the other branches.
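
The concatenation constraint itself can be sketched with NumPy, independent of the FaceNet code. The shapes below are hypothetical channels-first branch outputs chosen only to show the effect; they are not the actual sizes inside inception_block.

```python
import numpy as np

# Hypothetical branch outputs, laid out as (batch, channels, height, width).
x_3x3 = np.zeros((1, 8, 96, 96))
x_5x5 = np.zeros((1, 4, 96, 96))
x_pool = np.zeros((1, 4, 47, 47))   # smaller height/width, e.g. after a stride-2 pool

# Branches with equal height and width concatenate along the channel axis.
merged = np.concatenate([x_3x3, x_5x5], axis=1)

# Adding a branch with a different height/width fails.
try:
    np.concatenate([x_3x3, x_5x5, x_pool], axis=1)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(merged.shape)
print(mismatch_error)
```

Only the channel axis may differ between the branches; every other dimension has to agree before they are concatenated.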

I would appreciate it if you could also check the internals of FaceNet. I once pointed out an error in Course 4, Week 1, and the instructor of this course praised me. I thank the team for providing such excellent homework for us. However, there may be some errors in the details. Therefore, I ask you again: could you please help me check the details? I am eager to know whether I am right about this problem.

Sincerely

For anyone else who sees this, this question was answered by @reinoudbosch on this other thread about the same topic.

I’m stuck here. The instruction says not to “compute” the square. Is that a new change? The difference between the np.linalg.norm(encoding) and np.linalg.norm(database[identity]) seems very small. What am I missing?

Sorry, I’m not sure what you are asking. Can you please be a bit more specific about which section you are talking about and what your question is? In the triplet_loss section, they are using the square of the L2 norms, so squaring is obviously involved. In verify and who_is_it, they are just using the norms “as is”.

I think I don’t know how to apply linalg.norm() to “vector” differences. I’m not familiar with Python, and I’m having trouble calculating the difference. Given a pair of 1x128 vectors, how is linalg.norm used? I tried to read the language manual, but I’m not getting it.

Never mind. Stupid me… it should be np.linalg.norm(a-b). I tried everything else: norm(a, b), norm(a) - norm(b), norm(a, b, axis=1), etc.

Right! In Python and NumPy, if a and b are np arrays of the same shape, then a - b is also a NumPy array of that shape. np.linalg.norm takes any array as an argument.
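
Here is a small sketch of the difference, using random unit vectors as hypothetical stand-ins for two 1 x 128 encodings (they are not real outputs of img_to_encoding). It also shows why norm(a) - norm(b) comes out very small: both encodings are normalized, so each norm is 1.

```python
import numpy as np

rng = np.random.default_rng(0)


def unit(v):
    """Scale v to unit L2 norm, like the normalized encodings."""
    return v / np.linalg.norm(v)


# Hypothetical stand-ins for two 1 x 128 encodings.
a = unit(rng.normal(size=(1, 128)))
b = unit(rng.normal(size=(1, 128)))

# Distance between the encodings: the norm of the difference vector.
dist = np.linalg.norm(a - b)

# Not a distance: both norms are 1, so this is essentially 0.
not_the_distance = np.linalg.norm(a) - np.linalg.norm(b)

print(dist, not_the_distance)
```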