# Face recognition - Understanding img_to_encoding code

In section 5.1. - Face Verification, the following function is defined:

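``````
#tf.keras.backend.set_image_data_format('channels_last')
def img_to_encoding(image_path, model):
    # NOTE: the code as posted never loads the image from image_path.
    # The next line is an assumption (the 160x160 target size is a guess).
    # np and tf are assumed to be numpy and tensorflow, imported elsewhere.
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(160, 160))
    img = np.around(np.array(img) / 255.0, decimals=12)
    x_train = np.expand_dims(img, axis=0)
    embedding = model.predict_on_batch(x_train)
    return embedding / np.linalg.norm(embedding, ord=2)
``````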

I would appreciate it if someone could explain the code above to me, specifically:

1. What is the commented code doing in the 1st line?
2. Is there any special reason for `np.around` to be used with 12 decimals?
3. Although I know what `expand_dims` does, why is it used to “re-calculate” img?
4. Why is the model output (the embedding) divided by its L2 norm to produce the result of the function?

Many thanks for the help.

On the first question, the point is that there are two standard orientations for image tensors:

“channels last”, which means m x h x w x c

“channels first”, which means m x c x h x w

In these courses we always use “channels last” mode, which is also the default in Keras, which is probably why that line is commented out.
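As a rough illustration of the two layouts, here is a minimal NumPy sketch. The batch dimensions are made up, and it assumes the Keras convention that “channels first” means (batch, channels, height, width). In TensorFlow itself, `tf.keras.backend.image_data_format()` reports which layout is currently active.

```python
import numpy as np

# Hypothetical batch: m=2 images, h=4 pixels high, w=5 wide, c=3 color channels.
batch_channels_last = np.zeros((2, 4, 5, 3))  # m x h x w x c

# Move the channel axis from the end to position 1 to get channels first.
batch_channels_first = np.transpose(batch_channels_last, (0, 3, 1, 2))  # m x c x h x w

print(batch_channels_last.shape)   # (2, 4, 5, 3)
print(batch_channels_first.shape)  # (2, 3, 4, 5)
```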

On the rounding, I don’t know the real answer. It’s certainly plausible that you’re wasting your time with more than 12 decimal places of precision: color values only have 256 possible levels (0 to 255) per channel, right? But I would have thought the real goal is to do 32-bit floating-point arithmetic instead of 64-bit to save memory and compute cost. Note that `np.around` does not change the dtype, so the result is still `float64`. The clearer way to get 32-bit values would be to use `tf.cast` to `float32`.
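To make the dtype point concrete, here is a small sketch with made-up pixel values. It uses NumPy's `astype` rather than `tf.cast`, purely to keep the example free of a TensorFlow dependency; the idea is the same.

```python
import numpy as np

# Hypothetical 8-bit pixel values, standing in for a loaded image.
pixels = np.array([0, 1, 127, 255], dtype=np.uint8)

scaled = pixels / 255.0                    # uint8 divided by a float gives float64
rounded = np.around(scaled, decimals=12)   # rounding keeps the dtype: still float64
as_float32 = rounded.astype(np.float32)    # an explicit cast is what halves the memory

print(scaled.dtype, rounded.dtype, as_float32.dtype)
print(np.max(np.abs(rounded - scaled)))    # rounding changes each value by a tiny amount
print(rounded.nbytes, as_float32.nbytes)   # bytes used by each version
```

So rounding to 12 decimals barely alters the values and leaves the storage size unchanged, while the cast is the step that actually reduces it.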

The point is that we are feeding a single image to the model, but the model is trained on 4D input tensors, right? It is defined to take batches of images as input. So we have to convert our single image (a 3D tensor) to look like a “batch of one”, which is the purpose of that `expand_dims` call.
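A quick sketch of that shape change, using a blank array in place of a real image (the 160 x 160 size is an assumption for illustration):

```python
import numpy as np

# Hypothetical single RGB image: height x width x channels.
img = np.zeros((160, 160, 3))

# Add a leading batch axis so the array looks like a batch containing one image.
x_train = np.expand_dims(img, axis=0)

print(img.shape)      # (160, 160, 3)
print(x_train.shape)  # (1, 160, 160, 3)
```

Note that the result is stored under a new name, `x_train`, so `img` itself is left unchanged.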

The last question is interesting. I would have thought that the definition of an embedding is that it’s a unit vector. So I’d expect the model to emit embeddings with length one. But if you’re not sure, then it does no harm to divide by the norm, right? Worst case you just wasted a bit of compute to create the same answer. You can try adding an extra line before the return to print the L2 norm of the resultant embedding and maybe you’ll find that we’re wrong and the model does not normalize them for us.
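One way to run that check is sketched below. `StubModel` and `encode_with_norm_check` are hypothetical names, and the stub just returns random numbers with an assumed embedding size of 128, so the printed norm says nothing about the real network. With the actual model you would add the `print` line to `img_to_encoding` itself.

```python
import numpy as np

class StubModel:
    """Hypothetical stand-in for the real network; returns a fake embedding."""
    def predict_on_batch(self, x):
        rng = np.random.default_rng(0)
        return rng.normal(size=(x.shape[0], 128))  # 128 is an assumed embedding size

def encode_with_norm_check(img, model):
    # Mirrors the last three steps of img_to_encoding, starting from a loaded image array.
    x_train = np.expand_dims(img, axis=0)
    embedding = model.predict_on_batch(x_train)
    raw_norm = np.linalg.norm(embedding, ord=2)
    print("norm =", raw_norm)  # the extra diagnostic line
    return embedding / raw_norm, raw_norm

unit_embedding, raw_norm = encode_with_norm_check(np.zeros((160, 160, 3)), StubModel())
```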

I added a print statement in the `img_to_encoding` function to print the norm of each embedding returned by the model, and it turns out they are not unit length:

``````
norm = 9.145319938659668
norm = 9.275681495666504
norm = 6.319966793060303
norm = 7.333863735198975
norm = 6.1986918449401855
norm = 3.8730342388153076
norm = 3.8907790184020996
norm = 4.685695648193359
norm = 11.39218807220459
norm = 7.841488838195801
norm = 3.648789167404175
norm = 5.053682327270508
``````

So that normalization step of dividing by the norm is required to produce unit length embeddings.
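The effect of that division can be sketched with a small made-up vector. One detail worth knowing: when `np.linalg.norm` is given a 2D array with `ord=2`, it computes the matrix 2-norm (the largest singular value) rather than a vector norm. For a batch of one, which has shape (1, n), that coincides with the ordinary Euclidean length of the single row, so the function still returns a unit-length embedding.

```python
import numpy as np

# Hypothetical raw embedding as a batch of one, shape (1, 4).
embedding = np.array([[3.0, 4.0, 0.0, 12.0]])

matrix_norm = np.linalg.norm(embedding, ord=2)  # matrix 2-norm of the (1, 4) array
vector_norm = np.linalg.norm(embedding[0])      # Euclidean length of the row: sqrt(169)

unit = embedding / matrix_norm

print(matrix_norm, vector_norm)   # both 13.0, up to floating-point rounding
print(np.linalg.norm(unit[0]))    # 1.0, up to floating-point rounding
```

With more than one image in the batch, the two norms would generally differ, and each row would need to be normalized separately.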


Thanks Paul for your further input! Much appreciated.