We did not go into great depth on the pre-processing steps involved in computer vision in this course; we either divided pixel values by 255 or applied pre-processing within the model function. In the first three courses and in Andrew’s original Machine Learning course, we frequently used normalisation on non-image data, where we subtracted the mean and then divided by the standard deviation.
Is this step common in image pre-processing? Reading the Naive Face Verification section, it seems that subtracting the mean and dividing by the standard deviation would still leave us with the same issues of differences in “lighting, orientation of the person’s face, minor changes in head position, and so on”.
Is normalisation common for image processing? Does anyone use it in real life?
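For reference, here is roughly what I mean by the two approaches (a minimal NumPy sketch; the image shapes and values are just made-up placeholders):

```python
import numpy as np

# Toy batch of images: 100 RGB images of size 64x64, pixel values in [0, 255]
images = np.random.randint(0, 256, size=(100, 64, 64, 3)).astype(np.float32)

# Approach 1: simple rescaling, as used in the course assignments
rescaled = images / 255.0

# Approach 2: standardisation (subtract the mean, divide by the standard deviation),
# as used for non-image features in the earlier courses
mean = images.mean(axis=0)
std = images.std(axis=0) + 1e-8   # small epsilon to avoid division by zero
standardised = (images - mean) / std
```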
Summary
2 - Naive Face Verification
In Face Verification, you’re given two images and you have to determine if they are of the same person. The simplest way to do this is to compare the two images pixel-by-pixel. If the distance between the raw images is below a chosen threshold, it may be the same person!
Figure 1
Of course, this algorithm performs poorly, since the pixel values change dramatically due to variations in lighting, orientation of the person’s face, minor changes in head position, and so on.
You’ll see that rather than using the raw image, you can learn an encoding, f(img).
By using an encoding for each image, an element-wise comparison produces a more accurate judgement as to whether two pictures are of the same person.
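As I read it, the comparison the notebook describes amounts to something like the following (a rough sketch, not the assignment’s actual code; the `model` encoder and the threshold values are hypothetical):

```python
import numpy as np

def naive_verify(img1, img2, threshold=100.0):
    """Naive face verification: L2 distance between the raw pixel values."""
    distance = np.linalg.norm(img1.astype(np.float32) - img2.astype(np.float32))
    return distance < threshold

def encoding_verify(img1, img2, model, threshold=0.7):
    """Verification on learned encodings f(img) instead of raw pixels."""
    enc1 = model(img1)   # hypothetical encoder returning a 1-D embedding
    enc2 = model(img2)
    distance = np.linalg.norm(enc1 - enc2)
    return distance < threshold
```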
Typically, no, because the features are all pixel values, so they’re already fairly well normalized.
If you go back to the original Machine Learning course, where pixel values were used in the digit recognition assignments (ex3 and ex4), you’ll see that normalization was not used.
I suppose that if we transform the images with PCA, the resulting components can be viewed as individual features, which are more meaningful to normalise than the individual pixels of a raw image. Thank you for the response!
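Something along these lines is what I have in mind (a minimal sketch using scikit-learn’s PCA; the data and the number of components are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 grayscale images of size 64x64, flattened to vectors
images = np.random.rand(100, 64 * 64)

# Project onto the top principal components
pca = PCA(n_components=50)
components = pca.fit_transform(images)   # shape (100, 50)

# Each column is now an individual feature, so standardising it is meaningful
mean = components.mean(axis=0)
std = components.std(axis=0) + 1e-8
normalised_components = (components - mean) / std
```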
The embeddings are the output of the model and they are normalized so that they each have a 2-norm of one, meaning they are unit vectors (the 2-norm is just the Euclidean length of a vector). I’m not sure why they do that, but perhaps it makes the comparisons between different embeddings more predictable.
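For illustration, a minimal sketch of that normalisation step and why it might make distances more predictable (the embedding values are made up):

```python
import numpy as np

# Hypothetical raw embeddings produced by the model (made-up values)
emb1 = np.array([0.3, -1.2, 0.5, 2.0])
emb2 = np.array([0.4, -1.0, 0.6, 1.8])

# Normalise each embedding to unit 2-norm (Euclidean length 1)
emb1_unit = emb1 / np.linalg.norm(emb1)
emb2_unit = emb2 / np.linalg.norm(emb2)

# For unit vectors, the squared Euclidean distance equals 2 - 2 * cosine similarity,
# so distances between any two embeddings always fall in the fixed range [0, 2]
distance = np.linalg.norm(emb1_unit - emb2_unit)
print(distance)
```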