Can we choose any base model to build a Siamese Network for Face Recognition?

Hello community 👋
Can we use any pretrained base deep learning model like ResNet, GoogleNet, VGG etc. to build a Siamese network?

Since these models are pretrained on different datasets and with different loss functions, will it be a good choice to pick such a pretrained model, as they aren’t trained using the Triplet loss?
I am building a Siamese network in PyTorch using Inception_v3, which was trained on a dataset with 1000 classes, so it outputs a 1000-dimensional vector. I compute the distance between the encodings of the anchor image and the test image and check whether it is the same person. But it turns out that it isn't working well. How do we decide the minimum distance threshold?

Here is my code:

import numpy as np
import torch

# Load a pretrained Inception v3 from torchvision via torch.hub
model = torch.hub.load('pytorch/vision:v0.10.0', 'inception_v3', pretrained=True)
model.eval()

from PIL import Image
from torchvision import transforms

def image_to_encoding(image_path, model):
    input_image = Image.open(image_path)
    preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

    # move the input and model to GPU for speed if available
    if torch.cuda.is_available():
        input_batch = input_batch.to('cuda')
        model.to('cuda')

    with torch.no_grad():
        output = model(input_batch)

    # move back to CPU so the encoding can later be converted to a NumPy array
    return output.cpu()

def who_is_it(test_image_encoding, database):
    """
    Implements face recognition by finding who the person in the test image is.

    Arguments:
        test_image_encoding -- encoding of the test image
        database -- dict mapping each person's name to their image encoding

    Returns:
        min_dist -- the minimum distance between test_image_encoding and the encodings in the database
        identity -- string, the predicted name of the person
    """
    
    encoding =  test_image_encoding

    # Let's initialize "min_dist" to a large value, say 100
    min_dist = 100
    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        
        # Compute the L2 distance between the target encoding and db_enc
        dist = np.linalg.norm(encoding.squeeze().numpy() - db_enc.squeeze().numpy())

        # Keep track of the closest match seen so far
        if dist < min_dist:
            min_dist = dist
            identity = name
    
    if min_dist / 100 > 0.5:
        print("Not in the database.")
    else:
        print("it's " + str(identity) + ", the distance is " + str(min_dist / 100))
        
    return min_dist, identity

Given that we want to push similar pairs closer and dissimilar pairs farther apart in the embedding space, it's okay to start with a pre-trained model as the embedding generator and fine-tune it as required.

Here’s a Keras example for reference.

This is a parameter that needs to be set based on the accuracy requirements of the project. For instance, a security system within a military base would require a smaller threshold than a seminar attendance system.
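One practical way to set the threshold is to collect distances for labeled same-person and different-person pairs from a validation set, sweep candidate thresholds, and pick the one that minimizes total errors (false accepts plus false rejects, weighted to taste). A minimal sketch with made-up distances; replace them with distances computed by your own encoder:

```python
import numpy as np

# Hypothetical validation-set distances: same-person pairs should be close,
# different-person pairs far apart. Substitute your encoder's distances here.
same_person = np.array([0.31, 0.42, 0.28, 0.55, 0.37])
diff_person = np.array([0.91, 1.20, 0.78, 1.05, 0.88])

# Sweep candidate thresholds and count the errors each one makes:
# a same-person distance above the threshold is a false reject,
# a different-person distance at or below it is a false accept.
candidates = np.linspace(0.0, 1.5, 151)
errors = [np.sum(same_person > t) + np.sum(diff_person <= t) for t in candidates]
best = candidates[int(np.argmin(errors))]
print("chosen threshold:", best)
```

For a stricter system (like the military-base example), weight false accepts more heavily in the error sum, which pushes the chosen threshold lower.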

But these models are pretrained on different datasets and with different loss functions?

Hence the need to fine-tune the base model acting as an embedding generator.

Consider the problem of binary classification of images (e.g. cat vs dog). Transfer learning from a base image model is okay even if the base model was trained on different images and on a multi-class classification task. Please see the lecture on transfer learning for when it makes sense to use this technique.

Hello @amitsubhashchejara, as you said, the model was trained for a different objective, so to adapt it to yours, you will need to fine-tune it with, for example, the triplet loss in the way taught in the lecture. The difference here is that you don’t start from scratch, and you don’t have to re-train all parameters in the model: you can choose to fine-tune only some of the last (existing or new) layers. This technique, as Balaji explained, is called Transfer Learning and is covered in DLS Course 3.

You will need to remove that final 1000-unit fully-connected layer because its objective was classification, whereas your objective is to produce embeddings.
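In torchvision's `inception_v3` the classifier head lives in the `model.fc` attribute, so replacing it with `nn.Identity()` exposes the 2048-dimensional pooled features as embeddings. The sketch below demonstrates the pattern on a tiny stand-in module so it runs without downloading weights; the same one-liner (`model.fc = nn.Identity()`) applies to the real pretrained model:

```python
import torch
from torch import nn

class TinyInceptionLike(nn.Module):
    """Stand-in mimicking torchvision's layout: features -> 2048-d pool -> fc."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2048))
        self.fc = nn.Linear(2048, 1000)  # 1000-class ImageNet-style head

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyInceptionLike()
model.fc = nn.Identity()  # drop the classifier; outputs become 2048-d embeddings
model.eval()

with torch.no_grad():
    emb = model(torch.randn(1, 3, 8, 8))
print(emb.shape)  # now 2048-d features instead of 1000 class logits
```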

Cheers,
Raymond

Thank you @balaji.ambresh and @rmwkwok!
