Can we choose any base model to build a Siamese Network for Face Recognition?

Hello community 👋
Can we use any pretrained base deep learning model like ResNet, GoogleNet, VGG etc. to build a Siamese network?

Since these models are pretrained on different datasets and with different loss functions, will it be a good choice to pick such a pretrained model, as they aren’t trained using the Triplet loss?
I am building a Siamese network in PyTorch using Inception_v3, which was trained on a dataset with 1000 classes, so it outputs a 1000-dimensional vector. I compute the distance between the encodings of the anchor image and the test image and check whether it is the same person. But it turns out that it isn't working well. How do we decide the minimum distance threshold?

Here is my code:

import numpy as np
import torch

# Load a pretrained Inception v3 from torchvision via torch.hub
model = torch.hub.load('pytorch/vision:v0.10.0', 'inception_v3', pretrained=True)
model.eval()

from PIL import Image
from torchvision import transforms

def image_to_encoding(image_path, model):
    input_image = Image.open(image_path)
    preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

    # move the input and model to GPU for speed if available
    if torch.cuda.is_available():
        input_batch = input_batch.to('cuda')
        model.to('cuda')

    with torch.no_grad():
        output = model(input_batch)

    # move back to CPU so the encoding can later be converted to a NumPy array
    return output.cpu()

def who_is_it(test_image_encoding, database):
    """
    Implements face recognition by finding who the person in the test image is.

    Arguments:
        test_image_encoding -- encoding of the test image
        database -- dict mapping each person's name to their image encoding

    Returns:
        min_dist -- the minimum distance between test_image_encoding and the encodings in the database
        identity -- string, the predicted name of the person
    """
    
    encoding =  test_image_encoding

    # Let's initialize "min_dist" to a large value, say 100
    min_dist = 100
    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        
        # Compute the L2 distance between the target encoding and db_enc
        dist = np.linalg.norm(encoding.squeeze().numpy() - db_enc.squeeze().numpy())

        # Keep track of the closest match seen so far
        if dist < min_dist:
            min_dist = dist
            identity = name
    
    if min_dist / 100 > 0.5:
        print("Not in the database.")
    else:
        print("it's " + str(identity) + ", the distance is " + str(min_dist / 100))
        
    return min_dist, identity

Given that we want to push similar pairs closer and dissimilar pairs farther apart in the embedding space, it's okay to start with a pre-trained model as the embedding generator and fine-tune it as required.

Here’s a Keras example for reference.

This is a parameter that needs to be set based on the accuracy requirements of the project. For instance, a security system within a military base would require a smaller threshold than a seminar attendance system.
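One practical way to set the threshold is to collect distances for labeled same-person and different-person pairs from a validation set, sweep candidate thresholds, and pick the one that minimizes total errors (false accepts plus false rejects, weighted to taste). A minimal sketch with made-up distances; replace them with distances computed by your own encoder:

```python
import numpy as np

# Hypothetical validation-set distances: same-person pairs should be close,
# different-person pairs far apart. Substitute your encoder's distances here.
same_person = np.array([0.31, 0.42, 0.28, 0.55, 0.37])
diff_person = np.array([0.91, 1.20, 0.78, 1.05, 0.88])

# Sweep candidate thresholds and count the errors each one makes:
# a same-person distance above the threshold is a false reject,
# a different-person distance at or below it is a false accept.
candidates = np.linspace(0.0, 1.5, 151)
errors = [np.sum(same_person > t) + np.sum(diff_person <= t) for t in candidates]
best = candidates[int(np.argmin(errors))]
print("chosen threshold:", best)
```

For a stricter system (like the military-base example), weight false accepts more heavily in the error sum, which pushes the chosen threshold lower.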

But these models are pretrained on different datasets and with different loss functions?

Hence the need to fine-tune the base model acting as an embedding generator.

Consider the problem of binary classification of images (e.g. cat vs dog). Transfer learning from a base image model is okay even if the base model was trained on different images and on a multi-class classification task. Please see the lecture on transfer learning for when it makes sense to use this technique.

Hello @amitsubhashchejara, as you said, the model was trained for a different objective, so to adapt it to yours, you will need to fine-tune it with, for example, the triplet loss in the way taught in the lecture. The difference here is that you don’t start from scratch, and you don’t have to re-train all parameters in the model: you can choose to fine-tune only some of the last (existing or new) layers. This technique, as Balaji explained, is called Transfer Learning and is covered in DLS Course 3.

You will need to remove that final 1000-unit fully-connected layer because its objective was classification, whereas your objective is to produce embeddings.
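In torchvision's `inception_v3` the classifier head lives in the `model.fc` attribute, so replacing it with `nn.Identity()` exposes the 2048-dimensional pooled features as embeddings. The sketch below demonstrates the pattern on a tiny stand-in module so it runs without downloading weights; the same one-liner (`model.fc = nn.Identity()`) applies to the real pretrained model:

```python
import torch
from torch import nn

class TinyInceptionLike(nn.Module):
    """Stand-in mimicking torchvision's layout: features -> 2048-d pool -> fc."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2048))
        self.fc = nn.Linear(2048, 1000)  # 1000-class ImageNet-style head

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyInceptionLike()
model.fc = nn.Identity()  # drop the classifier; outputs become 2048-d embeddings
model.eval()

with torch.no_grad():
    emb = model(torch.randn(1, 3, 8, 8))
print(emb.shape)  # now 2048-d features instead of 1000 class logits
```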

Cheers,
Raymond

Thank you @balaji.ambresh and @rmwkwok!
