Question about Contrastive Representation Learning in the 1st Lab

Regarding contrastive representation learning in the first lab: as the description says, the idea is to train the model to pull vectors closer to the anchor for similar examples and to push vectors further from the anchor for different concepts.


But in the notebook there is only an anchor (the original image) and a contrastive image, and the model is trained to minimize the cosine similarity between the anchor and the contrastive image. Is that correct?

Hi! The Python notebook in JupyterLab imports the mnist_dataset.py file to prepare the dataset, with anchor image data, contrastive image data, a label, and a distance.
A contrastive image with the same label as the anchor (a positive pair) receives a distance of 1.0, and one with a different label (a negative pair) receives -1.0.
The training dataset is filled in something like this:
label,anchor-image-data,contrastive-image-data,distance
0,…,…,1.0
0,…,…,-1.0
1,…,…,1.0
1,…,…,-1.0
2,…,…,1.0
etc.
In other words, when the contrastive image contains an image from the same class (same label) as the anchor, the pair has a positive distance, and it has a negative distance for an image from any other class.
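As an illustration only, here is a minimal sketch of how such pairs could be built; the function name build_contrastive_pairs and the sampling logic are my assumptions, since the real logic lives in mnist_dataset.py:

```python
import random

def build_contrastive_pairs(images, labels, seed=0):
    """Hypothetical sketch: pair each anchor with one positive example
    (same label, distance 1.0) and one negative (other label, -1.0)."""
    rng = random.Random(seed)
    by_label = {}
    for image, label in zip(images, labels):
        by_label.setdefault(label, []).append(image)

    rows = []  # (label, anchor-image, contrastive-image, distance)
    for anchor, label in zip(images, labels):
        positive = rng.choice(by_label[label])
        other = rng.choice([l for l in by_label if l != label])
        rows.append((label, anchor, positive, 1.0))
        rows.append((label, anchor, rng.choice(by_label[other]), -1.0))
    return rows
```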
During the training loss calculation, the cosine similarity function receives the embeddings the model infers for the anchor and contrastive images and produces a similarity score. That score is compared with the target distance, and the MSE between the similarity score and the expected distance value is the training loss for that sample.
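A minimal sketch of that loss, assuming a PyTorch model that maps an image batch to embedding vectors (the names model, anchor, contrastive, and target are placeholders, not the notebook's actual code):

```python
import torch
import torch.nn.functional as F

def contrastive_mse_loss(model, anchor, contrastive, target):
    """Embed both images, score them with cosine similarity, and
    compare the score with the target distance (+1.0 or -1.0) via MSE."""
    anchor_emb = model(anchor)            # shape: (batch, embed_dim)
    contrastive_emb = model(contrastive)  # shape: (batch, embed_dim)
    similarity = F.cosine_similarity(anchor_emb, contrastive_emb, dim=1)
    return F.mse_loss(similarity, target)  # scalar training loss
```

Driving this MSE toward zero pushes the cosine similarity toward +1 for positive pairs and toward -1 for negative pairs.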
The objective of training is to drive the loss toward zero. If training was successful, the loss curve (y = loss, x = epoch) decreases until it stabilizes, running nearly flat just above the x-axis. See below:

[figure: training loss per epoch, decreasing and leveling off near zero]
Thus, at the end of training, images from the same class tend to be grouped together, distancing themselves from images of other classes, when dimensionality reduction is applied to the model's inference results (using the PCA technique, for example).

During training, the model is adjusted to score images from the same class/label as similar to the anchor image (contrastive positive) and images from different classes as distinct from it (contrastive negative).
The cosine similarity function is used to generate a score for the model's inference results on the two images in each training pair.
The “distance” field of the training dataset has a value of 1.0 for images from the same class as the anchor image and a value of -1.0 for images from other classes.
The images are encoded as vectors, so images of the same class end up with similar characteristics across the dimensions of the vector.
As noted above, this is why, at the end of training, images from the same class tend to group together and distance themselves from other classes when dimensionality reduction (PCA, for example) is applied to the model's inference results. See below:

[figure: 2-D PCA projection of the embeddings, with same-class images clustered together]
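A sketch of how that check could be reproduced, assuming scikit-learn and matplotlib are available; embeddings and labels stand in for the trained model's outputs over a test set and its digit classes:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_embeddings_2d(embeddings, labels):
    """Project the embedding vectors to 2-D with PCA and color each
    point by its class; well-trained embeddings form per-class clusters."""
    coords = PCA(n_components=2).fit_transform(embeddings)
    scatter = plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=5)
    plt.legend(*scatter.legend_elements(), title="digit")
    plt.title("PCA of learned embeddings")
    plt.show()
```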