Question about Contrastive Representation Learning in the 1st Lab

Regarding contrastive representation learning in the first lab: as the description says, the idea is to train the model to pull vectors closer to the anchor for similar examples and to push vectors further from the anchor for different concepts.


But in the notebook there is only an anchor (the original image) and a contrastive image, and the model is trained to minimize the cosine similarity between the anchor and the contrastive image. Is that correct?

Hi! The Python notebook in JupyterLab imports the mnist_dataset.py file to prepare the dataset, with anchor image data, contrastive image data, a label, and a distance.
A contrastive image with the same label as the anchor (a positive pair) receives a distance of 1.0, and one with a different label (a negative pair) receives -1.0.
The training dataset is filled in something like this:
label,anchor-image-data,contrastive-image-data,distance
0,…,…,1.0
0,…,…,-1.0
1,…,…,1.0
1,…,…,-1.0
2,…,…,1.0
etc.
In other words, when the contrastive image contains an image from the same class (same label) as the anchor, the pair has a positive distance, and it has a negative distance for an image from any other class.
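As an illustration only, here is a minimal sketch of how such pairs could be built; the function name build_contrastive_pairs and the sampling logic are my assumptions, since the real logic lives in mnist_dataset.py:

```python
import random

def build_contrastive_pairs(images, labels, seed=0):
    """Hypothetical sketch: pair each anchor with one positive example
    (same label, distance 1.0) and one negative (other label, -1.0)."""
    rng = random.Random(seed)
    by_label = {}
    for image, label in zip(images, labels):
        by_label.setdefault(label, []).append(image)

    rows = []  # (label, anchor-image, contrastive-image, distance)
    for anchor, label in zip(images, labels):
        positive = rng.choice(by_label[label])
        other = rng.choice([l for l in by_label if l != label])
        rows.append((label, anchor, positive, 1.0))
        rows.append((label, anchor, rng.choice(by_label[other]), -1.0))
    return rows
```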
During the training loss calculation, the cosine similarity function receives the embeddings the model infers for the anchor and contrastive images and produces a similarity score. That score is compared with the target distance, and the MSE between the similarity score and the expected distance value is the training loss for that sample.
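A minimal sketch of that loss, assuming a PyTorch model that maps an image batch to embedding vectors (the names model, anchor, contrastive, and target are placeholders, not the notebook's actual code):

```python
import torch
import torch.nn.functional as F

def contrastive_mse_loss(model, anchor, contrastive, target):
    """Embed both images, score them with cosine similarity, and
    compare the score with the target distance (+1.0 or -1.0) via MSE."""
    anchor_emb = model(anchor)            # shape: (batch, embed_dim)
    contrastive_emb = model(contrastive)  # shape: (batch, embed_dim)
    similarity = F.cosine_similarity(anchor_emb, contrastive_emb, dim=1)
    return F.mse_loss(similarity, target)  # scalar training loss
```

Driving this MSE toward zero pushes the cosine similarity toward +1 for positive pairs and toward -1 for negative pairs.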
The objective of training is to drive the loss toward zero. If training was successful, the loss curve (y = loss, x = epoch) decreases until it stabilizes, running nearly flat just above the x-axis. See below:

[figure: training loss per epoch, decreasing and leveling off near zero]
Thus, at the end of training, images from the same class tend to be grouped together, distancing themselves from images of other classes, when dimensionality reduction is applied to the model's inference results (using the PCA technique, for example).

During training, the model is adjusted to score images from the same class/label as similar to the anchor image (contrastive positive) and images from different classes as distinct from it (contrastive negative).
The cosine similarity function is used to generate a score for the model's inference results on the two images in each training pair.
The “distance” field of the training dataset has a value of 1.0 for images from the same class as the anchor image and a value of -1.0 for images from other classes.
The images are encoded as vectors, so images of the same class end up with similar characteristics across the dimensions of the vector.
As noted above, this is why, at the end of training, images from the same class tend to group together and distance themselves from other classes when dimensionality reduction (PCA, for example) is applied to the model's inference results. See below:

[figure: 2-D PCA projection of the embeddings, with same-class images clustered together]
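A sketch of how that check could be reproduced, assuming scikit-learn and matplotlib are available; embeddings and labels stand in for the trained model's outputs over a test set and its digit classes:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_embeddings_2d(embeddings, labels):
    """Project the embedding vectors to 2-D with PCA and color each
    point by its class; well-trained embeddings form per-class clusters."""
    coords = PCA(n_components=2).fit_transform(embeddings)
    scatter = plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=5)
    plt.legend(*scatter.legend_elements(), title="digit")
    plt.title("PCA of learned embeddings")
    plt.show()
```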