I’m building an application where I have two pictures of a board taken from two different cameras. My goal is to determine whether the board in both pictures is the same. In my dataset, there are 750 image pairs, and my dataloader is set up to create 750 pairs labeled as 1 (same board) and 750 pairs labeled as 0 (different board). The images are RGB, but I convert them to grayscale.
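My pairing logic is roughly equivalent to this sketch (simplified, not my exact code; the path lists are placeholders):

```python
import random
from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class BoardPairDataset(Dataset):
    """Yields (img1, img2, label): label 1 for the same board, 0 otherwise."""

    def __init__(self, cam1_paths, cam2_paths):
        # cam1_paths[i] and cam2_paths[i] show the same board from each camera
        self.cam1_paths = cam1_paths
        self.cam2_paths = cam2_paths
        self.to_tensor = transforms.Compose([
            transforms.Grayscale(),  # RGB -> single channel
            transforms.ToTensor(),
        ])

    def __len__(self):
        return 2 * len(self.cam1_paths)  # 750 positive + 750 negative pairs

    def __getitem__(self, idx):
        n = len(self.cam1_paths)
        i = idx % n
        if idx < n:
            j, label = i, 1.0  # same board in both views
        else:
            j = random.choice([k for k in range(n) if k != i])  # different board
            label = 0.0
        img1 = self.to_tensor(Image.open(self.cam1_paths[i]).convert("RGB"))
        img2 = self.to_tensor(Image.open(self.cam2_paths[j]).convert("RGB"))
        return img1, img2, torch.tensor(label)
```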
However, regardless of the network architectures, batch sizes, or learning rates I try, the network consistently predicts a value close to 0.5 for any given pair, which results in a loss of around 0.25. (A constant 0.5 output gives exactly 0.25 under MSE for both labels, so the network seems to have collapsed to predicting the mean.) Below is the architecture of the neural network I’m using.
Could it be that this task is too complicated for a neural network to handle? I’m open to any suggestions, whether it be code, articles, or books.
Image example of the same board taken with both cameras.
Setting the NNs aside, how would one distinguish these two boards just by looking at the images? Should one look at the thickness and the wood grain patterns?
Do you apply any augmentations during training?
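If not, something along these lines could be worth a try (a generic torchvision sketch; the transforms and values are only illustrative):

```python
import torchvision.transforms as T

# Mild geometric/photometric augmentations; tune the values to your images.
train_transform = T.Compose([
    T.Grayscale(),
    T.RandomRotation(degrees=5),                  # slight camera misalignment
    T.RandomResizedCrop(224, scale=(0.9, 1.0)),   # small scale/position jitter
    T.ColorJitter(brightness=0.2, contrast=0.2),  # lighting differences
    T.ToTensor(),
])
```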
Have you tried classical image processing methods to differentiate images?
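For instance, an ORB keypoint-matching baseline needs no training and would show how separable the boards are (a rough sketch; the distance threshold is made up and needs tuning):

```python
import cv2

def orb_similarity(path1, path2, max_dist=40):
    """Fraction of keypoints with a good match; same boards should score higher."""
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return 0.0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    good = [m for m in matcher.match(des1, des2) if m.distance < max_dist]
    return len(good) / max(len(kp1), len(kp2), 1)
```

Thresholding this score on a held-out set would give you a classical baseline to beat.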
If possible, share train vs val learning curves.
For the given problem, your fully connected network seems very wide. Maybe try reducing the number of neurons in the linear layers; see the sketch below.
I used CNNs a while ago for categorizing retail shop images (shoes, umbrellas, etc.), and it needed no more than a few layers, maybe 2-3. This task should also work with a simpler network.
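Something in this ballpark, for instance: a small shared conv trunk plus a narrow head (all sizes are guesses, since I don't know your input resolution):

```python
import torch
import torch.nn as nn

class SmallSiamese(nn.Module):
    """Tiny Siamese network: both images go through the same trunk."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # 64 x 4 x 4 regardless of input size
            nn.Flatten(),             # 1024 features per image
        )
        self.head = nn.Sequential(
            nn.Linear(2 * 1024, 64),  # deliberately narrow
            nn.ReLU(),
            nn.Linear(64, 1),         # logit for same/different
        )

    def forward(self, x1, x2):
        f1, f2 = self.trunk(x1), self.trunk(x2)
        return self.head(torch.cat([f1, f2], dim=1))
```

Training this against nn.BCEWithLogitsLoss, rather than MSE on a sigmoid output, also tends to give a stronger gradient signal for a binary same/different task.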