When I was studying the convolutional implementation of sliding windows, Andrew mentioned that the training image is 14 x 14 x 3 while the test image is 16 x 16 x 3. I wonder whether the data must have the same shape when doing object detection. Why can inputs with different shapes be used in the convolutional implementation of sliding windows? What does that mean?
Once the model is created, compiled, trained, and deployed, the input layer is the same for any train/val/test data. You could have images of a different size, but you still have to crop or resize them in order to use the model.
I think in the context you mention here, the 16 x 16 x 3 image is first used to find regions of interest, which are then fed into the CNN model.
Hi @Carrie_Young,
If we have built and trained model A that has an input layer which accepts a 14 x 14 x 3 image, then we can only pass a 14 x 14 x 3 image to that model.
However, we can build model B which is identical to model A except that it has a different input layer which accepts 16 x 16 x 3. Then we can copy all learnt parameters from model A to model B. Now you can pass a 16 x 16 x 3 image to model B (not to model A).
Now, models A and B have the same set of filters. Consider the first Conv layer (16 filters of size 5 x 5): since model B takes a larger input (16 x 16 x 3) than model A (14 x 14 x 3), model B also produces a larger output (12 x 12 x 16) than model A (10 x 10 x 16). This difference propagates through the rest of the layers, which is why we see those sizes in the slide that you shared.
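To make the size arithmetic concrete, here is a minimal numpy sketch (the filter values and inputs are random placeholders, not anything from the course): the same bank of 16 filters of size 5 x 5 is applied, in "valid" mode, to a 14 x 14 x 3 input and to a 16 x 16 x 3 input, giving 10 x 10 x 16 and 12 x 12 x 16 outputs respectively.

```python
import numpy as np

def conv2d_valid(x, filters):
    """'Valid' convolution: x is (H, W, C_in), filters is (f, f, C_in, C_out)."""
    f = filters.shape[0]
    out_h, out_w = x.shape[0] - f + 1, x.shape[1] - f + 1
    out = np.zeros((out_h, out_w, filters.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + f, j:j + f, :]
            # Sum over the height, width, and channel axes of the patch
            out[i, j] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

filters = np.random.rand(5, 5, 3, 16)                 # shared filter bank
a = conv2d_valid(np.random.rand(14, 14, 3), filters)  # model A's input size
b = conv2d_valid(np.random.rand(16, 16, 3), filters)  # model B's input size
print(a.shape, b.shape)  # (10, 10, 16) (12, 12, 16)
```

The point is that the filters never care about the input's spatial size; only the output's spatial size changes, which is exactly what lets the learnt parameters be reused across input shapes.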
Cheers,
Raymond
Thanks for your answer. Yes, I'm confused about the 16 x 16 x 3 and 14 x 14 x 3 images. I wonder if a CNN model can work like the sliding-window concept: automatically detecting inputs of the same size, concatenating the results at the end of the first layer, and then passing the new grid to the next layer.
Thanks for your answer. So if we want to use the same model to predict images with a different shape, we must create a new model with the learnt parameters, right?
In principle you could crop the image and use the different sections as input images for the CNN. Each cropped section is an individual image the model is trained or tested on, i.e. it passes from the first layer to the last layer on its own. The input shape of the CNN is fixed.
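A small sketch of that cropping idea, under the thread's assumption of a 16 x 16 x 3 image and a model expecting 14 x 14 x 3 inputs (the stride of 2 is my own illustrative choice):

```python
import numpy as np

img = np.random.rand(16, 16, 3)  # placeholder image
win, stride = 14, 2              # window = model's input size

crops = []
for top in range(0, img.shape[0] - win + 1, stride):
    for left in range(0, img.shape[1] - win + 1, stride):
        crops.append(img[top:top + win, left:left + win, :])

# Four 14x14x3 crops; each would be fed through the model individually.
print(len(crops), crops[0].shape)  # 4 (14, 14, 3)
```

This is exactly the naive sliding-windows approach: the convolutional implementation in the lecture gets the same four predictions in one forward pass instead of four.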
Hi @Carrie_Young,
As @gent.spah said, you might crop the image to fit the model's input shape. You get as many predictions as the number of crops.
You might duplicate the model with a new input layer, as I described earlier. You also get multiple predictions, and probably more efficiently.
You might resize the image to fit it to the model’s input shape. You get one prediction.
Therefore, you have choices.
Raymond
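Raymond's third option (resizing) can also be sketched in a few lines. In practice you would use a library routine such as `tf.image.resize` or PIL's `Image.resize`; the nearest-neighbour function below is just a minimal stand-in to show the shape change from 16 x 16 x 3 to 14 x 14 x 3:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Minimal nearest-neighbour resize (a stand-in for a real resize op)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows][:, cols]

img = np.random.rand(16, 16, 3)
small = resize_nearest(img, 14, 14)
print(small.shape)  # (14, 14, 3): one image in, one prediction out
```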