C4 W3: When doing object detection, should all the images in the train, validation and test datasets have the same shape?

Carrie_Young · April 10, 2023, 11:09am

While studying the convolutional implementation of sliding windows, I noticed that Andrew mentioned the training set is 14 x 14 x 3 and the test set is 16 x 16 x 3. Must the datasets have the same shape when doing object detection? Why can data with different shapes be used in the convolutional implementation of sliding windows, and what does that mean?

gent.spah · April 10, 2023, 11:42am

Once the model is created, compiled, trained and deployed, its input layer is the same for train, validation and test data. You can have images of different sizes, but you still have to crop or expand them in order to use the model.

I think that, in the context you mention, the 16x16x3 image is first used to find regions of interest, which are then fed into the CNN model.

rmwkwok · April 10, 2023, 12:51pm

Hi @Carrie_Young,

If we have built and trained model A that has an input layer which accepts a 14 x 14 x 3 image, then we can only pass a 14 x 14 x 3 image to that model.

However, we can build model B which is identical to model A except that it has a different input layer which accepts 16 x 16 x 3. Then we can copy all learnt parameters from model A to model B. Now you can pass a 16 x 16 x 3 image to model B (not to model A).

Now, models A and B have the same set of filters. Consider the first Conv layer (16 filters of size 5 x 5): since model B takes a larger input (16 x 16 x 3) than model A (14 x 14 x 3), model B also produces a larger output (12 x 12 x 16) than model A (10 x 10 x 16). This difference propagates through the rest of the layers, which is why we see those sizes on the slide you shared.
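To make the shape argument concrete, here is a small pure-Python sketch that walks both input sizes through the same stack of layers. Only the first layer (16 filters of 5 x 5) comes from the explanation above; the rest of the stack (2 x 2 max pool with stride 2, a 5 x 5 conv with 400 filters standing in for the FC layer, a 1 x 1 conv with 400 filters, and a 1 x 1 conv with 4 outputs) is my assumption of what the lecture slide uses. `LAYERS` and `propagate` are hypothetical names written just for this illustration.

```python
# Hypothetical layer stack: only the first layer is from the post, the rest is assumed.
# Each entry: (kind, kernel size f, stride s, number of filters)
LAYERS = [
    ("conv", 5, 1, 16),    # 16 filters of 5 x 5
    ("pool", 2, 2, None),  # 2 x 2 max pool, stride 2
    ("conv", 5, 1, 400),   # "FC" layer written as a 5 x 5 conv
    ("conv", 1, 1, 400),   # "FC" layer written as a 1 x 1 conv
    ("conv", 1, 1, 4),     # output layer, 4 values per position
]


def propagate(h, w, c, layers):
    """Return [(output shape, parameter count), ...] for a stack with 'valid' padding."""
    result = []
    for kind, f, s, n in layers:
        h, w = (h - f) // s + 1, (w - f) // s + 1
        if kind == "conv":
            params = f * f * c * n + n  # weights + biases; h and w never appear
            c = n
        else:
            params = 0                  # max pooling has no learnable parameters
        result.append(((h, w, c), params))
    return result


shapes_a = propagate(14, 14, 3, LAYERS)  # model A: 14 x 14 x 3 input
shapes_b = propagate(16, 16, 3, LAYERS)  # model B: 16 x 16 x 3 input

print([shape for shape, _ in shapes_a])
# [(10, 10, 16), (5, 5, 16), (1, 1, 400), (1, 1, 400), (1, 1, 4)]
print([shape for shape, _ in shapes_b])
# [(12, 12, 16), (6, 6, 16), (2, 2, 400), (2, 2, 400), (2, 2, 4)]
print([p for _, p in shapes_a] == [p for _, p in shapes_b])  # True
```

The parameter counts never involve the input height or width, which is why the learnt weights of model A can be copied into model B unchanged; only the activation shapes grow, ending in a 2 x 2 grid of outputs instead of a single one.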

Cheers,
Raymond

Carrie_Young · April 10, 2023, 6:06pm

Thanks for your answer. Yes, I was confused about the 16 x 16 x 3 and 14 x 14 x 3 images. I wonder whether a CNN model can work like a sliding window: automatically taking windows of the same size as the training input, and concatenating the results at the end of the first layer before passing the new grid to the next layer.

Carrie_Young · April 10, 2023, 6:15pm

Thanks for your answer. So if we want to use the same model to make predictions on images with a different shape, we must create a new model with the learnt parameters, right?

gent.spah · April 10, 2023, 7:19pm

In principle, you could crop the image and use the different sections as input images for the CNN. Each section is then an individual image that the model is trained or tested on, i.e. each cropped image goes from the first layer to the last layer as a single, separate image. The input size of the CNN is fixed.
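A minimal sketch of this cropping approach, assuming a 16 x 16 x 3 image stored as nested Python lists in height x width x channel order, a 14 x 14 window, and a stride of 2 between windows (the stride is my assumption; it is not stated above). `crop` and `sliding_crops` are hypothetical helper names. Each crop would then be passed through the model on its own.

```python
def crop(img, top, left, size):
    """Cut a size x size window out of an image stored as nested lists [H][W][C]."""
    return [row[left:left + size] for row in img[top:top + size]]


def sliding_crops(img, size, stride):
    """Return ((top, left), window) for every window position, row by row."""
    h, w = len(img), len(img[0])
    return [((top, left), crop(img, top, left, size))
            for top in range(0, h - size + 1, stride)
            for left in range(0, w - size + 1, stride)]


# Dummy 16 x 16 x 3 image; each pixel just records its own (row, column).
image = [[[r, c, 0] for c in range(16)] for r in range(16)]

crops = sliding_crops(image, size=14, stride=2)
print(len(crops))                  # 4 separate model inputs
print([pos for pos, _ in crops])   # [(0, 0), (0, 2), (2, 0), (2, 2)]
window = crops[-1][1]
print(len(window), len(window[0]), len(window[0][0]))  # 14 14 3
```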

rmwkwok · April 10, 2023, 8:57pm

Hi @Carrie_Young,

As @gent.spah said, you might crop the image to fit it to the model’s input shape. You get as many predictions as the number of images you cropped out.

You might duplicate the model with a new input layer, as I described earlier. You also get multiple predictions, probably more efficiently, since the computation shared by overlapping windows is done only once.
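Here is a small pure-Python check of that idea under simplifying assumptions: a toy network (5 x 5 conv, 2 x 2 max pool with stride 2, then a 5 x 5 conv standing in for the FC layer) with random weights, using 4 filters instead of 16 and 2 outputs only to keep it fast. All function names are hypothetical. Running the same weights once on the 16 x 16 x 3 image gives a 2 x 2 grid of predictions, and each cell should equal the result of running the network on the matching 14 x 14 crop taken with stride 2.

```python
import random


def rand_tensor(*shape):
    """Nested list of uniform random numbers with the given shape."""
    if len(shape) == 1:
        return [random.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]


def conv2d_valid(x, w):
    """Stride-1 'valid' convolution. x: [H][W][C_in], w: [f][f][C_in][C_out]."""
    f, c_in, c_out = len(w), len(w[0][0]), len(w[0][0][0])
    h, wd = len(x), len(x[0])
    return [[[sum(x[i + di][j + dj][c] * w[di][dj][c][k]
                  for di in range(f) for dj in range(f) for c in range(c_in))
              for k in range(c_out)]
             for j in range(wd - f + 1)]
            for i in range(h - f + 1)]


def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2. x: [H][W][C]."""
    return [[[max(x[2 * i + a][2 * j + b][k] for a in (0, 1) for b in (0, 1))
              for k in range(len(x[0][0]))]
             for j in range(len(x[0]) // 2)]
            for i in range(len(x) // 2)]


def tiny_net(x, w1, w2):
    """Toy fully convolutional net: conv 5x5 -> max pool 2x2 -> conv 5x5 ("FC")."""
    return conv2d_valid(max_pool_2x2(conv2d_valid(x, w1)), w2)


random.seed(0)
w1 = rand_tensor(5, 5, 3, 4)   # 4 filters instead of 16, only to keep it fast
w2 = rand_tensor(5, 5, 4, 2)   # the "FC" layer as a 5 x 5 conv with 2 outputs
image16 = rand_tensor(16, 16, 3)

# One pass over the whole 16 x 16 x 3 image with the same weights ("model B").
grid = tiny_net(image16, w1, w2)
print(len(grid), len(grid[0]), len(grid[0][0]))  # 2 2 2

# Four separate passes over 14 x 14 crops taken with stride 2 ("model A").
for i in range(2):
    for j in range(2):
        window = [row[2 * j:2 * j + 14] for row in image16[2 * i:2 * i + 14]]
        single = tiny_net(window, w1, w2)        # shape 1 x 1 x 2
        assert all(abs(single[0][0][k] - grid[i][j][k]) < 1e-9 for k in range(2))
print("every grid cell matches its 14 x 14 crop")
```

The crop stride of 2 matches the 2 x 2 max pool in this toy network. The single pass computes the overlapping parts of the convolutions once instead of once per crop, which is where the efficiency gain comes from.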

You might resize the image to fit it to the model’s input shape. You get one prediction.
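For completeness, a sketch of the resizing option: a nearest-neighbour resize from 16 x 16 x 3 down to 14 x 14 x 3 in plain Python (nested lists in height x width x channel order; `resize_nearest` is a hypothetical helper). Real pipelines usually interpolate (e.g. bilinear) rather than pick the nearest pixel, but either way the whole image becomes a single model input and yields a single prediction.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an image stored as nested lists [H][W][C].

    Each output pixel copies the source pixel under its centre.
    """
    h, w = len(img), len(img[0])
    return [[list(img[(2 * i + 1) * h // (2 * new_h)][(2 * j + 1) * w // (2 * new_w)])
             for j in range(new_w)]
            for i in range(new_h)]


# Dummy 16 x 16 x 3 image whose pixels record their own (row, column).
image16 = [[[r, c, 0] for c in range(16)] for r in range(16)]
image14 = resize_nearest(image16, 14, 14)

print(len(image14), len(image14[0]), len(image14[0][0]))  # 14 14 3
print(image14[13][13])  # [15, 15, 0]: the last pixel comes from the last source pixel
```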

So you have several choices.

Raymond
