I am wondering how to handle the case where the images have different dimensions. How can I implement that?
With the type of network we are studying here, all the inputs need to have the same dimensions and the same image type (RGB or CMYK or greyscale or …). Fortunately, image processing libraries generally provide resizing functions. Typically you are “downsampling” the images, because it’s too expensive to process them at full resolution. So you create a preprocessing step that resizes all your input images to the format that you have selected for your network. You could glue that onto the front of your network and call it a more general network that can handle different image dimensions as input, but why would you want to pay that price on every training run or test prediction? From a performance and cost standpoint, it makes more sense to implement the preparation of the images as a separate step that you do once, or rather once every time you add more images to your dataset. You would include the normalization and unrolling logic in that preprocessing step.
Hey @paulinpaloalto thanks a lot.
Hey @paulinpaloalto Thanks for the explanation. Can you share some details about the preprocessing? What would be the best way to do it for a huge dataset? I just completed the course and wanted to try implementing this on another dataset, so I am facing exactly the issue mentioned above: all the images are of different sizes.
I’ve never personally built a preprocessing system like this, but I do have a lot of experience with general software engineering. I don’t think you need to be too worried about the cost and efficiency, since this is a “one time” cost that you pay only when you add new images to your dataset. They give examples of how to resize images in the “Test with Your Own Image” sections of the exercises in Course 1 Week 2 (Logistic Regression) and Week 4 (Deep Neural Network Application). They use the “resize” method from the PIL Image library (note that it is “resize”, not “reshape”, which is the numpy operation used for flattening). You can just write a Python program that traverses your images and applies that operation to them. After that, how you access the images just depends on how they are stored. If they are individual files in a directory or tree of directories, just google “traverse directory python” and you’ll find plenty of sample code for how to do that. If they are in some kind of compound file like a database, then you need to do the analogous google search.
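To make that concrete, here is a minimal sketch of what such a traversal could look like, assuming the images are individual files in a directory tree and that Pillow (the current PIL package) is installed. The function name `preprocess_tree`, the 64 x 64 target size, and the list of file extensions are all hypothetical choices for illustration, not something taken from the course.

```python
from pathlib import Path

from PIL import Image

# Hypothetical settings: use whatever input format you chose for your network.
TARGET_SIZE = (64, 64)  # (width, height) in pixels
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def preprocess_tree(src_root, dst_root, size=TARGET_SIZE):
    """Resize every image under src_root into a parallel tree under dst_root.

    Returns the number of images that were processed.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    count = 0
    for src_path in sorted(src_root.rglob("*")):
        # Skip directories and anything that does not look like an image file.
        if not src_path.is_file() or src_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Mirror the relative path so the output tree matches the input tree.
        dst_path = dst_root / src_path.relative_to(src_root)
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src_path) as img:
            # Convert to one colour mode (RGB) and one size for all images.
            img.convert("RGB").resize(size).save(dst_path)
        count += 1
    return count
```

You would call it as something like `preprocess_tree("raw_images", "processed_images")`. Keep the output directory outside the input directory, otherwise a second run would pick up the already processed files as well.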
The other thing they do in these courses anytime they work with images is to flatten them and to “normalize” the values to be between 0 and 1. That is done because it makes Gradient Descent converge in a much more tractable way. You can see that logic also in the two exercises I referenced above. There are some options for how you do the flattening: here’s a thread that discusses that in some detail.
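As a rough illustration of that step, here is a small numpy sketch. It assumes the images are already loaded into a single uint8 array of shape (m, height, width, channels) and that you want one column per image, which is only one of the possible flattening layouts; the function name `flatten_and_normalize` is made up for this example.

```python
import numpy as np


def flatten_and_normalize(images):
    """Flatten a batch of images and scale the pixel values to [0, 1].

    images: array of shape (m, height, width, channels) with values 0..255.
    Returns a float array of shape (height * width * channels, m).
    """
    images = np.asarray(images)
    m = images.shape[0]
    # One row per image first, then transpose so each image becomes a column.
    flat = images.reshape(m, -1).T
    # 8-bit pixel values run from 0 to 255, so dividing by 255 gives [0, 1].
    return flat / 255.0
```

If your network expects one row per image instead, drop the transpose.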
One other decision you need to make is how to store your images after processing so that you can conveniently access them. E.g. you could create your own h5 file as they use here. Or just create a parallel directory with the processed images since you now have the logic to read images from a directory (assuming that’s the starting point).
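If you go the h5 route, a sketch along these lines might be a starting point. It assumes the h5py package is installed; the dataset names "images" and "labels" and both function names are placeholders I picked for illustration, so rename them to whatever suits your loader.

```python
import h5py
import numpy as np


def save_dataset(path, images, labels):
    """Write the processed images and their labels into one h5 file."""
    with h5py.File(path, "w") as f:
        # "images" and "labels" are hypothetical dataset names.
        f.create_dataset("images", data=np.asarray(images, dtype=np.uint8),
                         compression="gzip")
        f.create_dataset("labels", data=np.asarray(labels, dtype=np.int64))


def load_dataset(path):
    """Read the images and labels back as numpy arrays."""
    with h5py.File(path, "r") as f:
        return f["images"][:], f["labels"][:]
```

Storing the resized pixels as uint8 keeps the file small; you can still do the division by 255 when you load the data for training.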
Since what you are doing is not part of the solution to any of the problems here, it’s ok to show actual source code if you want to discuss in more detail as you get going on this.