I have a general question about Grad-CAM vs. image segmentation. It seems that Grad-CAM can locate the diseased area using the same CNN that was trained on the multiclass disease labels; it doesn't require training a new model. Image segmentation, on the other hand, does need labeled data at the pixel (or voxel) level across all disease classes. So wouldn't Grad-CAM already be enough to highlight the disease? Why would people need image segmentation here? I understand that segmentation is useful for object detection in self-driving cars, because safe driving requires the precise boundary of each detected object. But for medical images, it seems like Grad-CAM can already do the job.
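For context on what I mean by "Grad-CAM doing the job": as I understand it, the heatmap is just a weighted, ReLU-ed sum of the last conv layer's activation maps, with weights from global-average-pooled gradients of the class score. A minimal NumPy sketch of that combination step (assuming the activations and gradients have already been pulled out of the network with hooks; the function name and toy shapes are mine):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one conv layer.

    activations, gradients: arrays of shape (K, H, W) -- the layer's K
    feature maps and the gradients of the class score w.r.t. them.
    """
    # Global-average-pool the gradients: one importance weight per channel
    weights = gradients.mean(axis=(1, 2))                         # (K,)
    # Weighted sum of activation maps, then ReLU to keep positive evidence
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    # Normalize to [0, 1] for overlaying on the image (guard all-zero maps)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 8 channels of a 7x7 feature map, random values
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(acts, grads)  # coarse 7x7 map, upsampled to image size in practice
```

Note the heatmap is only as fine as the feature map (7×7 here), which is part of why I'm asking whether its localization is comparable to a true segmentation mask.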
A separate question about image segmentation specifically. As I understand it, to train and evaluate the model, the train/dev/test datasets need a label for every pixel (or voxel). Is it even humanly possible to manually label every pixel (or voxel) for thousands upon thousands of image samples, especially for high-resolution X-rays or CT scans, where the images are already quite large?
In addition, image processing often includes a preprocessing step that normalizes/scales the picture, including its size. During training, for an input image that has pixel- (or voxel-) level labels before being fed into the model, how would you handle those labels after the picture is rescaled? For example, if you originally have 1024 × 1024 labels, one per pixel, what would the labels be for an image rescaled to 300 × 300 during training?
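My guess is that the mask gets resized alongside the image, but with nearest-neighbor sampling rather than bilinear interpolation, so that the output labels stay valid class IDs instead of blended fractions. A minimal NumPy sketch of what I imagine that looks like (the helper name and the toy "disease" region are mine):

```python
import numpy as np

def resize_mask_nearest(mask, out_h, out_w):
    """Resize an integer label mask with nearest-neighbor sampling.

    Unlike bilinear interpolation (fine for the image itself), this keeps
    every output value equal to some original class ID -- averaging class
    2 and class 0 into 1.0 would invent a different class.
    """
    in_h, in_w = mask.shape
    # For each output row/column, pick the source index it maps back to
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return mask[rows[:, None], cols[None, :]]

# 1024x1024 mask with a hypothetical "disease" region labeled class 2
mask = np.zeros((1024, 1024), dtype=np.int64)
mask[100:400, 200:600] = 2
small = resize_mask_nearest(mask, 300, 300)  # 300x300, labels still in {0, 2}
```

If that's right, a follow-up concern: thin structures one or two pixels wide could disappear entirely after downsampling, which seems like it would matter for small lesions.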