Can Image Segmentation Models overfit?

I am working on medical imaging segmentation models. One question I need to ask is: do image segmentation models overfit? Since segmentation goes from image to image, like a translation task, rather than classifying whole images as in classification models, I feel like it will not overfit.

Thank you

I have never trained an Image Segmentation Model other than taking DLS Course 4, which includes one assignment about U-Net for Image Segmentation. Image Segmentation is a form of classification: it just happens at the pixel level. So there is still a notion of prediction accuracy, which is what overfitting is based on. The definition of overfitting is that the model has noticeably higher accuracy on the training set than it does on the test or cross-validation set. Just on general principles (but without actual experience, as I mentioned above), I would expect this is still possible with Image Segmentation.
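To make that concrete, here is a minimal sketch of what "pixel-level accuracy" means and how a large train/validation gap would surface as an overfitting signal. The masks and numbers below are made up purely for illustration:

```python
import numpy as np

def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels where the predicted label matches the ground truth."""
    return float(np.mean(pred_mask == true_mask))

# Toy 4x4 binary segmentation masks (values are illustrative only).
true_train = np.array([[0, 0, 1, 1]] * 4)
pred_train = np.array([[0, 0, 1, 1]] * 4)   # perfect on the training image
true_val   = np.array([[0, 1, 1, 0]] * 4)
pred_val   = np.array([[0, 0, 1, 1]] * 4)   # half the pixels wrong on validation

train_acc = pixel_accuracy(pred_train, true_train)  # 1.0
val_acc   = pixel_accuracy(pred_val, true_val)      # 0.5
print(f"train={train_acc:.2f} val={val_acc:.2f}")   # big gap -> overfitting signal
```

The same logic applies to any per-pixel metric (Dice, IoU, etc.): overfitting shows up as the training score pulling well ahead of the validation score.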

@malitha96 personally I think there is a risk of falling into an ‘illusionary trap’ when thinking about segmentation models as they were presented in the course. In our case, even in the training data, in contrast to say YOLO models (where we think more in terms of areas or bounding boxes), every single area is fully accounted for. So one might at first be tempted to think: as long as the output on my test cases also fills the entire area, how, in a sense, can we be ‘overfitting’?

One has to recall, though, that the real meaning of this term is that our model does not generalize well to new cases. It may very well still be mapping the entire area on test cases, but what it maps where on these novel cases may not be at all what we are looking for.

So yes, overfitting is still a major potential concern. Maybe it got ‘the sky’, maybe it got ‘the road’, but maybe it also mixed the two labels up.


@paulinpaloalto @Nevermnd Thanks for your responses. Let’s consider a scenario where we are performing medical image segmentation.

In the context of image classification, it’s common practice to split the dataset into training and validation sets based on the patient level to ensure that images from the same patient don’t appear in both sets, thus preventing data leakage and overfitting.

However, I’m unsure whether the same principle applies to image segmentation tasks. Should we also split the dataset into training and validation sets based on the patient level for medical image segmentation?

Mmm, at least to start with, can you clarify what you mean by ‘patient level’? Are we talking about the severity of the condition? The type of condition? The angle/axis the image is taken from? Or what, exactly?

When I refer to “patient level,” I am focusing on ensuring that all data (images or scans) from a single patient are either in the training set or the validation set, but not split between the two. This prevents the model from seeing data from the same patient during both training and validation, which could lead to overfitting and an unrealistic estimation of the model’s performance.


You have a dataset of MRI scans for 100 patients, and each patient has multiple MRI scans of their heart.

Without Patient Level Splitting:

  • Training Set: Contains 70% of the total scans, with images from all 100 patients.
  • Validation Set: Contains 30% of the total scans, also with images from all 100 patients.

In this setup, the model might see scans from the same patient in both the training and validation sets. This can lead to overfitting in classification as the model might learn specific characteristics of individual patients rather than generalizable features.

With Patient Level Splitting:

  • Training Set: Contains MRI scans from 70 patients (e.g., patients 1 to 70).
  • Validation Set: Contains MRI scans from the remaining 30 patients (e.g., patients 71 to 100).

Here, the model trains on scans from one set of patients and is validated on a completely separate set of patients. This ensures that the model’s performance on the validation set reflects its ability to generalize to new patients, providing a more accurate measure of its true performance.
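For what it's worth, the patient-level split described above can be sketched in a few lines of plain Python. The record structure, patient counts, and fraction here are invented for illustration; in practice, scikit-learn's `GroupShuffleSplit` does the same job with patient IDs passed as the `groups` argument:

```python
import random

def patient_level_split(scan_records, val_fraction=0.3, seed=0):
    """Split scans so every patient's scans land entirely in one set.

    scan_records: list of (patient_id, scan) pairs (structure is illustrative).
    """
    patients = sorted({pid for pid, _ in scan_records})
    rng = random.Random(seed)
    rng.shuffle(patients)                      # randomize which patients go where
    n_val = int(len(patients) * val_fraction)
    val_patients = set(patients[:n_val])
    train = [r for r in scan_records if r[0] not in val_patients]
    val   = [r for r in scan_records if r[0] in val_patients]
    return train, val

# Toy dataset: 100 patients with 3 scans each (IDs and counts are made up).
records = [(pid, f"scan_{pid}_{k}") for pid in range(100) for k in range(3)]
train, val = patient_level_split(records)
train_ids = {pid for pid, _ in train}
val_ids   = {pid for pid, _ in val}
assert train_ids.isdisjoint(val_ids)   # no patient appears in both sets
print(len(train_ids), len(val_ids))    # 70 30
```

The key invariant is the disjointness assertion: a random split over *scans* would almost certainly violate it, which is exactly the leakage described above.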

We usually do it in classification and detection. I'm not sure whether we should do it in segmentation, because it's an image-to-image translation task.

Honestly, I have to defer to @paulinpaloalto here in case he knows more than I do (though you seem to have an unusually descriptive definition in mind…).

I’m not sure-- It may be a ‘Ferrari’ in the train set, and a ‘Ferrari’ in the validation set-- but is it my Ferrari ?

I think thus far we haven’t climbed quite up to that level (except in facial recognition tasks, which are presented separately from classification).

I mean, I am tempted to think I would not include the exact same images in both my training and validation sets, or, say, the dorsal view of patient 1’s malignancy in both sets. But what about the dorsal view in training and a ventral view in validation?

Obviously it sounds like we are not searching for a specific patient’s cancer, but I’m not sure.

I have to ask, though, because I imagine even variations in the angle can be significant in a tiny training set.

But I made the point earlier that Image Segmentation is a classification task. It’s just that the classification is happening at the individual pixel level. So why is it that you think we should treat Image Segmentation differently w.r.t. the partitioning of the datasets?

Or to phrase it differently: what would be the harm if we used the same “patient level awareness” in splitting the data between training and validation? Why would it be worse to do things the same way we normally would for a classification task?