Can Multiple Instance Learning (MIL) be used for regression instead of classification?

I’m currently working on a histopathology project where I use DINOv2 (self-supervised ViT) as a feature extractor on image tiles. After extracting tile-level features, I aggregate them at the slide level using a Multiple Instance Learning (MIL) framework.

Most of the papers and implementations I’ve encountered primarily apply MIL to classification tasks (e.g. predicting whether a slide contains cancer). However, my goal is slightly different. I want to estimate the fraction of the tissue in the image that is cancerous, which makes the problem more naturally framed as a regression task rather than classification.

My question is: Is MIL commonly used for regression problems, or is it mainly limited to classification? If regression with MIL is feasible, are there specific architectures or papers that implement this approach (e.g., attention-based MIL with a regression head)?

I’m relatively new to MIL-based pipelines, so I may be misunderstanding some of the assumptions behind the framework. Any pointers, suggestions, advice, or references would be very helpful.
Thanks in advance!

MIL can handle regression as well as classification. In principle it works for any task (regression, classification, cluster or stratified sampling schemes) for which you can properly label and prepare your data and design a suitable model architecture.
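For instance, a minimal attention-based MIL pooling with a regression head (in the spirit of attention-based deep MIL) could look like the sketch below; the layer sizes and feature dimension are illustrative assumptions, not something from your setup:

```python
import torch
import torch.nn as nn

class AttentionMILRegressor(nn.Module):
    """Sketch: attention-pooled MIL with a regression head.

    Expects a bag of tile embeddings of shape (num_tiles, feat_dim),
    e.g. DINOv2 features, and predicts one continuous slide-level score.
    """
    def __init__(self, feat_dim=768, hidden_dim=128):
        super().__init__()
        # Attention scoring of each tile embedding (simplified)
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        # Regression head on the attention-pooled slide embedding
        self.regressor = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, tiles):           # tiles: (num_tiles, feat_dim)
        a = self.attention(tiles)       # (num_tiles, 1) attention logits
        w = torch.softmax(a, dim=0)     # normalize over tiles
        slide_emb = (w * tiles).sum(0)  # (feat_dim,) weighted pooling
        return self.regressor(slide_emb).squeeze(-1)  # scalar prediction

# For a continuous target such as tumor fraction, train with e.g.
# nn.MSELoss(), nn.L1Loss(), or nn.SmoothL1Loss() instead of cross-entropy.
```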

Based on your description, it looks like you might be using MRI images with multiple slices?

Thank you for your response.

My setup involves whole slide histopathology images (WSIs) only. I divide each WSI into 256×256 tiles and use DINOv2 as a feature extractor to obtain tile-level embeddings.

Each slide has a continuous label (representing tumor fraction), and I am using a Multiple Instance Learning (MIL) framework to aggregate tile-level features into a slide-level prediction.
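For context, a minimal version of that extraction step could look like the sketch below (the exact DINOv2 variant and preprocessing values here are assumptions for illustration, not my exact pipeline):

```python
import torch
from torchvision import transforms
from PIL import Image

# DINOv2 ViT-B/14 from torch.hub; forward() returns a 768-dim embedding per image.
# Note: the /14 patch size needs input sides divisible by 14, so 256x256 tiles
# are typically resized (e.g. to 224x224) before feature extraction.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_tiles(tile_paths):
    """Return a (num_tiles, 768) tensor of tile embeddings for one slide."""
    feats = []
    for p in tile_paths:
        x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
        feats.append(model(x))          # (1, 768) CLS-token feature
    return torch.cat(feats, dim=0)
```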

Who is doing the data annotation of your images?

Or do you already have labelled images?

Thank you for the question.

The slide-level labels are already available as continuous scores, so I’m not performing any manual annotation at the tile level. The setup is weakly supervised, with supervision only at the slide level.

For evaluation, I’m primarily looking at correlation metrics (Pearson and Spearman) to determine how well the model captures the underlying signal.
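For reference, computing those metrics is a one-liner each with scipy; the arrays below are made up purely for illustration:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical slide-level predictions vs. ground-truth tumor fractions
y_true = [0.10, 0.35, 0.60, 0.05, 0.80]
y_pred = [0.15, 0.30, 0.55, 0.10, 0.70]

pearson_r, _ = pearsonr(y_true, y_pred)    # linear correlation
spearman_r, _ = spearmanr(y_true, y_pred)  # rank correlation
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_r:.3f}")
```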


Looks like you are heading in the right direction, all the best.

The only thing to keep in mind in such projects is the amount of data in hand and how versatile it is, i.e. whether the slides cover all the relevant types of cancer-related tissue, so the model is able to learn the right output features.

Most healthcare projects run into these challenges: the dataset does not cover all the cancer types needed for training, validation, and testing, and even when it does, another challenge arises when testing on real-world unseen data. One needs to plan for all of this, especially when doing healthcare machine learning projects.

Good luck!!!

Thank you for your response. I had a follow-up question regarding patch preprocessing.

My patches are extracted at 256×256 resolution and saved as PNGs. However, most standard CNN architectures (e.g., ResNet50, VGG19) and ViT-based models (e.g., DINOv2) typically expect 224×224 inputs.

In this case, would resizing from 256×256 to 224×224 be the appropriate approach, or would center/random cropping be preferable? Could you clarify what actually happens at this stage? Cropping would mean losing information at the tile borders; is that acceptable?

Are there recommended best practices for handling such resolution mismatches in WSI pipelines?
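To make the two options concrete, here is a rough torchvision sketch of what I mean (illustrative only; either transform could be dropped into the preprocessing pipeline):

```python
from torchvision import transforms

# Option A: resize the whole 256x256 tile to 224x224
# (keeps all tissue, slightly rescales the morphology).
resize_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Option B: crop a 224x224 window out of the 256x256 tile
# (keeps native resolution, discards a 16-px border on each side,
# roughly 23% of the tile area). RandomCrop can double as augmentation.
crop_tf = transforms.Compose([
    transforms.CenterCrop(224),   # or transforms.RandomCrop(224) for training
    transforms.ToTensor(),
])
```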

Did you try training with the base model using your patch dimensions as they are?

As far as I can tell, 256 to 224 is not too much of a change to explore in model training; you can try both approaches and compare the results.

That said, the reason I would avoid reducing patches from 256 to 224 is the relative loss of resolution for detecting tissue morphology, especially in cancer cases.

Why don’t you try a base model that takes 256×256 inputs, so it matches your dataset dimensions? Have you thought about that?

Is it possible to change the input size of an ImageNet-pretrained model from 224×224 (the default) to 256×256?
I would have to depend on an LLM for code generation then, since I do not know how to do it myself.
Should I keep the stride at 128 while extracting the 256x256 patches?
That would mean the patches overlap, but it would also avoid losing information.
Is this a viable strategy?
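To illustrate the overlap idea, here is the rough sliding-window scheme I have in mind (coordinates only; illustrative code rather than any specific WSI library):

```python
def tile_coords(width, height, tile=256, stride=128):
    """Yield top-left (x, y) of overlapping tiles across a slide region.

    With stride == tile there is no overlap; with stride = 128 each tile
    shares half its extent with its neighbours, so border tissue that a
    224 crop would discard is still covered by an adjacent tile.
    """
    for y in range(0, max(height - tile, 0) + 1, stride):
        for x in range(0, max(width - tile, 0) + 1, stride):
            yield x, y

# Example: a 1024x1024 region produces a 7x7 grid of overlapping tiles.
coords = list(tile_coords(1024, 1024))
print(len(coords))  # 49
```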

I cannot confirm this, as I don’t have any idea about your dataset.

To change the default input dimension of the base model, you will have to separately create a model instance configured for 256×256 inputs and train or fine-tune it on your dataset.
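As a rough sketch, assuming you use the timm library (recent versions interpolate the pretrained positional embeddings when you pass a different img_size), it could look like this:

```python
import timm
import torch

# ViT pretrained at 224x224, re-instantiated for 256x256 inputs;
# timm resamples the positional embeddings to the new patch grid.
vit_256 = timm.create_model(
    "vit_base_patch16_224", pretrained=True, img_size=256, num_classes=0
)

# CNN backbones like ResNet-50 are fully convolutional, so they accept
# 256x256 inputs as-is (global pooling absorbs the larger feature map).
resnet = timm.create_model("resnet50", pretrained=True, num_classes=0)

x = torch.randn(1, 3, 256, 256)
print(vit_256(x).shape, resnet(x).shape)  # pooled feature vectors, no resizing
```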

@sd007 are you doing your project steps by following LLM instructions? :roll_eyes::smirking_face:
