I am working on a proof of concept (POC) with a dataset of medical Computed Tomography (CT) scans from 251 patients: 151 are diagnosed with gallbladder cancer and the remaining 100 have a normal, healthy gallbladder. Each patient file contains CT images in both DICOM and NRRD formats. The DICOM files are organized into several folders for different views/studies, including plain study, contrast study, thin-cut contrast, and lung view. There are three NRRD files for each patient: vol.nrrd, seg.nrrd, and label.nrrd. vol.nrrd contains the thin-cut contrast images in NRRD format, whereas seg.nrrd and label.nrrd contain the ground-truth segmentation and label information. At most 7 labels are associated with the images, depending on the tumor subtype present.
Problem statement: This dataset can be used to train classification models that distinguish a malignant gallbladder from a normal one and grade gallbladder cancer into its subtypes.
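Since the classification is per patient, any train/validation split presumably has to be done at the patient level so that images from one patient do not end up on both sides. A minimal sketch of a stratified patient-level split is below. The patient IDs, the "cancer"/"normal" label strings, and the 20% validation fraction are placeholders for illustration, not part of the dataset.

```python
import random


def stratified_patient_split(labels_by_patient, val_fraction=0.2, seed=0):
    """Split patient IDs into train/validation lists, class by class.

    labels_by_patient maps a patient ID to its class label (hypothetical
    strings here). Splitting per class keeps the cancer/normal ratio
    roughly the same in both subsets.
    """
    rng = random.Random(seed)

    # Group patient IDs by class label.
    by_class = {}
    for patient_id, label in labels_by_patient.items():
        by_class.setdefault(label, []).append(patient_id)

    train_ids, val_ids = [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_val = round(len(ids) * val_fraction)
        val_ids.extend(ids[:n_val])
        train_ids.extend(ids[n_val:])
    return train_ids, val_ids


# Placeholder IDs that only mirror the class counts given above (151 / 100).
labels = {f"cancer_{i:03d}": "cancer" for i in range(151)}
labels.update({f"normal_{i:03d}": "normal" for i in range(100)})

train_ids, val_ids = stratified_patient_split(labels, val_fraction=0.2, seed=0)
print(len(train_ids), len(val_ids))
```

The same idea extends to the subtype grading task by using the subtype as the class label, although some subtypes may then have very few patients per subset.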
Help needed: How can I create synthetic data from such a small dataset? Is there any tool available for this?
Also, which algorithm would be a good starting point for the model?
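To make the synthetic data part concrete: the simplest form I am aware of is classical augmentation, where each volume and its segmentation are transformed together to produce extra training samples. A minimal sketch with NumPy is below. It assumes vol.nrrd and seg.nrrd have already been loaded into arrays of identical shape; the function name, crop size, and jitter ranges are hypothetical and untuned. Flips are off by default because mirrored abdominal anatomy may not be realistic.

```python
import numpy as np


def augment_pair(volume, mask, rng, crop_shape, flip_axes=(),
                 max_shift=20.0, noise_std=5.0):
    """Return one randomly augmented (volume, mask) pair.

    Geometric changes (crop, optional flips) are applied identically to
    both arrays; intensity changes are applied to the volume only, so the
    label values in the mask are preserved.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    if any(c > f for c, f in zip(crop_shape, volume.shape)):
        raise ValueError("crop_shape must fit inside the volume")

    # Pick one random crop window and use it for both arrays.
    starts = [int(rng.integers(0, full - crop + 1))
              for full, crop in zip(volume.shape, crop_shape)]
    window = tuple(slice(s, s + c) for s, c in zip(starts, crop_shape))
    vol = volume[window].astype(np.float32)
    seg = mask[window].copy()

    # Optional flips, each applied with probability 0.5.
    for axis in flip_axes:
        if rng.random() < 0.5:
            vol = np.flip(vol, axis=axis)
            seg = np.flip(seg, axis=axis)

    # Intensity jitter: one global offset plus per-voxel Gaussian noise.
    vol = vol + rng.uniform(-max_shift, max_shift)
    vol = vol + rng.normal(0.0, noise_std, size=vol.shape)
    return vol.astype(np.float32), seg


# Toy arrays standing in for a loaded vol.nrrd / seg.nrrd pair.
rng = np.random.default_rng(0)
volume = np.zeros((8, 16, 16), dtype=np.int16)
mask = np.zeros((8, 16, 16), dtype=np.uint8)
mask[2:6, 4:12, 4:12] = 3  # placeholder label value

vol_aug, seg_aug = augment_pair(volume, mask, rng, crop_shape=(4, 8, 8))
print(vol_aug.shape, seg_aug.shape)
```

This only re-samples existing scans rather than generating new patients, so I would still like to know whether generative approaches or dedicated tools are worth trying on a dataset of this size.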