I built a deep learning model, and after 100 epochs the validation accuracy curve oscillates a lot from epoch to epoch, although the validation loss is satisfactory. How can I stabilize the validation accuracy?
Hi @usmanlko,
As third parties, we have no idea what kind of deep learning model you have created. With only the information you provided, it is like looking for a needle in a haystack.
So when you ask for suggestions about your model, always provide at least minimal information: what kind of deep learning model it is, the data, the data types, how the data was split, and the accuracy and loss at the end of the 100th epoch.
Also, since you say your validation accuracy was oscillating, we need to know how much the training curve was oscillating as well.
A satisfactory loss alone doesn’t signify a successful model, as you probably already know.
It takes a combination of a gradually decreasing loss and an accuracy as close to 99% as the features/classes in your data allow. Is it a predictive model or a descriptive model?
The loss value by itself is almost meaningless. Loss values are just numbers and all you can say is that a lower loss value is better than a higher one and even that is not always guaranteed because accuracy is quantized. If I tell you that the average loss value of my model is 1.5, what can you conclude from that? Loss values are not comparable between different models.
Accuracy on the other hand is a meaningful value and is all you really care about in terms of judging the performance of your model.
The evidence you present suggests that you have a convergence problem. There could be lots of reasons for that, but as Deepti says there’s not really a lot more we can say with the limited information we have at this point. Have you taken the equivalent of DLS Course 2 here from DLAI? That course covers how to tune hyperparameters and different optimization algorithms like Momentum, RMSprop and Adam, as well as dynamic learning rate management. Are you familiar with any of those ideas? Have you tried any of them?
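To make the "dynamic learning rate management" idea concrete, here is a minimal sketch in plain Python of a reduce-on-plateau rule: the learning rate is cut whenever the validation loss has not improved for a few epochs. The function name, the default values and the loss numbers are assumptions for illustration only, not taken from the course or from the model in this thread.

```python
def reduce_on_plateau(val_losses, initial_lr=1e-3, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate that would be used at each epoch."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        schedule.append(lr)          # rate used for this epoch
        if loss < best:              # validation loss improved
            best = loss
            wait = 0
        else:                        # no improvement this epoch
            wait += 1
            if wait >= patience:     # plateau detected: shrink the rate
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule


# Made-up validation losses that stall after the third epoch.
example_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]
example_schedule = reduce_on_plateau(example_losses)
print(example_schedule)
```

If you are using Keras, the built-in `ReduceLROnPlateau` callback works in a similar spirit, so you would not normally need to write this yourself.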
I am developing a U-Net-based model that uses a self-attention block as the encoder to classify cancer images into 8 classes. I am using an image size of 256x256, and after 100 epochs my macro accuracy is 92%. What can I do so that my validation accuracy follows a linear curve?
Thank you
Yes sir, I changed these hyperparameters and am currently using Adam, but the problem remains the same. My work is described in my last reply. But I wonder why I am getting a non-linear curve.
Can you briefly describe your model? For example, which layers you used and the kernel and unit sizes.
As you have 8 classes, I hope you’re using categorical crossentropy loss. But if your labels are integer class indices rather than one-hot vectors, then you need to use sparse categorical crossentropy.
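On the two losses: in Keras the choice between them depends on how the labels are encoded (one-hot vectors versus integer class indices), and for the same sample both give the same number. Here is a small plain-Python sketch of that; the probabilities are made up, and the two helper functions are my own simplified versions, not the Keras implementations.

```python
import math


def categorical_crossentropy(one_hot, probs):
    # Label is a one-hot vector, e.g. [0, 0, 1, 0, ...]
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(class_index, probs):
    # Label is an integer class index, e.g. 2
    return -math.log(probs[class_index])


# Hypothetical softmax output for one image over 8 classes (sums to 1).
probs = [0.05, 0.05, 0.70, 0.05, 0.05, 0.04, 0.03, 0.03]
one_hot_label = [0, 0, 1, 0, 0, 0, 0, 0]
integer_label = 2

loss_one_hot = categorical_crossentropy(one_hot_label, probs)
loss_sparse = sparse_categorical_crossentropy(integer_label, probs)
print(loss_one_hot, loss_sparse)  # both equal -log(0.70)
```

So the thing to check is that the label format matches the loss you picked; a mismatch is a bug, while a match with either loss is fine.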
You also haven’t mentioned your training accuracy.
Also, can I know why you are expecting a linear validation accuracy?
A non-linear curve actually reflects the model’s ability to learn the multi-class feature complexity of your data, which is good. Having said that, there is surely room to improve the validation accuracy from 92%.
But please provide information on the data split. If this is a multiclass disease model, then I suspect categorical crossentropy may not be the right choice, but I cannot confirm that, as I still don’t know what data you are working with.
Next, did you set a learning rate for your Adam optimiser?
Based on the graph provided (some of the crucial information is still missing), the gap between the validation and training accuracy curves suggests that your model does not fully capture the features of the data you used for validation.
Provide information on the validation data and training data.
Add a learning rate.
Multiclass classification data comes with more complexity, which makes training oscillate more, so that shouldn’t be a problem if your validation set truly represents your data. One way to confirm that is to make sure the same normalisation/scaling is applied to the training and validation data.
I am creating a U-Net-like structure in which I am replacing the encoder with attention blocks. Each encoder block has a Conv2D layer followed by a batch normalization layer with ReLU activation, and its last three layers are ChannelAttention, spatial attention and MaxPooling2D(2,2). I have used filter sizes of 64, 128, 256 and 512. My training accuracy is 98%. Categorical crossentropy is used for the multiclass classification. My learning rate is 0.001.
One more thing I would like to add is that my data is slightly imbalanced. There are approximately 10000 images, and some classes have only 900 images. Could this be another reason?
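For reference, one common way to compensate for imbalance is to weight each class inversely to its frequency. The sketch below uses the usual "balanced" heuristic, total / (number of classes * class count). The per-class counts are invented so that they add up to 10000 with two classes at 900; the real distribution was not posted.

```python
def balanced_class_weights(counts):
    """Map each class index to total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


# Hypothetical image counts per class index (assumption, sums to 10000).
counts = {0: 900, 1: 900, 2: 1000, 3: 1200, 4: 1400, 5: 1500, 6: 1500, 7: 1600}
weights = balanced_class_weights(counts)

for class_index, weight in weights.items():
    print(class_index, round(weight, 3))
```

Rare classes get a weight above 1 and frequent ones below 1. In Keras such a dictionary can be passed to `model.fit` through the `class_weight` argument.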
Do you mean that only 900 of the 10000 images belong to some classes?
It will again depend on how the classes are distributed in your data.
For example, if you are saying that one or two classes are covered by only 900 images, then my next question is how you have distributed the true values and label values in your dataset.
Can I know whether the multiclass labels are one-hot encoded? Or are the predicted values classified again into different classes?
Training accuracy being 98% and validation accuracy being 92% would mean there is a variance issue.
See the pinned comment which explains bias and variance issue
Also, as you stated, some classes having only 900 images of course points to an imbalanced dataset, but it again depends on how your data is split between training and validation.
Do channel and spatial attention mean that your classification is multidimensional? If so, categorical_crossentropy wouldn’t fit.
Also, in these attention blocks where you used ReLU activation, are the last layers dense layers, and if so, how many units does the last dense layer have?
Also, since you are working on a multiclass model, I wanted to know why you didn’t opt for the softmax activation function, which would have been a better choice.
Regards
DP
In addition to Deepti’s detailed comments, I would say that your training curves don’t look that bad. It’s not unusual to see some oscillation in the validation accuracy numbers, but it does look like you have at least a bit of an overfitting problem. Have you tried applying any regularization techniques?
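If the oscillation is mainly a readability problem in the plot, one option is to smooth the per-epoch values before plotting, for example with an exponential moving average. This is only a sketch: the function name, the `alpha` value and the accuracy numbers are assumptions, and smoothing changes how the curve looks, not how the model behaves.

```python
def ema_smooth(values, alpha=0.3):
    """Exponential moving average; smaller alpha means heavier smoothing."""
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(v)  # first point is kept as-is
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up validation accuracies that bounce between epochs.
raw_val_acc = [0.90, 0.94, 0.88, 0.93, 0.89]
smooth_val_acc = ema_smooth(raw_val_acc)
print(smooth_val_acc)
```

Plotting the raw and smoothed curves together makes it easier to see whether the underlying trend is still rising or has flattened.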
Given that you are using U-Net, I’m curious whether you are doing image segmentation or just overall classification? It obviously makes a big difference and in the image segmentation case it may require more sophisticated cost functions so that the “background” pixels don’t swamp your accuracy values.
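To show what "background pixels swamping the accuracy" means in the segmentation case, here is a toy example in plain Python. The mask is invented (5 lesion pixels out of 100), and the Dice coefficient is just one overlap-based alternative, written in a simplified form with a small `eps` to avoid division by zero.

```python
def pixel_accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def dice_coefficient(y_true, y_pred, eps=1e-7):
    # 2 * |A and B| / (|A| + |B|) for binary masks
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2 * intersection + eps) / (sum(y_true) + sum(y_pred) + eps)


# Flattened toy mask: 5 foreground pixels, 95 background pixels.
true_mask = [1] * 5 + [0] * 95
all_background = [0] * 100  # a model that never predicts foreground

print(pixel_accuracy(true_mask, all_background))    # high, despite finding nothing
print(dice_coefficient(true_mask, all_background))  # close to zero
```

A model that predicts only background still scores high on pixel accuracy here, while the overlap score exposes the failure. For whole-image classification this particular issue does not apply.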
I just want to use this structure for an image classification problem, to find out whether the cancer is benign or malignant and which class it belongs to.
Thank you for your reply. So how can I fix the variance issue, given the difference between training accuracy and validation accuracy? Please suggest what I should add to my model. My data consists of cancer images, and I am using dimensions of 256x256x3. Since this is multiclass classification, I have used categorical crossentropy. The last dense layer has 8 units, and I have used softmax activation in it.
Regards
Usman
I am confused here: if you want the model to detect whether a cancer is malignant or benign, then what are the 8 classes?
This is not adding up, because as you know cancer staging is based on TNM (tumor, node, metastasis).
And if your main model is meant to detect whether images are benign or malignant, then that becomes a binary classification.
Sorry, am I missing any crucial information?
I have split the data using the rule of thumb: 80% for training, 10% for validation and 10% for testing. Kindly suggest whether or not I should train for more epochs. In the graph the best epoch is 99, so I was a little curious about going for more epochs.
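One way to reason about "should I train longer" is to track the best validation loss with a patience window, as early stopping does. The sketch below is plain Python with made-up loss values; the function name and `patience` default are assumptions. If the best epoch turns out to be the last one trained, the loss was still improving and more epochs may be worth trying.

```python
def best_epoch_with_patience(val_losses, patience=10):
    """Return (best_epoch, last_epoch_run) under a simple early-stopping rule."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1


# Case 1: the loss stalls, so training would stop early.
stalled = [0.9, 0.7, 0.6, 0.65, 0.64, 0.66, 0.7]
print(best_epoch_with_patience(stalled, patience=3))

# Case 2: the loss is still falling at the final epoch.
still_improving = [0.5, 0.4, 0.3]
print(best_epoch_with_patience(still_improving, patience=3))
```

Keras provides an `EarlyStopping` callback with `patience` and `restore_best_weights` options that covers this without custom code.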
Thanks for your reply. There are 5 subclasses under benign and 3 subclasses under malignant. I am simply dividing the data into 8 classes, so I am not applying binary classification.
are you working on something like this
No, it is not like this. I am working on a single cancer. My dataset basically contains two classes, benign and malignant. Under benign I have 5 classes and under malignant I have 3 classes. Instead of classifying into benign and malignant, I am going to classify among the 8 total classes.
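With this kind of two-level labelling, it can be informative to report both the 8-class accuracy and the accuracy after collapsing predictions to benign/malignant. The sketch below assumes class indices 0 to 4 are the benign subclasses and 5 to 7 the malignant ones; that numbering and the example labels are hypothetical.

```python
BENIGN_CLASSES = {0, 1, 2, 3, 4}  # assumption: indices 5, 6, 7 are malignant


def to_binary(class_index):
    return "benign" if class_index in BENIGN_CLASSES else "malignant"


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up true and predicted subclass indices for 8 images.
y_true = [0, 1, 4, 5, 6, 7, 2, 3]
y_pred = [0, 2, 4, 5, 7, 7, 2, 6]

acc_8_class = accuracy(y_true, y_pred)
acc_binary = accuracy([to_binary(t) for t in y_true], [to_binary(p) for p in y_pred])
print(acc_8_class, acc_binary)
```

A large gap between the two numbers would suggest that most mistakes are confusions between subclasses of the same group, rather than benign/malignant mix-ups.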
Go through the file below.
s41598-017-04075-z.pdf (2.9 MB)
See the specific points it makes about the loss function and data augmentation. That’s where you need to work.
Also, if your data requires scaling the images, do try that and train your model again.
Yes, the basic rule of thumb is right, but you can also try
70:15:15
or
60:20:20
and see how your model trains. But can I know how your images are labelled?
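Whichever ratio you pick, with imbalanced classes it usually helps to split per class (a stratified split) so that every class shows up in training, validation and test in the same proportion. Here is a minimal plain-Python sketch; the function name, the seed and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, split separately within each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels: a large class (100 samples) and a small one (20 samples).
toy_labels = [0] * 100 + [1] * 20
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Changing `fractions` to `(0.60, 0.20, 0.20)` gives the other ratio mentioned above.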
I have assigned labels according to the folder names.
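For anyone following along, folder-based labelling typically looks like the sketch below: each subfolder name becomes a class, and classes are numbered in sorted order. The helper name and the folder layout are assumptions, not the actual code used here.

```python
from pathlib import Path


def labels_from_folders(root):
    """Return (class_to_index, samples) where samples is a list of (path, class_index)."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        for image_path in sorted((root / name).iterdir()):
            if image_path.is_file():
                samples.append((image_path, class_to_index[name]))
    return class_to_index, samples
```

Note that this produces integer labels. If the loss is categorical crossentropy, the labels need to be one-hot encoded first; with Keras' `image_dataset_from_directory`, that corresponds to `label_mode="categorical"` rather than the default `"int"`.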
@Deepti_Prasad Thank you for reply
My data is augmented using the CLAHE algorithm. The results shown were generated by applying the model to the augmented data.