Loss function and Accuracy

I have recently revisited the DLS lessons on logistic regression, and after some real-life practice I have a few theoretical questions about the loss function and accuracy.

In theory, as my NN lowers the cost function by optimizing the weights and biases, the cost should keep decreasing as the model becomes more accurate (as shown by the blue line for the training set).

For the validation set, however, the accuracy rises along with the loss. I have seen this when the validation set is small, and also when there is a data mismatch between the train and dev-test sets.

My guess is this: by minimizing the training loss, the model moves toward a local minimum of the training loss. Because the compositions of the train and dev sets differ, it moves away from the local minimum of the dev-set loss, so the dev loss goes up, yet the model still fits the data better because it becomes a better and more complex classifier.
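One mechanism that can make dev loss and dev accuracy rise together is worth separating from the overfitting explanation: cross-entropy heavily penalizes confident mistakes, while accuracy only counts whether each prediction lands on the right side of the threshold. The sketch below uses made-up predictions for eight dev examples (hypothetical numbers, not results from this model) to show accuracy improving from one epoch to the next while the mean log loss gets worse, because a single example is misclassified with probability 0.999.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy (log loss) over a batch."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return float(np.mean((y_prob >= threshold) == y_true))

y_dev = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Hypothetical earlier epoch: cautious predictions, 6 of 8 correct
early = np.array([0.6, 0.6, 0.6, 0.4, 0.4, 0.4, 0.4, 0.6])
# Hypothetical later epoch: 7 of 8 correct, but the one mistake is very confident
late = np.array([0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.999])

print("early:", accuracy(y_dev, early), round(binary_cross_entropy(y_dev, early), 3))  # early: 0.75 0.612
print("late:", accuracy(y_dev, late), round(binary_cross_entropy(y_dev, late), 3))     # late: 0.875 0.956
```

So a rising dev loss does not by itself mean the dev predictions got less accurate; it can also mean a few dev examples are being predicted wrongly with high confidence.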

I’m not sure if my understanding is correct, so please correct me if I’m wrong. I just want a better understanding of the fundamentals.

Yuhan Chiang

It appears to me that your NN is overfitting the training set. That’s why the performance on the training set is better than the validation set.


The domain covered by the training set is probably smaller than the domain covered by the validation set. Your training set needs to be richer in diversity of cases.


I have made quite an improvement in the case shown in the figure above, where the training domain is quite different from the target domain. I am really new to this field, so please correct me if I’m wrong in my interpretations.

The problem is:
I have unlimited data from Source 1 (S1) and Source 2 (S2), but limited data from Source 3 (S3).
The target domain might contain more data from S3, but this is not guaranteed.
I want to reduce false negatives (S2 or S3 data classified as S1).
I am tackling a data mismatch problem in my dataset.

The training set consists of 50% data from S1, 45% from S2, and 5% from S3.
The dev-test set consists of 50% data from S1 and 25% each from S2 and S3.

I have eyeballed the test set error cases and found that the false negatives come mostly from S3.
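Eyeballing error cases can be complemented by computing the false-negative rate separately for each source. This is a minimal sketch with hypothetical toy labels, assuming damaged = 1, healthy = 0, and that each dev-test example is tagged with its source (S1, S2, or S3); the function name `fnr_by_source` is illustrative.

```python
from collections import defaultdict

def fnr_by_source(y_true, y_pred, sources):
    """Per-source false-negative rate: the share of damaged examples (label 1)
    that the model predicted as healthy (label 0)."""
    damaged = defaultdict(int)
    missed = defaultdict(int)
    for label, pred, src in zip(y_true, y_pred, sources):
        if label == 1:
            damaged[src] += 1
            if pred == 0:
                missed[src] += 1
    # Sources with no damaged examples (e.g. healthy-only S1) are omitted
    return {src: missed[src] / damaged[src] for src in damaged}

# Hypothetical toy data: 1 = damaged, 0 = healthy
y_true = [0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0, 0, 1]
sources = ["S1", "S1", "S2", "S2", "S2", "S3", "S3", "S3"]
print(fnr_by_source(y_true, y_pred, sources))
```

Tracking this per-source number across experiments makes it easier to tell whether a change actually helps on the scarce S3 data or only on S2.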

I suspect that my network didn’t learn the difference between S1 and S3 well because the training set contains only a small amount of S3 data.
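If S3 is under-represented in training, one option that does not require new data is to give S3 examples more weight in the loss (oversampling S3 has a broadly similar effect). The sketch below is a plain NumPy weighted binary cross-entropy; the boost factor of 4.0 and the helper names are hypothetical, and a suitable weight would have to be tuned on the dev-test set.

```python
import numpy as np

def source_weights(sources, boost):
    """Per-example weights; sources listed in `boost` get the given weight, others 1.0."""
    return np.array([boost.get(s, 1.0) for s in sources])

def weighted_bce(y_true, y_prob, weights, eps=1e-12):
    """Weighted mean binary cross-entropy."""
    p = np.clip(y_prob, eps, 1 - eps)
    per_example = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.sum(weights * per_example) / np.sum(weights))

y = np.array([0, 1, 1])
p = np.array([0.1, 0.9, 0.3])      # hypothetical predictions; the S3 example is poorly fit
src = ["S1", "S2", "S3"]
w_plain = source_weights(src, {})
w_boost = source_weights(src, {"S3": 4.0})   # hypothetical boost factor
print(weighted_bce(y, p, w_plain), weighted_bce(y, p, w_boost))
```

With the boost, errors on S3 dominate the loss more, so gradient descent pushes harder to separate S3 from S1. In Keras, per-example weights like these can be passed through the `sample_weight` argument of `Model.fit`.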

Therefore I think it is reasonable for the model to slightly overfit the training set, as long as the classifier achieves better accuracy on the dev data domain.

Thank you for reading the whole message and helping me along my journey.


Wow, you actually pointed out the whole problem I’m going through; that’s really impressive!
My training set needs to be richer, but I guess this is the problem with data mismatch when resources are limited. The reason I want to improve this without getting more data is academic research, although it seems that I’m spinning the wheel backwards. If you have any more ideas, please tell me and I’ll see if I can improve the model.

Thank you for helping!


@TMosh @MadhavPhadke

I guess my main question narrows down to:

If my dev-test set is different from my training set, is it okay for my model to overfit a little bit as long as it produces better accuracy in my target domain?

Best regards,
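The DLS course on structuring ML projects separates these two causes by carving a “train-dev” set out of the training distribution: if train-dev error is close to training error but dev error is much higher, the gap points to data mismatch rather than overfitting. A minimal sketch; the helper names and the error values are hypothetical and chosen only for illustration.

```python
import random

def split_train_dev(train_examples, frac=0.1, seed=0):
    """Hold out a 'train-dev' set drawn from the same distribution as training."""
    rng = random.Random(seed)
    shuffled = list(train_examples)
    rng.shuffle(shuffled)
    n_hold = int(len(shuffled) * frac)
    return shuffled[n_hold:], shuffled[:n_hold]   # (train, train_dev)

def diagnose(train_err, train_dev_err, dev_err):
    """Attribute error gaps to variance (overfitting) vs. data mismatch."""
    return {
        "variance": train_dev_err - train_err,      # same distribution, unseen data
        "data_mismatch": dev_err - train_dev_err,   # different distribution
    }

train, train_dev = split_train_dev(range(100), frac=0.1)
# Hypothetical error rates after training on `train` only
print(diagnose(train_err=0.02, train_dev_err=0.03, dev_err=0.12))
```

In this made-up case the variance gap is small and the mismatch gap is large, which would suggest the rising dev loss comes mainly from the distribution shift rather than from overfitting.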

Thank you for your compliment.

If you can tell me more about the problem, I may be able to offer some suggestion.

What are the three sets of data? What are their sizes? What is the application area?


Thank you for helping!

I am working on structural damage identification using 1D vibration data, which I convert to a frequency spectrum to distinguish healthy and unhealthy states.

So basically it’s a binary classification problem.
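For readers unfamiliar with the preprocessing step, a vibration record can be turned into a one-sided magnitude spectrum with NumPy’s real FFT. This is a sketch under assumptions: the 1000 Hz sampling rate, the Hann window, and the synthetic 50 Hz test signal are all hypothetical, since the actual preprocessing is not described here.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """One-sided magnitude spectrum of a real-valued 1D signal."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()          # remove DC offset
    window = np.hanning(len(signal))         # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, spectrum

fs = 1000.0                                  # hypothetical sampling rate in Hz
t = np.arange(1000) / fs                     # 1 second of samples
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.normal(size=t.size)
freqs, spec = magnitude_spectrum(x, fs)
print(freqs[np.argmax(spec)])                # dominant frequency in Hz
```

Each spectrum (or a fixed frequency band of it) then becomes one input vector for the binary classifier.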

But because the structure has many kinds of healthy states and many kinds of damage, the problem might be suitable for DL, since DL can learn deep underlying features and is good at mapping X to Y.

I have access to unlimited healthy-state data (S1) and computer-generated damage simulations (S2), but only limited data from S3.

S2 is meant to complement S3, because collecting a variety of S3 data is difficult.

I currently have 50K examples from S1, which represent the healthy state of the real structure.
I have 37K examples from S2, which represent the damaged state of the virtual structure.
I have 13K examples from S3, which represent the damaged state of the real structure.

The training set consists of 40K from S1, 32K from S2 and 8K from S3.
The dev-test set consists of 10K from S1, 5K from S2 and 5K from S3.
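As a quick consistency check, the training split can be derived from the per-source totals and the dev-test counts, assuming every example not placed in the dev-test set goes into training (counts in thousands):

```python
# Per-source totals and dev-test counts as stated above (thousands of examples)
totals = {"S1": 50, "S2": 37, "S3": 13}
dev = {"S1": 10, "S2": 5, "S3": 5}

# Assumption: everything not in the dev-test set is used for training
train = {src: totals[src] - dev[src] for src in totals}

def shares(split):
    """Percentage of the split coming from each source."""
    n = sum(split.values())
    return {src: round(100 * k / n, 1) for src, k in split.items()}

print(train)          # {'S1': 40, 'S2': 32, 'S3': 8}
print(shares(train))  # {'S1': 50.0, 'S2': 40.0, 'S3': 10.0}
print(shares(dev))    # {'S1': 50.0, 'S2': 25.0, 'S3': 25.0}
```

Under that assumption, S3 makes up 10% of the training set but 25% of the dev-test set, which is one concrete form of the train/dev mismatch.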

Because I want the NN to fit real-life structures better, I give the dev-test set a larger share of S3 (S3 makes up 25% of the dev-test set). But for various reasons, S2 and S3 look very different, and this is the main source of my data mismatch.

Because I have only limited S3 data, the distributions of the training and dev sets differ.

My strategy is to optimize the model’s recognition of S1 (healthy) data, so that it generalizes better to differently distributed data in this binary classification task.

I know that getting more S3 data would be the best solution. With limited S3 data, my classifier tends to confuse S3 with S1 because they “look more alike”.

This is where my hands are tied: if I optimize performance on the training set, the model appears to overfit (the dev-test loss goes up), but it somehow produces what I want, which is a better classifier for separating S1 data from all other sources.

Thank you again for reading my problem.

Yuhan Chiang

Hello Chiang Yuhan,

Thanks for your email. It is important to have adequate data on unhealthy cases as well as healthy cases; otherwise, the NN/DL method will not discriminate between them as well. In your final testing you are probably finding that too many of the unhealthy cases get classified as healthy. There are other statistical methods that would perform better.

You had mentioned earlier that this is an academic activity. Can you elaborate on it? Which university? Where is it located? Is this thesis work or faculty research? Who are potential users of the work? Thanks.

Best regards,



Actually, there are significantly more healthy cases that are classified as damage.

The reason is that I optimized my model to recognize healthy features, so anything outside that group of features is classified as damaged. It is a sort of reverse engineering of damage detection, simplifying it into a decentralized binary classification task.
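The approach described here, learning what healthy data looks like and flagging anything outside it as damaged, resembles novelty (one-class) detection. As a swapped-in illustration of that idea, not the NN used in this thread, the sketch below fits a per-feature mean and standard deviation on healthy examples only and flags examples whose average squared z-score exceeds the 99th percentile of the healthy training scores. The class name, the 16 features, and the synthetic data are hypothetical.

```python
import numpy as np

class HealthyProfile:
    """Toy novelty detector fit on healthy data only (diagonal Gaussian).
    Examples scoring above the threshold are flagged as damaged."""

    def fit(self, X_healthy, quantile=0.99):
        self.mu = X_healthy.mean(axis=0)
        self.sigma = X_healthy.std(axis=0) + 1e-8   # avoid division by zero
        # Threshold chosen so roughly (1 - quantile) of healthy training data is flagged
        self.threshold = np.quantile(self.score(X_healthy), quantile)
        return self

    def score(self, X):
        z = (X - self.mu) / self.sigma
        return np.mean(z ** 2, axis=1)               # mean squared z-score per example

    def predict(self, X):
        return (self.score(X) > self.threshold).astype(int)  # 1 = damaged, 0 = healthy

rng = np.random.default_rng(0)
X_healthy = rng.normal(0.0, 1.0, size=(1000, 16))
X_damaged = rng.normal(1.5, 1.0, size=(200, 16))     # hypothetical shifted features
model = HealthyProfile().fit(X_healthy)
print(model.predict(X_damaged).mean())   # share of damaged examples detected
print(model.predict(X_healthy).mean())   # share of healthy examples flagged (about 1%)
```

Lowering the quantile catches more damage at the cost of flagging more healthy cases as damaged, and healthy data that drifts away from the training profile will also be flagged. scikit-learn’s `OneClassSVM` and `IsolationForest` are established alternatives for this style of detection.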

I totally agree with this statement. As time goes on, I’m sure my community will gather more data for me to train on.

I’m doing my master’s thesis on the application of DL in structural engineering. I’m leveraging the strengths of NNs (such as automatic feature selection and high adaptability to different cases) to create a system that can automatically detect damage in a structure.
Because this task is hard for humans to carry out, I don’t have a human-level (optimal) error rate to aim for. For my implementation, I think about 90% accuracy would work well, and the most important metric is false negatives (damaged data classified as healthy).
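Since false negatives matter most, it can help to choose the decision threshold on the dev-test set to cap the false-negative rate instead of using a fixed 0.5, and then report accuracy at that threshold. A minimal sketch, assuming the model outputs a probability of damage (damaged = 1); the scores and helper names are hypothetical.

```python
import numpy as np

def metrics(y_true, y_prob, threshold):
    """Accuracy and false-negative rate, with 1 = damaged as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    acc = float(np.mean(y_pred == y_true))
    damaged = y_true == 1
    fnr = float(np.mean(y_pred[damaged] == 0))   # damaged predicted healthy
    return acc, fnr

def threshold_for_max_fnr(y_true, y_prob, max_fnr):
    """Highest candidate threshold whose false-negative rate is at most max_fnr."""
    # FNR can only fall as the threshold drops, so scan from high to low
    for t in sorted(set(np.asarray(y_prob).tolist()), reverse=True):
        if metrics(y_true, y_prob, t)[1] <= max_fnr:
            return t
    return 0.0

y_dev = [0, 0, 0, 1, 1, 1, 1]
p_dev = [0.1, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]     # hypothetical damage probabilities
t = threshold_for_max_fnr(y_dev, p_dev, max_fnr=0.0)
print(t, metrics(y_dev, p_dev, t), metrics(y_dev, p_dev, 0.5))
```

Here the default 0.5 threshold misses one damaged case, while a threshold of 0.4 catches all of them at the cost of one healthy case flagged as damaged; the threshold should be picked on dev data and then checked on a separate test set.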

Recently I have been working on feature engineering methods so that my digital twin can generate data that looks more like real-world data, which would make my research more relevant.
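One simple form of this, sometimes called artificial data synthesis, is to add measurement-like noise to simulated signals so they resemble field recordings. The sketch assumes white Gaussian noise at a chosen signal-to-noise ratio, which is a simplification: real sensor noise is often colored, so noise recorded on the real structure would likely be more realistic. The 12 Hz simulated response, the sampling rate, and the 20 dB SNR are hypothetical values.

```python
import numpy as np

def add_noise_at_snr(clean, snr_db, rng):
    """Add white Gaussian noise so the signal-to-noise ratio is about snr_db decibels."""
    clean = np.asarray(clean, dtype=float)
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise

rng = np.random.default_rng(0)
t = np.arange(2000) / 1000.0                       # 2 s at a hypothetical 1000 Hz
simulated = np.sin(2 * np.pi * 12 * t)             # hypothetical simulated response
noisy = add_noise_at_snr(simulated, snr_db=20, rng=rng)

# Check the achieved SNR empirically
measured_snr = 10 * np.log10(np.mean(simulated ** 2) / np.mean((noisy - simulated) ** 2))
print(round(measured_snr, 1))
```

If the same small noise sample is reused across many synthetic examples, the network may overfit to that particular noise, so drawing fresh noise for each example is generally safer.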

I am a master’s student at National Taiwan University, Department of Ocean Engineering and Engineering Science.

Thank you for giving me some pointers. It would benefit my work greatly.

Yuhan Chiang