I am using the Week 4 assignment workbook to try to train a network on my own data set. The data set consists of a single input and a single output, and the output is strictly equal to the input (i.e. y = x).
I used a 2-layer neural network. After 2500 iterations, the cost is still pretty high (0.5695217223160349). But when it comes to prediction on the train set and test set, the accuracy is 100% and 99.9999999% respectively.
May I ask why this is the case? I thought cost measures how much difference there is between the trained output and the expected output, so why would I have a cost of 0.5695 but a training accuracy of 100%? How should we read the cost of a model? Originally I thought the cost had to be very small for a model to be useful, but from the accuracy above, that doesn't seem to be the case.
Actually, accuracy is a metric that applies to classification tasks only. It describes what percentage of your test data is classified correctly. For example (following the theme of the course), suppose you have a binary classification of cat vs. non-cat. If 95 out of 100 test samples are classified correctly (i.e. the model correctly determines whether or not there is a cat in the picture), then your accuracy is 95%.
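As a concrete illustration (with made-up labels, not data from the actual assignment), accuracy is simply the fraction of predicted labels that match the true labels:

```python
import numpy as np

# Hypothetical example: 10 binary labels (1 = cat, 0 = non-cat)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])  # model's predicted classes

accuracy = np.mean(y_true == y_pred)  # fraction of matching labels
print(accuracy)  # 0.8, i.e. 8 of 10 samples classified correctly
```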
Loss depends on how your model produces predictions for the classification problem. For example, your model predicts the binary class cat vs. non-cat as a probability between 0 and 1. If the probability of cat is 0.6, then the probability of non-cat is 0.4, and the picture is classified as a cat. The loss (cross-entropy in this course) then penalizes each prediction according to how far the predicted probability of the true class is from 1: a prediction of 0.6 for a cat picture still counts as correct for the accuracy metric, but it contributes noticeably to the loss.
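A minimal sketch with binary cross-entropy (the numbers are made up for illustration): two sets of predicted probabilities can both give 100% accuracy after thresholding at 0.5, yet have very different losses, because the loss also measures how confident the probabilities are:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob):
    """Average cross-entropy loss for binary labels and predicted probabilities."""
    eps = 1e-12  # clip to avoid log(0)
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true    = np.array([1.0, 1.0, 0.0])
confident = np.array([0.99, 0.98, 0.01])  # near-certain, all correct
hesitant  = np.array([0.60, 0.55, 0.45])  # all correct, but only barely

# Thresholding at 0.5 gives the same 100% accuracy for both
print(((confident > 0.5) == (y_true == 1)).mean())  # 1.0
print(((hesitant  > 0.5) == (y_true == 1)).mean())  # 1.0

# ...but the losses differ by an order of magnitude
print(binary_cross_entropy(y_true, confident))  # ~0.013
print(binary_cross_entropy(y_true, hesitant))   # ~0.569
```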
The two metrics are related to each other, but not linearly; the relationship is much more complicated than that.
That said, as a rough rule of thumb:

- low accuracy and huge loss: you made huge errors on a lot of your data;
- low accuracy but low loss: you made small errors on a lot of your data;
- great accuracy with low loss: you made small errors on a few data points (the best case);
- your situation, great accuracy but huge loss: you made huge errors on a few data points.
Hopefully this helps.
Thanks for your reply. I think I now understand why the cost can be high while the accuracy is still correct.
But then I tried modifying my data set by adding an additional column that is pure noise, with the output still equal to the first column. I found the accuracy dropped to just 50%, which means the redundant additional column is causing a lot of distortion to the model.
Does that mean a neural network is not "smart" enough to filter noise out of the input data?
Well, if you add noise, then perhaps you need to run more iterations in order to give the algorithm time to figure out how to filter out the noise. You’ve fundamentally changed the problem, so there is no guarantee that you’ll get the same results with the exact same set of hyperparameters you used before with a different problem. Note that hyperparameters include the number of layers and neurons per layer as well. If there were such a thing as one magic network that worked for all problems, we wouldn’t be here, right?
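To make this concrete, here is a toy reconstruction of the experiment (my own sketch, not the actual workbook code): a single-neuron logistic "network" trained by gradient descent on y = x plus one pure-noise feature. With enough iterations, gradient descent learns to down-weight the noise column:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data set in the spirit of the question: the label equals the first
# feature exactly, and the second feature is pure noise.
m = 200
x = rng.integers(0, 2, size=m).astype(float)   # binary input feature
noise = rng.normal(size=m)                      # redundant noise feature
X = np.stack([x, noise])                        # shape (2 features, m examples)
y = x.reshape(1, m)                             # y = x exactly

# One-neuron logistic model trained with gradient descent
w = np.zeros((1, 2))
b = 0.0
lr = 0.5
for _ in range(5000):                           # more iterations than before may be needed
    a = 1 / (1 + np.exp(-(w @ X + b)))          # sigmoid activation
    dz = a - y                                  # gradient of cross-entropy w.r.t. z
    w -= lr * (dz @ X.T) / m
    b -= lr * dz.mean()

pred = ((1 / (1 + np.exp(-(w @ X + b)))) > 0.5).astype(float)
print((pred == y).mean())  # accuracy recovers once training runs long enough
print(w)                   # weight on the noise feature stays much smaller than on x
```

The point is that the noise feature's weight shrinks relative to the informative feature's weight as training proceeds, so the model effectively learns to ignore it, but that can take more iterations (or different hyperparameters) than the noise-free version of the problem.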
Thanks a lot. I think I over-simplified the problem and over-estimated the power of ML. I will learn more from the specialization and try to come back to my own question later.