Hello. I had two questions while completing this assignment:
Why is the validation accuracy consistently higher than the training accuracy? I believe we did not discuss this case much in the lectures. Are we using too much regularization? Why would this be desirable?
The activation of the output layer is linear. Why is that? I would have expected sigmoid. It seems to be in line with the fact that we are supposed to choose the binary cross entropy loss function with ‘from_logits=True’, but at the same time it looks to me like the labels of the training data are probabilities?!
Given the small size of the datasets used in the exercise, and that we’re not training for very long, the results are pretty good. You can’t read too much into the numbers if the dataset doesn’t have good statistics.
If you have only a single output, you can use a linear output for predicting classifications just fine. Consider that sigmoid() is a monotonic function, so it doesn’t change where the relative threshold between false and true falls. You just have to use the correct threshold: with a sigmoid output, the threshold would be >= 0.5, which is exactly the same as using a linear output with a threshold of >= 0.
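To make that concrete, here is a minimal sketch with made-up logit values (not from the assignment) showing that thresholding the sigmoid output at 0.5 picks exactly the same labels as thresholding the raw linear output at 0:

```python
import numpy as np

# Hypothetical raw outputs (logits) of a single-unit linear output layer.
logits = np.array([-2.3, -0.1, 0.0, 0.4, 3.7])

sigmoid = 1.0 / (1.0 + np.exp(-logits))   # probabilities in (0, 1)
labels_from_probs = sigmoid >= 0.5         # threshold on the sigmoid output
labels_from_logits = logits >= 0.0         # threshold on the linear output

# Because sigmoid is monotonic and sigmoid(0) == 0.5, both rules agree.
assert np.array_equal(labels_from_probs, labels_from_logits)
print(labels_from_logits)   # [False False  True  True  True]
```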
Hey, thanks a lot for your reply! The explanation of the linear activation makes perfect sense. I am not sure it quite answers my first question, though!?
I was not bothered about the absolute levels of accuracy. My confusion comes from the fact that the model apparently performs systematically better on the validation set than on the training set. In my mind, that’s not what should happen!? Are you saying it is just a statistical fluke given that the validation set is small (and might accidentally contain a lot of “easy” to classify images)? Or does it perhaps come from using dropout, which limits training set performance? Or something else?
In short: what are the possible reasons why a model might perform better on the validation set than on the training set? (I feel like this is a question that should have been discussed in course #3 but wasn’t.)
Since you don’t know how the training and validation sets were selected, you can’t really draw any useful conclusions about small differences in performance.
But in general:
If the training, validation, and test sets don’t give similar performance, the reasons could be:
poor statistics due to not enough data.
badly randomized data sets
phase of the moon or the position of the planets. It’s a statistical process, sometimes weird stuff happens. Roll the dice, re-arrange the subsets, and try again.
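If you want to see how much the split alone can move things, here is a rough sketch of “re-arranging the subsets” (purely synthetic data and a simple scikit-learn model, not the assignment’s network): re-split the same data with a few different random seeds and compare the train/validation accuracies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset (hypothetical): 300 samples, 10 features,
# label driven by the first feature plus some noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Re-split with different seeds and watch the train/validation gap move
# around purely because of which samples land in which subset.
for seed in range(5):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LogisticRegression().fit(X_tr, y_tr)
    print(f"seed {seed}: train={model.score(X_tr, y_tr):.3f} "
          f"val={model.score(X_val, y_val):.3f}")
```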
Note that when you use the binary cross entropy loss function with from_logits=True, it applies the sigmoid internally. It is done that way because it is a) more efficient (one less call) and b) more numerically stable (e.g., it can handle sigmoid saturation more easily). This is all covered in the documentation for BCE loss.
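For reference, here is a minimal Keras sketch of that pattern (the layer sizes are made up, not the assignment’s architecture): the output layer has no activation, and the sigmoid lives inside the loss.

```python
import tensorflow as tf

# Linear (no-activation) output layer paired with BCE(from_logits=True),
# so the sigmoid is applied inside the loss function.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),   # linear output: raw logits
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # threshold=0.0 because the model outputs logits, not probabilities
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)

# At prediction time, apply the sigmoid yourself if you want probabilities,
# or just threshold the logits at 0 for class labels:
# probs = tf.sigmoid(model(x_batch))
```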