C4W2A2 Is transfer learning really working?

In the alpaca_model (Exercise 2), we imported the MobileNetV2 model (without the top layer) and added new layers for our binary classification task. When we then train the model for 5 epochs, the accuracy is decent but could be better; my accuracies are around 75% (screenshot below). However, I am very concerned about whether those numbers are even correct!

If I take 32 training examples and feed them to the model, it almost always outputs a number between 1 and 1.5, regardless of whether the image is an alpaca or not (see screenshot below). I added those cells immediately after training.
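
For reference, the cells I added look roughly like this (a sketch; model2 and train_dataset are the variable names in my notebook, yours may differ):

```python
# Take one batch of 32 images from the training set and run it through the trained model.
for image_batch, label_batch in train_dataset.take(1):
    raw_outputs = model2(image_batch, training=False)  # shape (32, 1)
    print(raw_outputs.numpy().flatten())  # almost all values fall between 1 and 1.5
    print(label_batch.numpy())            # mix of 0s (alpaca) and 1s (not alpaca)
```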

If I understand this correctly, since we only have one output unit and binary labels, an output closer to 0 means "alpaca" and an output closer to 1 means "not alpaca"? If that's true, the model doesn't seem to be able to recognize alpacas at all. I am not sure exactly how the training and validation accuracies are calculated, but they don't seem to reflect what is actually happening here. I found some discussions about this.

It also concerns me that the validation accuracy is usually higher than the training accuracy. I don't think that's supposed to happen this often either?

PS. I already finished this course and got 100/100 on this assignment. I am asking because I am trying to adapt the assignment and build a classifier with 3 classes (by switching to categorical labels and categorical cross-entropy) using my own training data. I noticed that when I run model.fit(…) myself, the reported accuracies are very high (>95%), but when I test the model on individual training examples, it does not output the correct predictions.
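
For context, my 3-class version looks roughly like this (a simplified sketch of my own code, not the assignment's; the image size and learning rate are just what I happened to use):

```python
import tensorflow as tf

IMG_SHAPE = (160, 160, 3)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False  # freeze the pretrained backbone

inputs = tf.keras.Input(shape=IMG_SHAPE)
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(3)(x)  # 3 classes, no activation (raw logits)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```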

I know there are many black boxes I don’t yet understand. I would really appreciate it if anyone is able to help!

I haven't had time to go through your post in detail yet, but one high-level point to note is that Prof Ng always uses from_logits=True mode for the loss functions here. That means the actual output of the network is a logit value (the linear activation value), not the output of a sigmoid. To get a real prediction value you have to apply the sigmoid yourself (or softmax in the multiclass case).
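
In other words, something along these lines (a sketch, reusing the batch variables from your cells; the 0 = alpaca, 1 = not alpaca mapping is the one you described):

```python
import tensorflow as tf

logits = model2(image_batch, training=False)           # raw linear outputs, e.g. values like 1.2
probabilities = tf.math.sigmoid(logits)                # squash into (0, 1)
predictions = tf.cast(probabilities > 0.5, tf.int32)   # 1 -> "not alpaca", 0 -> "alpaca"
```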

Thank you so much for highlighting this! I was wondering why the output is just a Dense layer with one unit and nothing restricting its values.

If we pass from_logits=True when compiling the model, the loss will treat the output as unnormalized. Then, when computing and minimizing the loss during training, will it apply the sigmoid function to the output, given that our labels are binary? However, I am still worried because in our case the model outputs are all larger than 1, so their sigmoid values are all going to be larger than 0.5 (closer to 1), meaning that the predictions are all "not alpaca."

BTW my lab ID is ulworizl in case it is useful. Perhaps there are mistakes in my code that the grader didn’t see?

You'll specify BinaryCrossentropy as your loss function, setting from_logits=True; the loss then applies the sigmoid internally, so that part of the binary classification is handled for you (regarding the sigmoid question).
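
That is, the compile step looks something like this (a sketch; the exact optimizer and learning rate in the notebook may differ):

```python
import tensorflow as tf

model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
               metrics=['accuracy'])
```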

On your second question, I wondered that too. TensorFlow has a Transfer Learning tutorial (it’s basically this exact assignment but with Cats/Dogs) and in that tutorial they say:

"Note: If you are wondering why the validation metrics are clearly better than the training metrics, the main factor is because layers like tf.keras.layers.BatchNormalization and tf.keras.layers.Dropout affect accuracy during training. They are turned off when calculating validation loss."
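
One way to see that behaviour yourself (a rough sketch, using the same batch variables from earlier in the thread):

```python
# training=True: Dropout and BatchNorm behave as they do during model.fit()
train_mode_out = model2(image_batch, training=True)

# training=False: Dropout is disabled and BatchNorm uses its moving statistics,
# which is what happens when the validation metrics are computed
eval_mode_out = model2(image_batch, training=False)
```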

Hope that helps,
Joel