C4W2A2 Is transfer learning really working?

In the alpaca_model (Exercise 2), we imported the MobileNetV2 model (without the top layer) and added new layers for our binary classification task. When we then train the model for 5 epochs, the accuracy is decent but could be better; my accuracies are around 75% (screenshot below). However, I am very concerned about whether those numbers are even correct!

If I take 32 training examples and feed them to the model, it almost always outputs a number between 1 and 1.5, regardless of whether the image is an alpaca or not (see screenshot below). I added those cells immediately after training.
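
For reference, the cells I added look roughly like this (a sketch; model2 and train_dataset are the variable names in my notebook, yours may differ):

```python
# Take one batch of 32 images from the training set and run it through the trained model.
for image_batch, label_batch in train_dataset.take(1):
    raw_outputs = model2(image_batch, training=False)  # shape (32, 1)
    print(raw_outputs.numpy().flatten())  # almost all values fall between 1 and 1.5
    print(label_batch.numpy())            # mix of 0s (alpaca) and 1s (not alpaca)
```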

If I understand this correctly, since we only have one output unit and binary labels, an output closer to 0 means "alpaca" and an output closer to 1 means "not alpaca"? If that's true, the model doesn't seem to be able to recognize alpacas at all. I am not sure exactly how the training and validation accuracies are calculated, but they don't seem to reflect what is actually happening here. I found some discussions about this.

It also concerns me that the validation accuracy is usually higher than the training accuracy. I don't think that's supposed to happen this often either?

PS. I already finished this course and got 100/100 on this assignment. I am asking because I am trying to adapt the assignment and build a classifier with 3 classes (by switching to categorical labels and categorical cross-entropy) using my own training data. I noticed that when I run model.fit(…) myself, the reported accuracies are very high (>95%), but when I test the model on individual training examples, it does not output the correct predictions.
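
For context, my 3-class version looks roughly like this (a simplified sketch of my own code, not the assignment's; the image size and learning rate are just what I happened to use):

```python
import tensorflow as tf

IMG_SHAPE = (160, 160, 3)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False  # freeze the pretrained backbone

inputs = tf.keras.Input(shape=IMG_SHAPE)
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(3)(x)  # 3 classes, no activation (raw logits)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```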

I know there are many black boxes I don’t yet understand. I would really appreciate it if anyone is able to help!

I haven't had time to go through your post in detail yet, but one high-level point to note is that Prof Ng always uses from_logits=True mode for the loss functions here. That means the actual output of the network is a logit value (the linear activation value), not the output of a sigmoid. To get a real prediction value you have to apply the sigmoid yourself (or softmax in the multiclass case).
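
In other words, something along these lines (a sketch, reusing the batch variables from your cells; the 0 = alpaca, 1 = not alpaca mapping is the one you described):

```python
import tensorflow as tf

logits = model2(image_batch, training=False)           # raw linear outputs, e.g. values like 1.2
probabilities = tf.math.sigmoid(logits)                # squash into (0, 1)
predictions = tf.cast(probabilities > 0.5, tf.int32)   # 1 -> "not alpaca", 0 -> "alpaca"
```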

Thank you so much for highlighting this! I was wondering why the output is just a Dense layer with one unit and nothing restricting its values.

If we pass from_logits=True when compiling the model, the loss will treat the output as unnormalized. Then, when computing and minimizing the loss during training, will it apply the sigmoid function to the output, given that our labels are binary? However, I am still worried because in our case the model outputs are all larger than 1, so their sigmoid values are all going to be larger than 0.5 (closer to 1), meaning that the predictions are all "not alpaca."

BTW my lab ID is ulworizl in case it is useful. Perhaps there are mistakes in my code that the grader didn’t see?

You'll specify BinaryCrossentropy as your loss function, setting from_logits=True; the loss then applies the sigmoid internally, so that part of the binary classification is handled for you (regarding the sigmoid question).
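
That is, the compile step looks something like this (a sketch; the exact optimizer and learning rate in the notebook may differ):

```python
import tensorflow as tf

model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
               metrics=['accuracy'])
```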

On your second question, I wondered that too. TensorFlow has a Transfer Learning tutorial (it’s basically this exact assignment but with Cats/Dogs) and in that tutorial they say:

"Note: If you are wondering why the validation metrics are clearly better than the training metrics, the main factor is because layers like tf.keras.layers.BatchNormalization and tf.keras.layers.Dropout affect accuracy during training. They are turned off when calculating validation loss."
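
One way to see that behaviour yourself (a rough sketch, using the same batch variables from earlier in the thread):

```python
# training=True: Dropout and BatchNorm behave as they do during model.fit()
train_mode_out = model2(image_batch, training=True)

# training=False: Dropout is disabled and BatchNorm uses its moving statistics,
# which is what happens when the validation metrics are computed
eval_mode_out = model2(image_batch, training=False)
```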

Hope that helps,
Joel