VGG Architecture that Passed Assignment doesn't learn Cats vs. Dogs

My lab ID is ilkmrowr.

After successfully completing the assignment, I tried the last cell, which trains the VGG architecture, to see how well it works. Although I did not encounter timeout errors, the network did not learn the dataset; its accuracy stayed at roughly 50%.

So, I created another notebook (C1W4_Testing), where I experimented and found that smaller networks performed much better. I suspect the issue is one of overtraining, but I don't know how to verify that: how can I get metrics that report training as well as validation set accuracy? I can't find that in the docs.

I also experimented with a sequential architecture, which I thought was equivalent to my final functional-style architecture. Oddly, the sequential architecture performed much better than the functional-style one, though drastically more slowly. Does this make sense?
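To be concrete about what I mean by "equivalent", here is a minimal sketch (with made-up layer sizes, not the actual assignment architecture) of the same layer stack written both ways; I would expect the two to train identically up to random initialization:

import tensorflow as tf

# Minimal sketch: the same (made-up) stack of layers in the Sequential style...
sequential_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation='softmax'),
])

# ...and the same stack in the functional style.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(64, 3, activation='relu')(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
functional_model = tf.keras.Model(inputs, outputs)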

If I want to post some of my code and questions to StackOverflow, would that violate any of Coursera's terms, as long as I don't explicitly identify the code as Coursera-related?

For example, something like the following cell and its output:

# For reference only. Please do not uncomment in Coursera Labs because it might cause the grader to time out.
# You can upload your notebook to Colab instead if you want to try the code below.

import tensorflow as tf
import tensorflow_datasets as tfds

# Download the dataset
dataset = tfds.load('cats_vs_dogs', split=tfds.Split.TRAIN, data_dir='data/')

# Initialize VGG with the number of classes 
vgg = MyVGG(num_classes=2)

# Compile with losses and metrics
vgg.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Define preprocessing function
def preprocess(features):
    # Resize and normalize
    image = tf.image.resize(features['image'], (224, 224))
    return tf.cast(image, tf.float32) / 255., features['label']

# Apply transformations to dataset
dataset = dataset.map(preprocess).batch(32)

# Train the custom VGG model
vgg.fit(dataset, epochs=10)

Epoch 1/10
727/727 [==============================] - 446s 614ms/step - loss: 0.6932 - accuracy: 0.4975
Epoch 2/10
727/727 [==============================] - 445s 612ms/step - loss: 0.6932 - accuracy: 0.4996
Epoch 3/10
727/727 [==============================] - 444s 611ms/step - loss: 0.6932 - accuracy: 0.4994
Epoch 4/10
727/727 [==============================] - 442s 607ms/step - loss: 0.6932 - accuracy: 0.5003
Epoch 5/10
727/727 [==============================] - 442s 608ms/step - loss: 0.6932 - accuracy: 0.4999
Epoch 6/10
727/727 [==============================] - 441s 607ms/step - loss: 0.6932 - accuracy: 0.4998
Epoch 7/10
727/727 [==============================] - 442s 608ms/step - loss: 0.6932 - accuracy: 0.5000
Epoch 8/10
727/727 [==============================] - 443s 609ms/step - loss: 0.6932 - accuracy: 0.5000
Epoch 9/10
727/727 [==============================] - 443s 609ms/step - loss: 0.6932 - accuracy: 0.5000
Epoch 10/10
727/727 [==============================] - 443s 610ms/step - loss: 0.6932 - accuracy: 0.5000

<tensorflow.python.keras.callbacks.History at 0x7f50023f9490>

Hi,

Some general thoughts on your questions.

The metrics are normally found in the History object returned by history = model.fit(...); check the model.fit documentation in Keras. To also get validation accuracy you need a validation dataset split, in case you don't have one already (a rough sketch is below). I also think you may not be training the VGG long enough; it's a big model, and unless you are using pre-trained sections it needs a lot of training.

That alone could explain it: bigger models need more epochs to train properly.
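As a rough sketch (reusing the tfds, preprocess and vgg names from your cell above): cats_vs_dogs only ships a 'train' split, so you can carve a validation subset out of it with the tfds slicing syntax and pass it to fit:

# Rough sketch: split the single 'train' split into train/validation subsets.
train_ds = tfds.load('cats_vs_dogs', split='train[:80%]', data_dir='data/')
val_ds = tfds.load('cats_vs_dogs', split='train[80%:]', data_dir='data/')

train_ds = train_ds.map(preprocess).batch(32)
val_ds = val_ds.map(preprocess).batch(32)

# fit() returns a History object with one value per epoch for each metric.
history = vgg.fit(train_ds, validation_data=val_ds, epochs=10)

print(history.history['accuracy'])      # training accuracy per epoch
print(history.history['val_accuracy'])  # validation accuracy per epoch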

You could also ask the course staff and QA team; they should know better.

@gent.spah - Thanks for taking the time to answer this. You wouldn't believe how much this has been bugging me over the past few days.

I can definitely see that bigger models would take more time to train, but the sample model given should (I think) achieve better than 50% accuracy (i.e. random guessing) when running the code provided to demonstrate that the model works.

At the bottom of my C1W4_Testing notebook, I directly compared equivalent VGG-ish architectures and found this same result, so it isn't just a matter of size (unless I've made a mistake). But I have since figured out (or rather remembered) how to include the validation set, so I know I'm not seeing an overtraining effect.

It might look like 50/50 at this moment, but if trained longer (and as long as everything else is set up properly, i.e. the train/dev/test sets come from the same distribution, the loss function is right for the output, etc.) it should in general improve its performance. The exception would be if, because of its size, some parts of the model are dying due to weights going to 0, but I would not think so, because these architectures are well optimized.
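If you want to rule out the dying-weights scenario, a quick check (just a sketch, assuming your trained model instance is still called vgg) is to print simple per-layer weight statistics:

import numpy as np

# Print the mean absolute weight per layer; values collapsing towards 0
# in many layers would hint that parts of the network have died.
for layer in vgg.layers:
    for w in layer.get_weights():
        print(layer.name, w.shape, float(np.mean(np.abs(w))))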