CatVsNonCat dataset classification using TensorFlow

I wrote a program using TensorFlow to replicate the Course 1 assignment of classifying the cat vs. non-cat dataset.

I am unable to get good accuracy or a low cost despite using the same set of hyperparameters as in Course 1 (hidden layer structure (20, 7, 5, 1), learning rate = 0.0075, etc.).

Please help me find the error, if any, or point out the incorrect approach. I am attaching the images.

Please answer the following questions:

  1. Can I assume that you’ve taken courses 2 and 3 of the deep learning specialization?
  2. On which data split(s) are you observing low accuracy?
  1. I have completed courses 1 and 2; course 4 is ongoing, and I have finished Week 1.

  2. Low accuracy is observed on train_catvsnoncat.h5. This dataset was used in Course 1 for building the NN bit by bit.

Please complete courses 2 and 3 before moving on to the 4th course. Model debugging techniques are taught there.


I don’t think DLS Course 3 is relevant to solving this particular problem. There is no programming there, and it is more about data management. But here we have a fixed dataset.

You also don’t actually show the output you are getting. What are the accuracy results and how do you evaluate them?

Thanks for the replies. I am attaching the image depicting the loss (0.644) and accuracy (65.55%) on training data after 2500 iterations. I am also attaching the model summary which looks fine to me.

Notice that nothing is happening in your iterations: the cost just bounces around by a tiny amount and doesn’t really decrease, and the accuracy does not change. Of course, accuracy is “quantized”, meaning that even when the cost changes, the accuracy may legitimately not change.

But we can only see the last few iterations. How far back does that pattern go?
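If you are training with model.fit, one way to see the whole curve rather than just the last few iterations is to keep the History object that fit returns. This is only a sketch with tiny synthetic data standing in for the cat images; the layer sizes match the thread, but nothing here is the original poster’s actual code:

```python
import numpy as np
import tensorflow as tf

# Tiny synthetic stand-in for the cat data (shapes match the thread: 12288 features)
X = np.random.rand(32, 12288).astype("float32")
Y = np.random.randint(0, 2, size=(32, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="relu", input_shape=(12288,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit returns a History object; history.history["loss"] holds the
# per-epoch cost, so you can inspect the entire curve, not just the tail
history = model.fit(X, Y, epochs=5, batch_size=16, verbose=0)
print(history.history["loss"])
```

Plotting that list (or just printing it) makes it obvious whether the cost was ever decreasing or was flat from the very start.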

This would suggest that you try a different optimization algorithm and perhaps also a different initializer. They didn’t make a big deal about it in DLS C1 W4, but if you look at the packaged functions they gave you in that W4 exercise, notice that they gave you Xavier or He initialization; I forget which it was. If I remember correctly, Glorot is just another name for Xavier initialization. This dataset and model do not converge well at all with the simple initialization they showed us in the previous Step by Step assignment.
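As a concrete sketch of making the initializer explicit in Keras: Dense layers default to glorot_uniform (Xavier), and he_normal is the common alternative for ReLU layers. The choice of HeNormal below is my assumption for illustration, not the course’s packaged code:

```python
import tensorflow as tf

# He initialization, often preferred for ReLU activations;
# omitting kernel_initializer entirely gives glorot_uniform (Xavier)
initializer = tf.keras.initializers.HeNormal(seed=1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="relu", input_shape=(12288,),
                          kernel_initializer=initializer),
    tf.keras.layers.Dense(7, activation="relu", kernel_initializer=initializer),
    tf.keras.layers.Dense(5, activation="relu", kernel_initializer=initializer),
    tf.keras.layers.Dense(1, activation="sigmoid",
                          kernel_initializer=initializer),
])
print(model.layers[0].kernel.shape)  # (12288, 20)
```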

I defined the model like this:

def catModel():
    """
    Implements the forward propagation for the binary classification model:
    Dense layers ending in a sigmoid output

    Returns:
    model -- TF Keras model (object containing the information for the entire training process)
    """
    model = tf.keras.Sequential([
            tfl.Dense(units=20, activation="relu", input_shape=(12288,)),
            tfl.Dense(units=7, activation="relu"),
            tfl.Dense(units=5, activation="relu"),
            tfl.Dense(units=1, activation="sigmoid"),
    ])
    return model

With that simple structure and compiling it like this:

cat_model = catModel()
cat_model.compile(optimizer="adam",
                  loss="binary_crossentropy",  # binary classification with a sigmoid output
                  metrics=["accuracy"])

And training it like this:

cat_model.fit(X_train, Y_train, epochs=60, batch_size=32)

I end up with 99.5% training accuracy and 72% test accuracy. The training runs really fast as well.

One other thing to check is to make sure you prepped the data correctly. Here’s my “load data” cell:

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_cat_data()

# Normalize image vectors
X_train = X_train_orig.reshape((X_train_orig.shape[0],-1))/255.
X_test = X_test_orig.reshape((X_test_orig.shape[0],-1))/255.

# Reshape
Y_train = Y_train_orig.T
Y_test = Y_test_orig.T

print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train_orig shape: " + str(X_train_orig.shape))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

Running that gives this output:

number of training examples = 209
number of test examples = 50
X_train_orig shape: (209, 64, 64, 3)
X_train shape: (209, 12288)
Y_train shape: (209, 1)
X_test shape: (50, 12288)
Y_test shape: (50, 1)

One big difference with TF is that it requires the “samples first” orientation in all the data, both inputs and labels.

About the initialization, yes! I am aware of the Xavier initialization in the packaged function. I will try some other optimization techniques as well.

I tried out the sequential method but was not getting the expected weight shapes.

I wanted to know the error in the code that I have written, because the basic functionality of using 4 layers (20, 7, 5, 1) and the ReLU function is the same.

I got the same bad results as you did, until I remembered to use the normalized training data set.

The shapes of the W1, W2 and W3 matrices will be different in TensorFlow because we are forced to use the “samples first” data orientation, but we used the other orientation when we did it “by hand” in python in DLS C1 W4. We have 12288 input features and layer 1 has 20 output neurons, so think about how the matrix multiplication needs to work between W1 and X when X has shape 209 x 12288 and we want the output to be 209 x 20. So we need W1 to be 12288 x 20 and we multiply X \cdot W1 to get the 209 x 20 output for Z1.
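The shape reasoning above can be spot-checked in a couple of lines of NumPy (random values standing in for the real data and weights):

```python
import numpy as np

m = 209                          # number of training examples
X = np.random.rand(m, 12288)     # inputs in the "samples first" orientation TF requires
W1 = np.random.rand(12288, 20)   # 12288 x 20, the transpose of the C1 W4 shape
Z1 = X @ W1                      # (209, 12288) @ (12288, 20)
print(Z1.shape)                  # (209, 20)
```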

When we did it in DLS C1, everything was the transpose of that, so W1 was 20 x 12288. Of course we have this mathematical relationship:

(A \cdot B)^T = B^T \cdot A^T
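You can verify that identity numerically with a quick NumPy check on arbitrary matrices:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# (A . B)^T equals B^T . A^T, up to floating-point rounding
print(np.allclose((A @ B).T, B.T @ A.T))  # True
```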

Does that address your point or did you mean something different?

Yes, if you look carefully at the “code prep” cell that I showed above, it includes dividing all the pixel values by 255., which you’ll also note was done in DLS C1 W2 A2 and DLS C1 W4 A2. That is essential to get good convergence. All these details matter. :nerd_face: