I wrote a program using TensorFlow to replicate the Course 1 assignment of classifying the cat vs. non-cat data set.
I am unable to get good accuracy or a lower cost despite using the same hyperparameters as in Course 1 (hidden layer structure (20, 7, 5, 1), learning rate = 0.0075, etc.).
Please help me find the error, if any, or point out where my approach is incorrect. I am attaching the images.
I don’t think DLS Course 3 is relevant to solving this particular problem. There is no programming there and it talks more about data management. But here we have a fixed data set.
You also don’t actually show the output you are getting. What are the accuracy results and how do you evaluate them?
Thanks for the replies. I am attaching the image depicting the loss (0.644) and accuracy (65.55%) on the training data after 2500 iterations. I am also attaching the model summary, which looks fine to me.
Notice that nothing is happening in your iterations: the cost just bounces around by a tiny amount without really decreasing, and the accuracy does not change. Of course accuracy is “quantized”, meaning that the cost can change a little without the accuracy changing at all.
But we can only see the last few iterations. How far back does that pattern go?
This would suggest that you try a different optimization algorithm and perhaps also a different initializer. They didn’t make a big deal about it in DLS C1 W4, but if you look at the packaged functions they gave you in that W4 exercise, notice that they used Xavier or He initialization; I forget which, but if I remember correctly Glorot is just another name for Xavier initialization. This dataset and model do not converge well at all with the simple initialization they showed us in the previous Step by Step assignment.
import tensorflow as tf
import tensorflow.keras.layers as tfl

def catModel():
    """
    Implements the forward propagation for the binary classification model:
    Dense layers ending in a sigmoid output

    Arguments:
    None

    Returns:
    model -- TF Keras model (object containing the information for the entire training process)
    """
    model = tf.keras.Sequential([
        # YOUR CODE STARTS HERE
        tfl.Dense(units=20, activation="relu", input_shape=(12288,)),
        tfl.Dense(units=7, activation="relu"),
        tfl.Dense(units=5, activation="relu"),
        tfl.Dense(units=1, activation="sigmoid"),
        # YOUR CODE ENDS HERE
    ])
    return model
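As a side note on the initializer point: Keras Dense layers already default to glorot_uniform (Xavier), so the function above is not actually using the simple initialization from the Step by Step exercise. If you wanted to experiment with He initialization explicitly instead, a minimal sketch (the kernel_initializer choices here are purely illustrative, not something from the original posts) would be:

def catModelHe():
    # Same layer stack, but with the kernel_initializer spelled out explicitly.
    # "he_normal" on the ReLU layers is a common alternative to the
    # "glorot_uniform" (Xavier) default that Keras would use anyway.
    model = tf.keras.Sequential([
        tfl.Dense(units=20, activation="relu", input_shape=(12288,),
                  kernel_initializer="he_normal"),
        tfl.Dense(units=7, activation="relu", kernel_initializer="he_normal"),
        tfl.Dense(units=5, activation="relu", kernel_initializer="he_normal"),
        tfl.Dense(units=1, activation="sigmoid"),
    ])
    return model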
With that simple structure and compiling it like this:
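The exact compile call isn’t reproduced in this excerpt; a plausible version, assuming the Adam optimizer and binary cross entropy (the natural loss for a sigmoid output) and reusing the learning rate mentioned earlier in the thread, would be:

model = catModel()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0075),
              loss="binary_crossentropy",
              metrics=["accuracy"])
# train_x and train_y are placeholder names for the flattened, normalized training
# images and their labels; epochs and batch size are arbitrary illustrative values.
history = model.fit(train_x, train_y, epochs=100, batch_size=32)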
The shapes of the W1, W2 and W3 matrices will be different in TensorFlow because we are forced to use the “samples first” data orientation, but we used the other orientation when we did it “by hand” in python in DLS C1 W4. We have 12288 input features and layer 1 has 20 output neurons, so think about how the matrix multiplication needs to work between W1 and X when X has shape 209 x 12288 and we want the output to be 209 x 20. So we need W1 to be 12288 x 20 and we multiply X \cdot W1 to get the 209 x 20 output for Z1.
When we did it in DLS C1, everything was the transpose of that, so W1 was 20 x 12288. Of course we have this mathematical relationship:
(A \cdot B)^T = B^T \cdot A^T
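Here is a quick numerical check of that shape arithmetic, using the sizes from the discussion:

import numpy as np

X = np.arange(209 * 12288).reshape(209, 12288)   # samples-first: 209 examples, 12288 features
W1 = np.ones((12288, 20))                        # 12288 inputs -> 20 units in layer 1
Z1 = X @ W1                                      # (209, 12288) . (12288, 20) -> (209, 20)
print(Z1.shape)                                  # (209, 20)

# The DLS C1 "by hand" orientation is just the transpose of everything:
W1_c1 = W1.T                                     # (20, 12288)
Z1_c1 = W1_c1 @ X.T                              # (20, 12288) . (12288, 209) -> (20, 209)
print(np.allclose(Z1_c1, Z1.T))                  # True: (A . B)^T = B^T . A^T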
Does that address your point or did you mean something different?
Yes, if you look carefully at the “code prep” cell that I showed above, it includes dividing all the pixel values by 255., which you’ll also note was done in DLS C1 W2 A2 and DLS C1 W4 A2. That is essential to get good convergence. All these details matter.
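For completeness, here is a minimal sketch of that preprocessing step (the variable names and the random stand-in data are just placeholders matching the usual shapes of the cat dataset):

import numpy as np

# Hypothetical stand-in for the raw training images: 209 RGB images of 64 x 64 x 3 pixels.
train_x_orig = np.random.randint(0, 256, size=(209, 64, 64, 3))

# Flatten each image into a 12288-element row and scale pixel values into [0, 1].
train_x = train_x_orig.reshape(train_x_orig.shape[0], -1) / 255.
print(train_x.shape)   # (209, 12288)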