I wrote a program using TensorFlow to replicate the Course 1 assignment of classifying the cat vs. non-cat data set.
I am unable to get good accuracy or a lower cost despite using the same hyperparameters as in Course 1 (hidden layer structure (20, 7, 5, 1), learning rate = 0.0075, etc.).
Please help me find the error, if any, or point out where my approach is incorrect. I am attaching the images.
I don’t think DLS Course 3 is relevant to solving this particular problem. There is no programming there and it talks more about data management. But here we have a fixed data set.
You also don’t actually show the output you are getting. What are the accuracy results and how do you evaluate them?
Thanks for the replies. I am attaching the image depicting the loss (0.644) and accuracy (65.55%) on the training data after 2500 iterations. I am also attaching the model summary, which looks fine to me.
Notice that nothing is happening in your iterations: the cost just bounces around by a tiny amount without really decreasing, and the accuracy does not change. Of course accuracy is “quantized”, meaning that the cost can change a little without the accuracy changing at all.
But we can only see the last few iterations. How far back does that pattern go?
This would suggest that you try a different optimization algorithm and perhaps also a different initializer. They didn’t make a big deal about it in DLS C1 W4, but if you look at the packaged functions they gave you in that W4 exercise, notice that they used Xavier or He initialization; I forget which, but if I remember correctly Glorot is just another name for Xavier initialization. This dataset and model do not converge well at all with the simple initialization they showed us in the previous Step by Step assignment.
import tensorflow as tf
import tensorflow.keras.layers as tfl

def catModel():
    """
    Implements the forward propagation for the binary classification model:
    Dense layers ending in a sigmoid output

    Arguments:
    None

    Returns:
    model -- TF Keras model (object containing the information for the entire training process)
    """
    model = tf.keras.Sequential([
        # YOUR CODE STARTS HERE
        tfl.Dense(units=20, activation="relu", input_shape=(12288,)),
        tfl.Dense(units=7, activation="relu"),
        tfl.Dense(units=5, activation="relu"),
        tfl.Dense(units=1, activation="sigmoid"),
        # YOUR CODE ENDS HERE
    ])
    return model
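As a side note on the initializer point: Keras Dense layers already default to glorot_uniform (Xavier), so the function above is not actually using the simple initialization from the Step by Step exercise. If you wanted to experiment with He initialization explicitly instead, a minimal sketch (the kernel_initializer choices here are purely illustrative, not something from the original posts) would be:

def catModelHe():
    # Same layer stack, but with the kernel_initializer spelled out explicitly.
    # "he_normal" on the ReLU layers is a common alternative to the
    # "glorot_uniform" (Xavier) default that Keras would use anyway.
    model = tf.keras.Sequential([
        tfl.Dense(units=20, activation="relu", input_shape=(12288,),
                  kernel_initializer="he_normal"),
        tfl.Dense(units=7, activation="relu", kernel_initializer="he_normal"),
        tfl.Dense(units=5, activation="relu", kernel_initializer="he_normal"),
        tfl.Dense(units=1, activation="sigmoid"),
    ])
    return model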
With that simple structure and compiling it like this:
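The exact compile call isn’t reproduced in this excerpt; a plausible version, assuming the Adam optimizer and binary cross entropy (the natural loss for a sigmoid output) and reusing the learning rate mentioned earlier in the thread, would be:

model = catModel()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0075),
              loss="binary_crossentropy",
              metrics=["accuracy"])
# train_x and train_y are placeholder names for the flattened, normalized training
# images and their labels; epochs and batch size are arbitrary illustrative values.
history = model.fit(train_x, train_y, epochs=100, batch_size=32)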
The shapes of the W1, W2 and W3 matrices will be different in TensorFlow because we are forced to use the “samples first” data orientation, but we used the other orientation when we did it “by hand” in python in DLS C1 W4. We have 12288 input features and layer 1 has 20 output neurons, so think about how the matrix multiplication needs to work between W1 and X when X has shape 209 x 12288 and we want the output to be 209 x 20. So we need W1 to be 12288 x 20 and we multiply X \cdot W1 to get the 209 x 20 output for Z1.
When we did it in DLS C1, everything was the transpose of that, so W1 was 20 x 12288. Of course we have this mathematical relationship:
(A \cdot B)^T = B^T \cdot A^T
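Here is a quick numerical check of that shape arithmetic, using the sizes from the discussion:

import numpy as np

X = np.arange(209 * 12288).reshape(209, 12288)   # samples-first: 209 examples, 12288 features
W1 = np.ones((12288, 20))                        # 12288 inputs -> 20 units in layer 1
Z1 = X @ W1                                      # (209, 12288) . (12288, 20) -> (209, 20)
print(Z1.shape)                                  # (209, 20)

# The DLS C1 "by hand" orientation is just the transpose of everything:
W1_c1 = W1.T                                     # (20, 12288)
Z1_c1 = W1_c1 @ X.T                              # (20, 12288) . (12288, 209) -> (20, 209)
print(np.allclose(Z1_c1, Z1.T))                  # True: (A . B)^T = B^T . A^T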
Does that address your point or did you mean something different?
Yes, if you look carefully at the “code prep” cell that I showed above, it includes dividing all the pixel values by 255., which you’ll also note was done in DLS C1 W2 A2 and DLS C1 W4 A2. That is essential to get good convergence. All these details matter.
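For completeness, here is a minimal sketch of that preprocessing step (the variable names and the random stand-in data are just placeholders matching the usual shapes of the cat dataset):

import numpy as np

# Hypothetical stand-in for the raw training images: 209 RGB images of 64 x 64 x 3 pixels.
train_x_orig = np.random.randint(0, 256, size=(209, 64, 64, 3))

# Flatten each image into a 12288-element row and scale pixel values into [0, 1].
train_x = train_x_orig.reshape(train_x_orig.shape[0], -1) / 255.
print(train_x.shape)   # (209, 12288)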