If you have normalized features, I suspect that the learning rate may be too small.
Sorry, I guess when I read that image you pasted, I had my mouse over the image, so it overwrote the very last line and what I saw was this:
Lesson learned. Well, that looks reasonable. I have not tried the method of using the string name of the loss function, but I assume that will give you the default values for all the keyword parameters in the declaration of the function you select, which would include from_logits=False in the case of binary_crossentropy.

So we still don’t have a plausible explanation for how your predictions are all in the range (0.5, 0.75). But that conflicts with the statement that you get 98% training accuracy. That can’t possibly be true if 40% of your training samples have false labels, right? You don’t show any modification to the accuracy metric in that code as written, so every prediction > 0.5 will be interpreted as true.
Sorry if this seems like a silly question, but how do you know that your outputs are all in the range (0.5, 0.75)?
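For instance, one quick way to check the actual range of the raw outputs would be something like the snippet below (model and X_train are just placeholder names for your trained model and your training inputs):

import numpy as np
# Print the min and max of the raw model outputs to see their true range
preds = model.predict(X_train)
print('prediction range:', np.min(preds), 'to', np.max(preds))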
No, that’s working around a fundamental issue with your model rather than fixing it.
Thank you, your insight has helped me solve the issue. Your comment made me realize that I am actually applying a second sigmoid function in my code below, where I thought I was just pulling the predicted values from the model. If I remove this sigmoid, my predicted values are in (0, 1). Silly that I hadn’t noticed it was there for so long, but thank you!
model_predict_value = lambda Xl: tf.nn.sigmoid(model.predict(Xl)).numpy()
prediction_value = model_predict_value(test_input)
print('range of prediction outputs', prediction_value)
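For reference, here is a minimal sketch of the corrected version, assuming the model’s output layer already applies sigmoid (so model.predict already returns probabilities in (0, 1) and the extra tf.nn.sigmoid above is redundant):

# No extra tf.nn.sigmoid: the sigmoid in the output layer has already been applied
model_predict_value = lambda Xl: model.predict(Xl)
prediction_value = model_predict_value(test_input)
print('range of prediction outputs', prediction_value.min(), prediction_value.max())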
Hi, Natalie.
That’s great news that you found the solution. Note that the way you wrote that prediction code would actually be correct if you wrote the network the way Prof Ng usually does, which is to omit the sigmoid (or softmax as appropriate) at the output layer and then use the from_logits = True mode of the cross entropy loss function (all versions of cross entropy support that). That’s because in that case you actually do end up with the prediction outputs being “logits”, meaning the pre-sigmoid values in the range (-\infty, \infty), where > 0 means “true”.
The reason it is common to do it that way is explained on this thread. But it does then make it a bit more of a hassle to use the prediction values, because you need to manually apply sigmoid (or softmax). Your “lambda” function implementation would be a nice way to solve that problem, if you were in “logits” mode.
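Here is a rough, self-contained sketch of that pattern, just to make it concrete (the layer sizes, feature count and variable names below are placeholders, not taken from your code):

import numpy as np
import tensorflow as tf

# "Logits mode": no sigmoid on the final Dense layer, from_logits=True in the loss
n_features = 4
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1),   # linear output: raw logits, not probabilities
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # threshold 0.0 because the model outputs logits, where logit > 0 means "true"
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)

# Dummy data just to show the flow
X = np.random.randn(8, n_features).astype('float32')
y = np.random.randint(0, 2, size=(8, 1)).astype('float32')
model.fit(X, y, epochs=1, verbose=0)

# Predictions are logits in (-inf, inf); apply sigmoid manually to get probabilities,
# e.g. with a helper like your lambda
model_predict_value = lambda Xl: tf.nn.sigmoid(model.predict(Xl, verbose=0)).numpy()
probabilities = model_predict_value(X)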
Best regards,
Paul