If you have normalized features, I suspect that the learning rate may be too small.
Sorry, I guess when I read that image you pasted, I had my mouse over the image, so it overwrote the very last line and what I saw was this:
Lesson learned. Well, that looks reasonable. I have not tried the method of using the string name of the loss function, but I assume that will give you the default values for all the keyword parameters in the declaration of the function you select, which would include from_logits=False in the case of binary_crossentropy.

So we still don’t have a plausible explanation for how your predictions are all in the range (0.5, 0.75). But that conflicts with the statement that you get 98% training accuracy. That can’t possibly be true if 40% of your training samples have false labels, right? You don’t show any modification to the accuracy metric in that code as written, so every prediction > 0.5 will be interpreted as true.
Sorry if this seems like a silly question, but how do you know that your outputs are all in the range (0.5, 0.75)?
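For instance, one quick way to check the actual range of the raw outputs would be something like the snippet below (model and X_train are just placeholder names for your trained model and your training inputs):

import numpy as np
# Print the min and max of the raw model outputs to see their true range
preds = model.predict(X_train)
print('prediction range:', np.min(preds), 'to', np.max(preds))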
No, that’s working around a fundamental issue with your model rather than fixing it.
Thank you, your insight has helped me solve the issue. Your comment made me realize that I am actually applying a second sigmoid function in my code below, where I thought I was just pulling the predicted values from the model. If I remove this sigmoid, my predicted values are in (0, 1). Silly that I hadn’t noticed it was there for so long, but thank you!
model_predict_value = lambda Xl: tf.nn.sigmoid(model.predict(Xl)).numpy()
prediction_value = model_predict_value(test_input)
print('range of prediction outputs', prediction_value)
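For reference, here is a minimal sketch of the corrected version, assuming the model’s output layer already applies sigmoid (so model.predict already returns probabilities in (0, 1) and the extra tf.nn.sigmoid above is redundant):

# No extra tf.nn.sigmoid: the sigmoid in the output layer has already been applied
model_predict_value = lambda Xl: model.predict(Xl)
prediction_value = model_predict_value(test_input)
print('range of prediction outputs', prediction_value.min(), prediction_value.max())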
Hi, Natalie.
That’s great news that you found the solution. Note that the way you wrote that prediction code would actually be correct if you wrote the network the way Prof Ng usually does, which is to omit the sigmoid (or softmax as appropriate) at the output layer and then use the from_logits = True mode of the cross entropy loss function (all versions of cross entropy support that). That’s because in that case you actually do end up with the prediction outputs being “logits”, meaning the pre-sigmoid values in the range (-\infty, \infty), where > 0 means “true”.
The reason it is common to do it that way is explained on this thread. But it does then make it a bit more of a hassle to use the prediction values, because you need to manually apply sigmoid (or softmax). Your “lambda” function implementation would be a nice way to solve that problem, if you were in “logits” mode.
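Here is a rough, self-contained sketch of that pattern, just to make it concrete (the layer sizes, feature count and variable names below are placeholders, not taken from your code):

import numpy as np
import tensorflow as tf

# "Logits mode": no sigmoid on the final Dense layer, from_logits=True in the loss
n_features = 4
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1),   # linear output: raw logits, not probabilities
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # threshold 0.0 because the model outputs logits, where logit > 0 means "true"
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)

# Dummy data just to show the flow
X = np.random.randn(8, n_features).astype('float32')
y = np.random.randint(0, 2, size=(8, 1)).astype('float32')
model.fit(X, y, epochs=1, verbose=0)

# Predictions are logits in (-inf, inf); apply sigmoid manually to get probabilities,
# e.g. with a helper like your lambda
model_predict_value = lambda Xl: tf.nn.sigmoid(model.predict(Xl, verbose=0)).numpy()
probabilities = model_predict_value(X)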
Best regards,
Paul