Hello,
I have a text corpus of movie reviews which need to be classified as positive(1) or negative(0).
I have chosen loss='binary_crossentropy' and am thinking about a metric.
I thought about using the F1-score, which is good for binary classification, but I do not see it in the list of metrics in TensorFlow 2.
I'm looking at 'Binary Accuracy', which calculates "how often predictions match binary labels" and seems to fit my problem.
However, there is a threshold parameter.
I have a balanced dataset, i.e. the number of positive labels matches the number of negative labels in both the training and the testing sets.
What value should I be using for the threshold? Should I leave it at the default or set it to 0.5?
Thank you!
If you have a balanced dataset, the F1 score is not really of much use, because it is mainly used for unbalanced datasets. Now about the threshold: I think you are using the sigmoid function for binary classification, so you should understand how it works:
- if the sentiment is negative and your model is doing its job well, it should drive the output to almost 0
- if the sentiment is positive and your model is doing its job well, it should drive the output to almost 1
So basically, if the model is good, the separation between negative and positive predictions should be large. If you feel the model is good, then even a lower threshold would still be effective; otherwise you increase the threshold because you are not confident in your model.
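To make the role of the threshold concrete, here is a minimal pure-Python sketch of what a thresholded binary accuracy computes. The helper `binary_accuracy` and the sample probabilities are hypothetical and only for illustration; it assumes a prediction counts as positive when the sigmoid output is strictly above the threshold. For reference, the `threshold` argument of `tf.keras.metrics.BinaryAccuracy` defaults to 0.5, so leaving it at the default and setting 0.5 are the same thing.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of labels matched after thresholding the probabilities.

    Hypothetical helper: a probability strictly above `threshold`
    is treated as the positive class (1), anything else as 0.
    """
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


labels = [0, 0, 1, 1]

# Made-up sigmoid outputs: one model separates the classes well,
# the other keeps its outputs close to the middle.
well_separated = [0.02, 0.05, 0.96, 0.99]
poorly_separated = [0.35, 0.55, 0.60, 0.65]

for threshold in (0.3, 0.5, 0.7):
    print(
        threshold,
        binary_accuracy(labels, well_separated, threshold),
        binary_accuracy(labels, poorly_separated, threshold),
    )
```

With the well-separated outputs the accuracy stays the same for all three thresholds, while with the poorly separated outputs it changes as the threshold moves.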
Thank you for the reply. Would it be OK to use the binary_accuracy metric then?
Since you have only 2 labels, why not binary_crossentropy for the loss and accuracy for the metric?
Yes, binary_crossentropy for the loss, and binary_accuracy for the metric? I am wondering why they have both binary_accuracy and accuracy.
Thank you.
I see that in TensorFlow the accuracy metric doesn't have a threshold, but other than that they look the same. Ultimately you have to read through the documentation.
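A rough pure-Python sketch of why the threshold matters when comparing the two: a metric that only checks whether predictions equal the labels will not match raw sigmoid outputs, whereas a thresholded metric first converts them to 0 or 1. The function names and sample values below are hypothetical, and this only illustrates the idea, not the actual TensorFlow implementation. (As far as I know, passing the string 'accuracy' to `compile` together with a binary cross-entropy loss makes Keras pick the binary variant for you, but check the documentation for your version.)

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that are exactly equal to the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def thresholded_accuracy(y_true, y_prob, threshold=0.5):
    """Threshold the probabilities to 0/1 first, then compare to the labels."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    return exact_match_accuracy(y_true, y_pred)


labels = [0, 1, 1, 0]
probabilities = [0.1, 0.8, 0.6, 0.3]  # made-up sigmoid outputs

# Raw probabilities never equal 0 or 1 exactly, so nothing matches.
print(exact_match_accuracy(labels, probabilities))

# After thresholding at 0.5 every prediction agrees with its label.
print(thresholded_accuracy(labels, probabilities))
```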