@paulinpaloalto , I ran some more trials; the results are shown below.
Note: the model was trained on the training set from the Course 2 Week 3 assignment.
From these trials I conclude that:
- For a binary classification problem, `from_logits=True` with no extra activation function works as well as applying `sigmoid(logits)` first and then using the `binary_crossentropy` loss with `from_logits=False`.
- For a binary classification problem, neither the softmax activation nor the `categorical_crossentropy` loss performs well.
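The first conclusion matches the math: binary cross-entropy computed directly from logits is algebraically identical to applying sigmoid first and then the probability form of the loss. A minimal numpy sketch of that identity (the stable from-logits formula is `max(z, 0) - z*y + log(1 + exp(-|z|))`, the same form TensorFlow uses internally):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probs(y, p, eps=1e-12):
    # Binary cross-entropy on probabilities (from_logits=False path).
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def bce_from_logits(y, z):
    # Numerically stable binary cross-entropy on raw logits
    # (from_logits=True path): max(z, 0) - z*y + log(1 + exp(-|z|)).
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

y = np.array([1.0, 0.0, 1.0, 0.0])
z = np.array([2.5, -1.0, 0.3, 4.0])

# The two formulations agree to numerical precision.
print(np.allclose(bce_from_probs(y, sigmoid(z)), bce_from_logits(y, z), atol=1e-6))
```

So any accuracy difference between the two variants should come down to floating-point stability, where the `from_logits=True` path is the safer choice.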
Key code from each trial below:
Trial No. 1:
- `from_logits=True` on raw logits:
  cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
- sigmoid first, then `from_logits=False`:
  y_pred = tf.keras.activations.sigmoid(logits)
  cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
Trial No. 2:
- `from_logits=True` on raw logits:
  cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
- sigmoid first, then `from_logits=False`:
  y_pred = tf.keras.activations.sigmoid(logits)
  cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
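A possible reason Trial 2 underperforms (an assumption on my part, based on the labels being 0/1 with a single output unit after the transpose): categorical cross-entropy computes `-sum_c y_c * log(p_c)` over the class axis, and with only one "class" entry per example that reduces to `-y * log(p)`. Every `y = 0` example then contributes a loss of exactly zero no matter how wrong the prediction is. A numpy sketch:

```python
import numpy as np

def categorical_ce(y, p, eps=1e-12):
    # Per-example categorical cross-entropy: -sum_c y_c * log(p_c).
    return -np.sum(y * np.log(p + eps), axis=-1)

# One output unit: each "class vector" has a single entry (the 0/1 label).
y = np.array([[1.0], [0.0]])   # one positive example, one negative example
p = np.array([[0.9], [0.9]])   # the model is badly wrong on the negative

print(categorical_ce(y, p))
# The second entry is 0: with a single class column, negative
# examples are invisible to this loss, so it cannot penalize them.
```

If that shape assumption holds, the categorical loss simply never pushes predictions for the negative class down, which would explain the poor results.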
Trial No. 3:
- binary_crossentropy on raw logits:
  cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
- categorical_crossentropy on raw logits:
  cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
Trial No. 4:
- softmax first, then binary_crossentropy:
  y_pred = tf.keras.activations.softmax(logits)
  cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
- softmax first, then categorical_crossentropy:
  y_pred = tf.keras.activations.softmax(logits)
  cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
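A possible explanation for the softmax trials failing (again assuming a single output unit per example): softmax normalizes along an axis, and the softmax of a length-1 vector is always exactly 1.0, so the "probability" becomes constant and independent of the logit. (If it instead normalizes across the examples axis, the predictions get coupled across the batch, which is equally wrong.) A numpy sketch of the single-unit case:

```python
import numpy as np

def softmax(z, axis=-1):
    # Shift by the max for numerical stability, then normalize along `axis`.
    e = np.exp(z - np.max(z, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

# Shape (m, 1): one output unit per example, as in a binary setup.
logits = np.array([[2.5], [-1.0], [0.3]])

print(softmax(logits, axis=-1))
# Every row normalizes to exactly 1.0 regardless of the logit value,
# so the prediction carries no information and the model cannot learn.
```

That would make the poor softmax results a shape/axis artifact rather than a statement about softmax itself; with two output units (one per class), softmax plus categorical cross-entropy is mathematically equivalent to sigmoid plus binary cross-entropy.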
