I don't exactly understand what y_pred and y_true are in the tf.keras.losses.binary_crossentropy parameters. I tried initializing them randomly but could not get the right answer.
Same problem. @DLS_Mentors Please help!
I wrote the code as instructed:
{moderator edit - solution code removed}
But I got this AssertionError:
AssertionError: Test does not match. Did you get the mean of your cost functions?
Can someone check, please?
Please check the order of the arguments received by the function tf.keras.losses.binary_crossentropy.
Hope that helps.
Right! The first argument to binary_crossentropy is y_true, which is the known correct “label” values. The second y_pred is the output of your model. Whether those are logits (before the activation function) or the actual sigmoid or softmax output depends on the value of the from_logits parameter. They are telling us to use the logits as the input here, which is why there is no activation function at the output layer of the forward propagation function as they have us write it.
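As a hedged illustration of just the argument order (toy values, nothing to do with the assignment data):

import tensorflow as tf

y_true = tf.constant([[1., 0., 1., 1.]])          # known labels
logits = tf.constant([[2.0, -1.5, 0.3, 1.2]])     # raw model output, no activation

# Correct order: labels first, model output second.
right = tf.keras.losses.binary_crossentropy(y_true, logits, from_logits=True)

# Swapping the arguments still runs, but silently computes a different number.
swapped = tf.keras.losses.binary_crossentropy(logits, y_true, from_logits=True)

print(float(right), float(swapped))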
Hi, Paul, two questions here:

1. When I pass y_pred=logits and set from_logits=True, does this loss function compare the distance between the logits and the labels directly, or does it first apply a sigmoid to the logits and then compare sigmoid(logits) against the labels?

2. How do I decide when to use y_pred=logits as the input, and when to use y_pred=sigmoid(logits)?

Thanks in advance.
Update: Please note that after this thread was created, the Course Staff made some significant updates to this assignment, which include switching to using CategoricalCrossEntropy for the loss function in this section.
Because we specify the from_logits = True argument, the loss logic will apply either sigmoid or softmax to the logits to compute the actual \hat{y} values and will then compute the cross entropy loss between the predictions and the labels. For the sigmoid (binary) case that per-unit loss is:
-y_true * log(sigmoid(y_pred)) - (1 - y_true) * log(1 - sigmoid(y_pred))
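As a tiny sanity check of that statement (toy numbers, not the assignment's data), the built-in call with from_logits=True matches the formula computed by hand:

import tensorflow as tf

y_true = tf.constant([[1., 0., 1.]])
logits = tf.constant([[0.5, -1.0, 2.0]])

# The formula above, written out by hand.
p = tf.sigmoid(logits)
by_hand = tf.reduce_mean(-y_true * tf.math.log(p) - (1. - y_true) * tf.math.log(1. - p), axis=-1)

# The built-in loss, fed the raw logits.
built_in = tf.keras.losses.binary_crossentropy(y_true, logits, from_logits=True)

print(float(by_hand), float(built_in))   # should agree up to floating-point noise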
Note that I think there’s a big question here that they don’t explain: they tell us to call BinaryCrossentropy loss, but we’ve actually got a multi-class problem here. So I think technically we should be calling CategoricalCrossentropy loss, which has the same from_logits argument. If you read the TF docs, it sounds like what they are doing here should be a bug. Then they don’t really give you any way to assess the results of the training. I went ahead and added the logic to compute the prediction accuracy just for my own edification and it turns out the training here works just fine, although it works a lot better if you use Adam optimization as suggested in the instructions as opposed to the SGD that the code template actually uses. I conclude from this that the Keras BinaryCrossentropy function is actually smart enough to see that this is not a binary case and just does “The Right Thing™”. I have filed a request with the course staff to clarify this.
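For reference, here is a rough sketch of the kind of accuracy check I mean; the classes-by-examples layout is an assumption matching how this assignment lays out its tensors, and the toy values are made up:

import tensorflow as tf

def prediction_accuracy(logits, labels):
    # Assumes logits and one-hot labels are laid out classes-by-examples.
    predictions = tf.argmax(logits, axis=0)   # predicted class index per example
    truth = tf.argmax(labels, axis=0)         # true class index per example
    return tf.reduce_mean(tf.cast(tf.equal(predictions, truth), tf.float32))

# Toy check: 3 classes, 2 examples, both predicted correctly -> accuracy 1.0
logits = tf.constant([[2.0, -1.0], [0.1, 3.0], [-0.5, 0.2]])
labels = tf.constant([[1., 0.], [0., 1.], [0., 0.]])
print(float(prediction_accuracy(logits, labels)))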
As to the question of when to use from_logits = True, I think it’s just your choice. But if you read the documentation about this or do a little googling, you find that the reason that they added this feature is that it gives better efficiency (invoking one method instead of two) and also allows them to implement the computations in a way that is more numerically stable. E.g. consider the case in which you have saturated sigmoid values. So in all the cases I’ve seen in this course and others, they always seem to choose from_logits = True. It’s fewer lines of code and it apparently works better, so it seems like the way to go.
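To make the stability point concrete, here is a small illustration with a deliberately extreme logit (the numbers are made up):

import tensorflow as tf

y_true = tf.constant([[1.0]])
logit  = tf.constant([[-40.0]])   # sigmoid(-40) is vanishingly small in float32

# One-step path: the loss works directly on the logit.
stable = tf.keras.losses.binary_crossentropy(y_true, logit, from_logits=True)

# Two-step path: squash first, then feed the probability to the loss.
prob = tf.sigmoid(logit)
two_step = tf.keras.losses.binary_crossentropy(y_true, prob, from_logits=False)

# The one-step path returns roughly 40 (the exact value); the two-step path
# comes back far smaller because the tiny probability gets clipped/underflows.
print(float(stable), float(two_step))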
Thank you so much, Paul!
I tried the following two ways, and they produce the same result, which exactly confirms your analysis:
1. cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
2. y_pred = tf.keras.activations.sigmoid(logits)
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
@Damon: Very cool! Thanks for running the experiment and confirming the theory!
The next thing I want to try is using CategoricalCrossentropy and see if that also is equivalent. Science!
@paulinpaloalto, I did some more trials; the results are shown below.
Note: the model was trained with the training set from the Course 2 Week 3 assignment.
Accordingly, I conclude that:
- For this binary classification problem, from_logits=True with no extra activation function works as well as sigmoid(logits) with the binary_crossentropy loss function.
- For this binary classification problem, neither softmax nor the categorical_crossentropy loss function performs well.
Key code below:
Trial No. 1:
-
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
-
y_pred = tf.keras.activations.sigmoid(logits)
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
Trial No. 2:
-
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
-
y_pred = tf.keras.activations.sigmoid(logits)
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
Trial No. 3:
-
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
-
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True))
Trial No. 4:
-
y_pred = tf.keras.activations.softmax(logits)
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
-
y_pred = tf.keras.activations.softmax(logits)
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
Hi, @Damon
Thanks very much for doing this work and sharing the results of your experiments. I can also confirm that I tried the CategoricalCrossentropy experiments and saw that it basically didn't work at all. There is a mystery there, in the sense that I would have expected it to work, so there is something important that we're missing here. At some point, I hope to have time to investigate and explain, but I probably won't be able to get to it for a few days.
Thanks again for your work on this!
Paul
@paulinpaloalto, exploring this really helped me get a better understanding of the structure of TF and Keras, and become more familiar with sigmoid & softmax and binary_crossentropy & categorical_crossentropy.
And I have to thank you for your guidance. Just take your time, and please let me know if there's any new progress on this issue.
Good day.
May I ask why I get a different, though close, result for tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, labels))?
The output is 0.8398901851987969, whereas the original approach tf.reduce_mean(tf.keras.losses.binary_crossentropy(labels, logits, from_logits=True)) gives 0.8419182681095858. Thanks.
Thank you. This worked for me.
I am getting output
tf.Tensor([0.7752516 0.9752516 0.7752516], shape=(3,), dtype=float64)
from code
cost = tf.keras.losses.binary_crossentropy(labels, logits, from_logits = True)
whereas the expected output has shape ()
Can you offer any advice? Thank you.
Hi, @michael.brent.
Don’t forget to average
Thanks. The instructions say:
"Just for reference here is how the Binary Cross entropy is calculated in TensorFlow:
mean_reduce(max(logits, 0) - logits * labels + log(1 + exp(-abs(logits))), axis=-1)"
which seems to imply that the averaging is automatic. Isn't that what "mean_reduce" does?
That expression returns one loss value per example: the mean_reduce in it averages over the last axis, i.e. over the output units of each example, not over the examples themselves, so you still need to average across the examples. The name is indeed confusing.
Here’s the derivation for the formula, in case you find it interesting.
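Sketching it out here as well, writing x for a logit and z for the corresponding label, the per-unit loss is
-z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x))
= z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
= x - x * z + log(1 + exp(-x))
For x < 0 the exp(-x) term can overflow, so that case is rewritten as -x * z + log(1 + exp(x)). Combining the two cases gives
max(x, 0) - x * z + log(1 + exp(-abs(x)))
which is exactly the expression quoted above (with logits = x and labels = z).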
I tried:
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true = labels, y_pred = logits, from_logits=True))
but it didn't work.
If you look at the output of binary_crossentropy, you'll see that it computes the loss along a different dimension than we expect: you'll see 6 different losses even though we have only 2 samples.
To make it right, you have to transpose the labels and logits matrices (tf.transpose).
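A small shape check that shows what goes wrong; the 6-by-2 shape is just an assumed example matching the description above (6 output units, 2 samples):

import tensorflow as tf

labels = tf.constant([[1., 0.], [0., 1.], [0., 0.],
                      [1., 1.], [0., 0.], [1., 0.]])   # shape (6, 2): units x samples
logits = tf.random.normal((6, 2))

# Without transposing, the loss is reduced over the last axis (the samples axis),
# so you get one value per row: 6 "losses" for only 2 samples.
wrong = tf.keras.losses.binary_crossentropy(labels, logits, from_logits=True)
print(wrong.shape)   # (6,)

# After transposing both tensors, the reduction runs over the 6 output units,
# giving one loss per sample as intended.
right = tf.keras.losses.binary_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True)
print(right.shape)   # (2,)

cost = tf.reduce_mean(right)   # then average over the samples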
Hi All,
I am facing a similar issue with compute_cost. Can you please help me?
{moderator edit - solution code removed}
When I execute compute_cost_test, I get the error below: