# Cannot compute_cost course 2 week 3

I don't exactly understand what y_pred and y_true are in the tf.keras.losses.binary_crossentropy parameters. I tried initializing them randomly but could not get the right answer.

6 Likes

I write the code as instructed:

{moderator edit - solution code removed}

But I got an AssertionError:

AssertionError: Test does not match. Did you get the mean of your cost functions?

1 Like

Please check the order of the arguments received by the function tf.keras.losses.binary_crossentropy.

Hope that helps.

10 Likes

Right! The first argument to binary_crossentropy is y_true, which holds the known correct "label" values. The second, y_pred, is the output of your model. Whether those are logits (before the activation function) or the actual sigmoid or softmax output depends on the value of the from_logits parameter. They are telling us to use the logits as the input here, which is why there is no activation function at the output layer of the forward propagation function as they have us write it.
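Here is a minimal sketch of that calling convention, with made-up tensor values just for illustration:

```python
import tensorflow as tf

# Hypothetical tiny batch: 2 examples, 1 output unit each.
y_true = tf.constant([[1.0], [0.0]])   # known correct labels
logits = tf.constant([[2.0], [-1.0]])  # raw model outputs, no sigmoid applied

# y_true comes first, y_pred second; from_logits=True tells the loss
# to apply the sigmoid internally before computing cross entropy.
per_example = tf.keras.losses.binary_crossentropy(y_true, logits, from_logits=True)
cost = tf.reduce_mean(per_example)
print(float(cost))  # ~0.2201
```

Swapping the two arguments runs without error but computes the wrong quantity, which is why the grader test fails.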

10 Likes

Hi, Paul, two questions here:

1. When taking y_pred = logits as input and setting from_logits=True, does this loss function compare the logits directly against the labels to compute the loss, or does it first apply a sigmoid to the logits and then compare sigmoid(logits) against the labels?

2. How do you decide when to use y_pred = logits as input, and when to use y_pred = sigmoid(logits)?

2 Likes

Update: Please note that after this thread was created, the Course Staff made some significant updates to this assignment, which include switching to using CategoricalCrossEntropy for the loss function in this section.

Because we specify the from_logits = True argument, the loss logic will apply either sigmoid or softmax to the logits to compute the actual \hat{y} values and then compute the cross entropy loss between the predictions and the labels. In the binary (sigmoid) case that is:

-(y_true * log(sigmoid(logits)) + (1 - y_true) * log(1 - sigmoid(logits)))

Note that I think there's a big question here that they don't explain: they tell us to use the BinaryCrossentropy loss, but we've actually got a multi-class problem here. So I think technically we should be calling the CategoricalCrossentropy loss, which has the same from_logits argument. If you read the TF docs, it sounds like what they are doing here ought to be a bug. They also don't really give you any way to assess the results of the training. I went ahead and added the logic to compute the prediction accuracy just for my own edification, and it turns out the training here works just fine, although it works a lot better if you use Adam optimization as suggested in the instructions, as opposed to the SGD that the code template actually uses. I conclude from this that the Keras BinaryCrossentropy function is actually smart enough to see that this is not a binary case and just does "The Right Thing™". I have filed a request with the course staff to clarify this.

As to the question of when to use from_logits = True, I think it's just your choice. But if you read the documentation or do a little googling, you find that the reason they added this feature is that it gives better efficiency (invoking one method instead of two) and also allows them to implement the computations in a way that is more numerically stable, e.g. in the case where you have saturated sigmoid values. In all the cases I've seen in this course and others, they always seem to choose from_logits = True. It's fewer lines of code and it apparently works better, so it seems like the way to go.

18 Likes

Thank you so much, Paul!

I tried the following two approaches, and they produce the same result, which confirms your analysis:

```python
# 1. Pass the raw logits with from_logits=True
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True))

# 2. Apply sigmoid first, then pass the probabilities with from_logits=False
y_pred = tf.keras.activations.sigmoid(logits)
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
    tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
```

22 Likes

@Damon: Very cool! Thanks for running the experiment and confirming the theory!

The next thing I want to try is using CategoricalCrossentropy and see if that also is equivalent. Science!

6 Likes

@paulinpaloalto, I did some more trials; the results are below.

Note: Model trained with the training set from the Course 2 Week 3 assignment.

Accordingly, I conclude that:

1. For this binary classification problem, from_logits=True with no extra activation function works as well as sigmoid(logits) with the binary_crossentropy loss function.
2. For this binary classification problem, both the softmax activation and the categorical_crossentropy loss function perform poorly.

Key code as below:

Trial No. 1:

```python
# 1.
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True))

# 2.
y_pred = tf.keras.activations.sigmoid(logits)
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
    tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
```

Trial No. 2:

```python
# 1.
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True))

# 2.
y_pred = tf.keras.activations.sigmoid(logits)
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
```

Trial No. 3:

```python
# 1.
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True))

# 2.
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True))
```

Trial No. 4:

```python
# 1.
y_pred = tf.keras.activations.softmax(logits)
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
    tf.transpose(labels), tf.transpose(y_pred), from_logits=False))

# 2.
y_pred = tf.keras.activations.softmax(logits)
cost = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(y_pred), from_logits=False))
```
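For anyone reproducing these trials, here is a self-contained toy comparison (labels and logits invented; rows are examples here, so no transpose is needed) showing that the two losses genuinely compute different quantities on the same one-hot data: binary_crossentropy applies an elementwise sigmoid, treating each class as an independent yes/no question, while categorical_crossentropy applies a softmax across the class axis.

```python
import tensorflow as tf

# Toy one-hot labels: 2 examples, 3 classes (rows = examples).
labels = tf.constant([[1., 0., 0.],
                      [0., 1., 0.]])
logits = tf.constant([[2.0, -1.0, 0.5],
                      [0.3,  1.5, -0.7]])

# Elementwise sigmoid: each class scored independently.
bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
    labels, logits, from_logits=True))

# Softmax over the class axis: classes compete for probability mass.
cce = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
    labels, logits, from_logits=True))

print(float(bce), float(cce))  # the two values differ
```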

13 Likes

Hi, @Damon

Thanks very much for doing this work and sharing the results of your experiments. I can also confirm that I had tried the CategoricalCrossentropy experiments and saw that it basically didn't work at all. There is a mystery there, in the sense that I would have expected that to work, so there is something important that we're missing here. At some point I hope to have time to investigate and explain, but I probably won't be able to get to it for a few days.

Thanks again for your work on this!
Paul

3 Likes

@paulinpaloalto, exploring this really helped me get a better understanding of TF and Keras' structure, and become more familiar with sigmoid & softmax and binary_crossentropy & categorical_crossentropy.

And I have to thank you for your guidance. Just take your time, and please let me know if there's any new progress on this issue.

Good day.

3 Likes

May I ask why I get a different but still close result from tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, labels))? Its output is 0.8398901851987969, versus 0.8419182681095858 from the original approach, tf.reduce_mean(tf.keras.losses.binary_crossentropy(labels, logits, from_logits=True)). Thanks.

2 Likes

Thank you. This worked for me.

2 Likes

I am getting the output

```
tf.Tensor([0.7752516 0.9752516 0.7752516], shape=(3,), dtype=float64)
```

from the code

```python
cost = tf.keras.losses.binary_crossentropy(labels, logits, from_logits = True)
```

whereas the expected output has shape ().

Can you offer any advice? Thank you.

1 Like

Hi, @michael.brent.

Don't forget to average.

3 Likes

Thanks. The instructions say, "Just for reference, here is how the binary cross entropy is calculated in TensorFlow:

mean_reduce(max(logits, 0) - logits * labels + log(1 + exp(-abs(logits))), axis=-1)"

which seems to imply that the averaging is automatic. Isn't that what "mean_reduce" does?

1 Like

That expression returns one loss value per example: with axis=-1, the mean runs over the output units within each example, not across the examples. The call to mean_reduce is indeed confusing.

Here's the derivation for the formula, in case you find it interesting.
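A small shape check (with made-up values) makes this concrete: the built-in reduction still leaves one loss per example, so a final tf.reduce_mean is needed to get the scalar cost.

```python
import tensorflow as tf

labels = tf.constant([[1., 0., 1.],
                      [0., 1., 0.]])   # 2 examples, 3 output units each
logits = tf.constant([[2.0, -1.0, 0.5],
                      [0.3,  1.5, -0.7]])

per_example = tf.keras.losses.binary_crossentropy(labels, logits, from_logits=True)
print(per_example.shape)               # (2,): one loss value per example
cost = tf.reduce_mean(per_example)     # shape (): a single scalar cost
```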

3 Likes

I tried:

```python
cost = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true = labels, y_pred = logits, from_logits=True))
```

but it didn't work.

2 Likes

If you look at the output of binary_crossentropy, you'll see it computes the loss along the wrong dimension: you'll get 6 different losses even though we have 2 examples.

To make it right, you have to transpose the labels and logits matrices (tf.transpose).
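A sketch of that shape issue, assuming the assignment's layout of one column per example (toy values):

```python
import tensorflow as tf

# Assumed assignment layout: rows = output units, columns = examples.
labels = tf.constant([[1., 0.],
                      [0., 1.],
                      [0., 0.]])       # shape (3, 2): 3 units, 2 examples
logits = tf.constant([[2.0,  0.3],
                      [-1.0, 1.5],
                      [0.5, -0.7]])

# Without transposing, the loss reduces over the last axis (the examples),
# producing one value per unit: 3 losses for 2 examples.
wrong = tf.keras.losses.binary_crossentropy(labels, logits, from_logits=True)

# Transposing puts examples on the rows, so the reduction runs over the
# units instead: one loss per example, as intended.
right = tf.keras.losses.binary_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
print(wrong.shape, right.shape)  # (3,) (2,)
```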

11 Likes
