Is it forced to use binary_crossentropy?

Chris.X · October 24, 2021, 1:07pm

Hey there,

I just noticed that the assignment contains this statement:

compute the reconstruction loss (hint: use the mse_loss defined above instead of bce_loss in the ungraded lab, then multiply by the flattened dimensions of the image (i.e. 64 x 64 x 3))
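One way to read the hint, as a sketch rather than the assignment's actual code: the mean squared error averages over every pixel value, so multiplying it by the number of values per image (64 x 64 x 3 = 12288) turns it into a per-image sum of squared errors. The helper names below are made up for illustration, and plain Python lists stand in for image tensors.

```python
import random

# Flattened size of one 64 x 64 RGB image, as stated in the hint.
FLAT_DIM = 64 * 64 * 3

def mse(y_true, y_pred):
    """Mean of squared differences over all pixel values (hypothetical helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def scaled_reconstruction_loss(y_true, y_pred):
    """MSE multiplied by the flattened dimension, i.e. the summed squared error."""
    return mse(y_true, y_pred) * len(y_true)

random.seed(0)
image = [random.random() for _ in range(FLAT_DIM)]           # stand-in for an input image
reconstruction = [random.random() for _ in range(FLAT_DIM)]  # stand-in for the decoder output

# Direct sum of squared errors, computed independently for comparison.
sum_sq = sum((t - p) ** 2 for t, p in zip(image, reconstruction))

print(FLAT_DIM)
print(abs(scaled_reconstruction_loss(image, reconstruction) - sum_sq) < 1e-6)
```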

That looks like a good hint, and it seems logical to me, because an autoencoder is not a pure classification or categorization task (correct me if I am wrong), so using binary_crossentropy or categorical_crossentropy does not seem consistent.

So I went back to the ungraded lab and found that it uses binary_crossentropy, which expects probabilities according to the library source code.

Now I am confused, because when I change the loss to mean_squared_error, the lab still works fine, and I guess it would even work with binary_crossentropy(from_logits=True) (I have not tried that).
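For reference on the from_logits flag: it is generally understood to tell the loss that the model output is a raw score that has not yet gone through a sigmoid, so the sigmoid is folded into the loss. The sketch below uses plain Python instead of Keras (all function names are illustrative) to show that cross-entropy computed from a logit matches cross-entropy computed from the corresponding sigmoid probability. It also shows that feeding an already-squashed probability in as if it were a logit gives a different number, which suggests the flag and a sigmoid final layer are not meant to be combined, even if training still runs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_prob(y, p):
    """Binary cross-entropy when the prediction p is already a probability."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_from_logit(y, z):
    """Same loss computed from a raw logit z, in a numerically stable form:
    max(z, 0) - z * y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

y, z = 0.8, 1.5                          # arbitrary target pixel value and logit
loss_prob = bce_from_prob(y, sigmoid(z))
loss_logit = bce_from_logit(y, z)

# Treating a sigmoid output as if it were a logit applies the squashing twice.
loss_double = bce_from_logit(y, sigmoid(z))

print(abs(loss_prob - loss_logit) < 1e-9)
print(loss_double != loss_prob)
```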

My question is whether binary_crossentropy is the only suitable loss here, or whether I am wrong.

Or is it just because the final output of the decoder in the ungraded lab is

tf.keras.layers.Conv2DTranspose(filters=1, kernel_size=3, strides=1, padding='same', activation='sigmoid', name="decode_final")(x)

which means the output has shape Width x Height x 1, with only 1 unit?

Or is there a general rule like this:

With more than 1 unit we should use mean_squared_error, otherwise binary_crossentropy?

jackliu333 · October 28, 2021, 4:21am

Binary cross-entropy is used for classification problems, while mean squared error is for regression problems. Although MSE can still be used for classification, it is not a recommended loss because it is non-convex in the binary classification case.
See "Why Using Mean Squared Error (MSE) Cost Function for Binary Classification is a Bad Idea?" by Rafay Khan, Towards Data Science.
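The non-convexity point can be checked numerically on a toy case. The sketch below is an illustration under simple assumptions (a single example with target 1 and a single logit z feeding a sigmoid); the function names are made up. A negative second difference means the loss curve bends downward at that point, so it is not convex there. The gradient comparison shows why a badly wrong prediction barely moves under MSE.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_loss(z, y):
    return (sigmoid(z) - y) ** 2

def bce_loss(z, y):
    p = sigmoid(z)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def second_difference(f, z, h=0.5):
    """f(z-h) + f(z+h) - 2 f(z): non-negative everywhere for a convex f."""
    return f(z - h) + f(z + h) - 2.0 * f(z)

def grad_mse(z, y):
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)   # chain rule through the sigmoid

def grad_bce(z, y):
    return sigmoid(z) - y                   # the sigmoid derivative cancels out

y = 1.0
mse_curvature = second_difference(lambda t: mse_loss(t, y), -3.0)
bce_curvature = second_difference(lambda t: bce_loss(t, y), -3.0)

print(mse_curvature < 0, bce_curvature > 0)
print(grad_mse(-10.0, y), grad_bce(-10.0, y))
```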

Chris.X · October 28, 2021, 5:14am

Thanks, @jackliu333, but that raises a further question that confuses me more: is the autoencoder a classification problem or a pure regression problem, like bounding box prediction?

jackliu333 · November 1, 2021, 7:22am

An autoencoder is a general architecture. Depending on the type of output at the final layer, it can support both classification and regression.

istvan · December 16, 2021, 10:36pm

I found some relevant discussions of this under these links:

According to the math, which is also discussed in the arXiv paper linked in class, if we use a Gaussian prior for the latent representation, we should use MSE loss. However, the MNIST VAE showcase works well with BCE loss, and when I try MSE loss the learning seems to get stuck on a plateau.
From what I have read, I gather that BCE loss works well in this case because the input distribution is close to Bernoulli, i.e., to a good approximation there are almost only black and white pixels (0's and 1's). But then I'm not sure why we use a Gaussian prior for this exercise in the first place.
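A small numerical check of the "close to Bernoulli" point, assuming pixel intensities scaled to [0, 1] and a sigmoid output: for a fixed target pixel value y, the per-pixel binary cross-entropy is smallest when the prediction equals y, even when y is not exactly 0 or 1. What changes for gray pixels is that the minimum is no longer close to zero. The helpers below are illustrative and not taken from the lab.

```python
import math

def bce(y, p, eps=1e-7):
    """Per-pixel binary cross-entropy; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

# Candidate predictions on a grid from 0.01 to 0.99.
candidates = [i / 100 for i in range(1, 100)]

def best_prediction(y):
    """Grid point that minimizes the loss for target pixel value y."""
    return min(candidates, key=lambda p: bce(y, p))

for y in (0.0, 0.3, 1.0):
    p_star = best_prediction(y)
    # For y = 0.3 the best prediction is 0.3, but the loss there stays well above 0.
    print(y, p_star, bce(y, p_star))
```

So a gray target still pulls the prediction toward the right value; the nonzero floor only shifts the reported loss, which may be part of why BCE behaves well on nearly black-and-white MNIST digits.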
