I just noticed that the assignment contains this statement:

compute the reconstruction loss (hint: use the mse_loss defined above instead of bce_loss in the ungraded lab, then multiply by the flattened dimensions of the image (i.e. 64 x 64 x 3))
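If I'm reading the hint right, the reconstruction term would look something like the sketch below. This is a plain-NumPy approximation of what I assume the graded code does (the actual assignment presumably uses tf.keras tensors and its MSE loss object), just to make the scaling explicit:

```python
import numpy as np

def reconstruction_loss(inputs, outputs):
    # MSE averages the squared error over every pixel and channel;
    # multiplying by the flattened dimensions (64 * 64 * 3) turns the
    # per-pixel mean back into a per-image sum, which keeps the
    # reconstruction term on a comparable scale to the KL term.
    mse = np.mean((inputs - outputs) ** 2)
    return mse * 64 * 64 * 3
```

So the multiplication is just undoing the averaging that MSE does, if my reading is correct.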

This looks like a good hint, and it makes sense to me: the autoencoder task is not a pure classification or categorization problem (correct me if I'm wrong), so using binary_crossentropy or categorical_crossentropy seems inconsistent.

So I went back to the ungraded lab and found that it uses binary_crossentropy, which expects probabilities according to the library source code.

Now I'm confused, because when I change the loss to mean_squared_error, the lab still works fine, and I suspect it would even with binary_crossentropy(from_logits=True) (a guess; I haven't tried it).
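To check my own understanding of the probabilities-vs-logits distinction, here is a small NumPy sketch. This is my own approximation of what binary_crossentropy computes, not the library's actual code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probs(y, p, eps=1e-7):
    # binary_crossentropy with from_logits=False (the default):
    # p must already be a probability in (0, 1), e.g. the output
    # of a sigmoid-activated final layer.
    p = np.clip(p, eps, 1 - eps)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

def bce_from_logits(y, z):
    # from_logits=True: the sigmoid is applied internally, so the
    # final layer should emit raw, unbounded scores instead.
    return bce_from_probs(y, sigmoid(z))
```

If this is right, then passing sigmoid outputs with from_logits=False and raw scores with from_logits=True should give the same loss, which would explain why both setups can train.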

So the question is: is binary_crossentropy the only suitable alternative loss here, or am I wrong about that?

Or is it just because of the final output of the decoder in the ungraded lab?

I found some relevant discussions of this under these links:

According to the math discussed in the arXiv paper linked in class, if we use a Gaussian prior for the latent representation, we should use the MSE loss. However, the MNIST VAE showcase works well with the BCE loss, and to me it seems that learning gets stuck on a plateau when I try the MSE loss.
From what I read, I gather that the BCE loss works well in this case because the input distribution is close to Bernoulli, i.e., to a good approximation there are almost only black and white pixels (0's and 1's). But then I'm not sure why we use a Gaussian prior for this exercise in the first place.
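As I understand it (please correct me if this is off), the reconstruction loss follows from the decoder's assumed output distribution, not from the prior: a Bernoulli likelihood gives BCE, a Gaussian likelihood gives MSE, while the Gaussian prior on the latent code only enters through the KL term. To see why BCE might give a stronger training signal on nearly-binary pixels, a small per-pixel comparison in plain NumPy:

```python
import numpy as np

def bce(y, p):
    # per-pixel binary cross-entropy for target y and prediction p
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

def mse(y, p):
    # per-pixel squared error
    return (y - p) ** 2

y = 1.0       # a "white" pixel
p_bad = 0.01  # a confidently wrong prediction

# BCE penalizes the confident mistake far more heavily than MSE,
# whose penalty saturates at 1; this could be why MSE plateaus
# on MNIST while BCE keeps training.
print(bce(y, p_bad))  # ≈ 4.6
print(mse(y, p_bad))  # ≈ 0.98
```

If that reasoning holds, the strong gradients of BCE near saturated (near-0 or near-1) predictions would explain the plateau I see with MSE, but I'd appreciate confirmation.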