2 Questions about Reconstruction Loss in C4_W3_Lab_1_VAE_MNIST

When calculating the reconstruction loss in C4_W3_Lab_1_VAE_MNIST, we are given the following code:

    flattened_inputs = tf.reshape(x_batch_train, shape=[-1])
    flattened_outputs = tf.reshape(reconstructed, shape=[-1])
    loss = mse_loss(flattened_inputs, flattened_outputs) * 784

I do not understand:

  1. What the purpose is of flattening the tensors before feeding them to mse_loss. I have experimented with adding the following 2 lines after the given loss calculation:

     loss_b = mse_loss(x_batch_train, reconstructed) * 784
     print( (loss_b.numpy() - loss.numpy()) / loss.numpy())
    

About half the time the relative difference is zero. The other times, the relative differences range from about 5e-08 to 3e-07. Both losses are tensors of shape=() and dtype=float32. So, why has the extra work been added? What am I missing?

  2. Why multiply by 784? Multiplying the loss function by a constant won’t change the location of the minimum; it will only change the value at that minimum. All this does is change the relative weights of the reconstruction loss and the KL loss. Is multiplying the reconstruction loss by the number of image pixels (i.e. 784 = 28 * 28) some sort of “best practice”?
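
For reference, here is what the factor does arithmetically. This is a minimal sketch in NumPy (not the lab's code), assuming mse_loss averages the squared error over every pixel in the batch; the array names and sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_pixels = 32, 28 * 28  # assumed batch size; 784 pixels per image

# Hypothetical stand-ins for x_batch_train and reconstructed, one row per image
x = rng.random((batch_size, n_pixels))
x_hat = rng.random((batch_size, n_pixels))

# Mean squared error over all pixels in the batch, scaled by the pixel count
mse_times_pixels = np.mean((x - x_hat) ** 2) * n_pixels

# Sum of squared errors per image, then averaged over the batch
sse_per_image = np.sum((x - x_hat) ** 2, axis=1)
mean_sse = np.mean(sse_per_image)

print(mse_times_pixels, mean_sse)
```

So the scaled loss is the per-image sum of squared errors, averaged over the batch. Whether that is the right weight relative to the KL loss is the part I am unsure about.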

Hello Steven,

Sorry for the delay in replying to your query.

  1. To answer your first question: the inputs and outputs are flattened (reshaped) before being passed to the loss in order to turn each of them into a one-dimensional tensor.
  2. Quoting Steven1:

     loss_b = mse_loss(x_batch_train, reconstructed) * 784

Given an input x that is a 28 by 28 handwritten digit image, we can make predictions on the validation set using the encoder network. This translates the images from the 784-dimensional input space into the 2-dimensional latent space.

I hope this clears up your doubt!

Regards
DP

  1. But why change the dimensionality to 1-dimensional tensors? The MSE is invariant to the shape of its inputs (as long as they’re the same shape). For example:

    >>> from tensorflow.keras.losses import MeanSquaredError as MSE
    >>> mse = MSE()
    >>> mse([[1,2],[3,4]],[[4,3],[2,1]])
    <tf.Tensor: shape=(), dtype=int32, numpy=5>
    >>> mse([1,2,3,4],[4,3,2,1])
    <tf.Tensor: shape=(), dtype=int32, numpy=5>
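
The same holds for float32 tensors shaped like the lab's batches. Here is a minimal sketch in NumPy, assuming mse_loss is a Keras MeanSquaredError with its default reduction (mean over the last axis, then mean over whatever remains). keras_style_mse is a hypothetical stand-in written for this post, not a TensorFlow function, and the batch shape is an assumption:

```python
import numpy as np

def keras_style_mse(y_true, y_pred):
    """Mean over the last axis, then mean over the remaining entries.

    Assumed to mirror MeanSquaredError's default reduction.
    """
    y_true = np.asarray(y_true, dtype=np.float32)
    y_pred = np.asarray(y_pred, dtype=np.float32)
    per_sample = np.mean((y_true - y_pred) ** 2, axis=-1)
    return np.mean(per_sample)

rng = np.random.default_rng(0)
# Hypothetical batch of 64 images of shape (28, 28, 1)
x_batch = rng.random((64, 28, 28, 1), dtype=np.float32)
reconstructed = rng.random((64, 28, 28, 1), dtype=np.float32)

loss_unflattened = keras_style_mse(x_batch, reconstructed)
loss_flattened = keras_style_mse(x_batch.reshape(-1), reconstructed.reshape(-1))

# Relative difference between the two ways of computing the same mean
rel_diff = abs(loss_unflattened - loss_flattened) / loss_flattened
print(loss_unflattened, loss_flattened, rel_diff)
```

The two values agree up to float32 rounding, so the reshape changes the order of the averaging but not the quantity being averaged.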

Hello @Steven1,

Again, I am really sorry for the delayed response; I missed your reply on this post.

I will try to answer to the best of my abilities. This might be a lengthy comment, so kindly be patient in reading it.

In machine learning, dimensionality reduction is the process of reducing the number of features that describe some data.

This reduction is done either by selection (only some existing features are conserved) or by extraction (a reduced number of new features are created based on the old features) and can be useful in many situations that require low dimensional data (data visualisation, data storage, heavy computation…).

Although there exist many different methods of dimensionality reduction, we can set a global framework that is matched by most (if not all!) of these methods.

First, let’s call encoder the process that produces the “new features” representation from the “old features” representation (by selection or by extraction) and decoder the reverse process.

Dimensionality reduction can then be interpreted as data compression where the encoder compresses the data (from the initial space to the encoded space, also called latent space) whereas the decoder decompresses it.

Of course, depending on the initial data distribution, the latent space dimension and the encoder definition, this compression can be lossy, meaning that a part of the information is lost during the encoding process and cannot be recovered when decoding.

The main purpose of a dimensionality reduction method is to find the best encoder/decoder pair among a given family.

In other words, for a given set of possible encoders and decoders, we are looking for the pair that keeps as much information as possible when encoding and, therefore, has the smallest reconstruction error when decoding.
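
As a concrete illustration of this framework, here is a minimal sketch (a toy example in NumPy, not code from the lab). The encoder projects 2-D points onto a single direction and the decoder maps the 1-D code back to 2-D. The data, the function name reconstruction_error and the choice of a linear family are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 2-D that lie close to a line, centred at the origin
t = rng.normal(size=200)
data = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=200)])
data -= data.mean(axis=0)

def reconstruction_error(data, direction):
    """Average squared distance between each point and its decoded version."""
    direction = direction / np.linalg.norm(direction)
    codes = data @ direction              # encoder: 2-D -> 1-D latent code
    decoded = np.outer(codes, direction)  # decoder: 1-D latent code -> 2-D
    return np.mean(np.sum((data - decoded) ** 2, axis=1))

# First principal direction, obtained from the SVD of the centred data
_, _, vt = np.linalg.svd(data, full_matrices=False)
best_direction = vt[0]

# An arbitrary alternative: keep only the second coordinate
arbitrary_direction = np.array([0.0, 1.0])

error_best = reconstruction_error(data, best_direction)
error_arbitrary = reconstruction_error(data, arbitrary_direction)
print(error_best, error_arbitrary)
```

Within this family of linear encoder/decoder pairs, the first principal direction gives the smallest reconstruction error. An autoencoder searches a richer, nonlinear family with the same kind of objective.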

Let me know if this clears up your doubt, and feel free to ask any further questions.

Regards
DP