[Question] Can someone explain to me how this snippet works?

I’m in week 3 of Course 4, studying the variational autoencoder (VAE). I’m looking at mu (the mean of the sample) and sigma (the standard deviation of the sample), but in the code below, the lines that compute mu and sigma are identical except for the layer name.

I was expecting something like tf.reduce_mean() or some lambda function that computes the mean and standard deviation of the input variable “x” and outputs scalar values, one for mu and one for sigma.

Can anyone help me? Thank you in advance.

def encoder_layers(inputs, latent_dim):
  # add the Conv2D layers followed by BatchNormalization
  x = tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=2, padding="same", activation='relu', name="encode_conv1")(inputs)
  x = tf.keras.layers.BatchNormalization()(x)
  x = tf.keras.layers.Conv2D(filters=64, kernel_size=3, strides=2, padding='same', activation='relu', name="encode_conv2")(x)

  # assign to a different variable so you can extract the shape later
  batch_2 = tf.keras.layers.BatchNormalization()(x)

  # flatten the features and feed into the Dense network
  x = tf.keras.layers.Flatten(name="encode_flatten")(batch_2)

  # we arbitrarily used 20 units here but feel free to change and see what results you get
  x = tf.keras.layers.Dense(20, activation='relu', name="encode_dense")(x)
  x = tf.keras.layers.BatchNormalization()(x)

  # add output Dense networks for mu and sigma, units equal to the declared latent_dim.
  mu    = tf.keras.layers.Dense(latent_dim, name='latent_mu')(x)
  sigma = tf.keras.layers.Dense(latent_dim, name='latent_sigma')(x)

  return mu, sigma, batch_2.shape

As far as I remember, this is the part of the network that produces suitable noise for the VAE. It does not compute a mean or standard deviation; it produces them, and they are injected into the VAE model to give variability to the output.

The purpose of the noise is so that the input is modeled without too much precision. The model has two outputs, mu and sigma.
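
To make that concrete, here is a minimal sketch (not the course code) showing that the two Dense layers, although written identically, are separate layers with their own weights, so mu and sigma end up as different projections of the same features:

import tensorflow as tf

x = tf.random.normal((4, 20))  # stand-in for the encoded features
mu_layer    = tf.keras.layers.Dense(2, name='latent_mu')
sigma_layer = tf.keras.layers.Dense(2, name='latent_sigma')

mu, sigma = mu_layer(x), sigma_layer(x)
# identical code, but independent (randomly initialized, independently trained) weights
print(tf.reduce_all(tf.equal(mu_layer.kernel, sigma_layer.kernel)).numpy())  # False (almost surely)
print(mu.shape, sigma.shape)  # (4, 2) (4, 2): one vector per sample, not scalars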

Thank you, I understand the purpose of the code, but when tracing it I still can’t wrap my head around it. For reference, I’m putting here the associated functions that call the function from my initial post.

Here’s the function that calls encoder_layers:

def encoder_model(latent_dim, input_shape):
  inputs = tf.keras.layers.Input(shape=input_shape)
  mu, sigma, conv_shape = encoder_layers(inputs, latent_dim=latent_dim)
  z = Sampling()((mu, sigma))
  model = tf.keras.Model(inputs, outputs=[mu, sigma, z])
  return model, conv_shape

Then here’s the sampling layer which, as you mentioned, generates the noise; you pass in the mu and sigma that were produced. But, as I mentioned, mu and sigma are just tensors from a Dense layer. It didn’t actually compute a mean or a standard deviation.

class Sampling(tf.keras.layers.Layer):
  def call(self, inputs):
    mu, sigma = inputs
    batch = tf.shape(mu)[0]
    dim = tf.shape(mu)[1]
    epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
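    # reparameterization trick: sigma is effectively the log-variance here, so
    # exp(0.5 * sigma) is the standard deviation used to scale the noise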
    return mu + tf.exp(0.5 * sigma) * epsilon
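
To see what the layer actually does, here is a small example with made-up numbers (not from the course) of calling it on hand-built mu and sigma tensors:

import tensorflow as tf

mu    = tf.constant([[0.0, 5.0]])   # pretend latent mean for one sample
sigma = tf.constant([[0.0, 0.0]])   # log-variance of 0, i.e. standard deviation of 1

z = Sampling()((mu, sigma))
print(z.numpy())  # roughly [[0 ± 1, 5 ± 1]]: a random sample centered on mu

So the layer only draws a sample around whatever mu and sigma the Dense layers currently output; it never computes statistics of the input either.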

I watched Laurence Moroney’s explanation video a few times as he went through the code, and it felt like magic that mu and sigma suddenly got computed. I guess I really have to go back and put a little more thought into it.

Yeah, going back and re-evaluating is the best thing to do for sure, and not just for you but for me too, as it’s been some time since I looked at this as well.

Nevertheless, the main point is that you generate some noise for the VAE to stir up its generations so that they are not all the same and the model does not go into mode collapse…

Thank you very much for your insights.

Sorry for the late addition, but in the 2nd video for the week, titled “VAE Architecture and Code”, Laurence notes that mu and sigma aren’t calculated; they’re learned over time as the model learns what matches an input to an output.

I believe the terms ‘mu’ (mean) and ‘sigma’ (standard deviation) are retained because of the theoretical basis of the algorithm, but it is confusing that neither a mean nor a standard deviation is ever calculated. At best, the model learns approximations of those two measures, which are then used to add Gaussian noise. It seems like different names for those learned parameters would be more helpful for understanding the model… something like mu_learned and sigma_learned (to be as blunt and clear as possible =).
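
One extra detail that may help: the only reason those two Dense outputs end up behaving like a mean and a (log-)variance is the KL-divergence term in the VAE loss. A minimal sketch of that term (assuming, as in the Sampling layer above, that sigma holds the log-variance; this is not the course’s exact code):

import tensorflow as tf

def kl_loss(mu, sigma):
  # KL divergence between N(mu, exp(sigma)) and the standard normal prior N(0, 1),
  # averaged over the batch; it is added to the reconstruction loss during training
  return -0.5 * tf.reduce_mean(
      tf.reduce_sum(1.0 + sigma - tf.square(mu) - tf.exp(sigma), axis=-1))

Minimizing this together with the reconstruction loss is what trains those layers to output values that act like a mean and log-variance, even though nothing in the forward pass computes a statistic.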