Encoder_visualization layer


I would like to understand why we are creating the encoder_visualization layer like this:

encoder_visualization = tf.keras.layers.Conv2D(filters=1, kernel_size=(3,3), activation='sigmoid', padding='same')(bottle_neck)

instead of

encoder_visualization = tf.keras.layers.Conv2D(filters=1, kernel_size=(1,1), activation='sigmoid', padding='same')(bottle_neck)

If we are applying an extra Conv layer with kernel size (3,3), we are not seeing the real bottleneck representation, are we?
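One way to see the difference the question is getting at (a sketch only, assuming TensorFlow 2.x; the bottleneck shape (1, 7, 7, 256) is illustrative, not taken from the lab): perturb a single pixel of an all-zero "bottleneck" and count how many output pixels change. A 1×1 conv reacts only at that pixel, while a 3×3 conv spreads the change over a spatial neighbourhood.

```python
import tensorflow as tf

# the two candidate visualization layers from the discussion
conv3 = tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')
conv1 = tf.keras.layers.Conv2D(1, (1, 1), activation='sigmoid', padding='same')

# all-zero input vs. the same input with one nonzero pixel at (3, 3)
x0 = tf.zeros((1, 7, 7, 256))
x1 = tf.tensor_scatter_nd_update(x0, [[0, 3, 3, 0]], [1.0])

# count output pixels that changed in each case
diff1 = int(tf.reduce_sum(tf.cast(tf.abs(conv1(x1) - conv1(x0)) > 1e-6, tf.int32)))
diff3 = int(tf.reduce_sum(tf.cast(tf.abs(conv3(x1) - conv3(x0)) > 1e-6, tf.int32)))
# diff1 stays at 1 pixel; diff3 covers the 3x3 neighbourhood around it
```

So the 3×3 layer does mix neighbouring bottleneck positions into each visualized pixel, which is exactly the concern raised above.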




Hi there,

You mean using a 1×1 kernel rather than a 3×3 kernel? The 1×1 is not common to use because it's computationally expensive; the 3×3 is more common, in case you haven't noticed. Plus, I don't think it would make much difference in the visualization for our purposes (we are not looking for super fine-grained details, just to see the effect of the encoder).

Thanks for your answer.

Therefore, if I understood properly, we are actually getting a good-enough approximation of our bottleneck, avoiding the 1x1 kernel, which is computationally expensive.

Yeah, I think it's enough.


First thought:
Why not use a Reshape layer? I tried it, but got an error saying that the Reshape layer has no metadata and is therefore not a valid end layer for a network.
Second thought:
Initialize the Conv2D layer (one (1,1) filter, padding='same') with the values:
initializer = tf.keras.initializers.Ones()
then use it:
encoder_visualization = tf.keras.layers.Conv2D(filters=1, kernel_size=(1,1), kernel_initializer=initializer, activation=None, padding='same')(bottle_neck)

If you just use the Conv2D without a specific initialization (see initializer above), you'll get a random initialization and therefore a random visualization.
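For what it's worth, here is what an all-ones (1,1) kernel with no activation actually computes on a multi-channel bottleneck: each output pixel becomes the sum of that pixel's values across channels, which is deterministic (unlike the random-kernel case). A minimal sketch, assuming TensorFlow 2.x and an illustrative bottleneck shape:

```python
import numpy as np
import tensorflow as tf

initializer = tf.keras.initializers.Ones()
vis_layer = tf.keras.layers.Conv2D(filters=1, kernel_size=(1, 1),
                                   kernel_initializer=initializer,
                                   activation=None, padding='same')

bottle_neck = tf.random.normal((1, 7, 7, 256))  # illustrative shape
vis = vis_layer(bottle_neck)

# With an all-ones 1x1 kernel and the default zero bias, each output pixel
# is the sum of that pixel's 256 channel values: a deterministic channel sum.
channel_sum = tf.reduce_sum(bottle_neck, axis=-1, keepdims=True)
```

Note that this gives a reproducible *projection* of the bottleneck (the per-pixel channel sum), not the full 256-channel encoding itself.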

Hi @gent.spah,

I am still thinking about how to visualize the exact encoding.
If I wanted to program an encoder/decoder, I would like to obtain the exact output of the encoder.
It would work like this:
Image → encoder+bottleneck → encoded image; send this encoded image to a friend, and this friend will decode the encoded image with the decoder.

For that, and regarding this last encoder_visualization layer, we want to get the exact encoded image.
This last encoder_visualization (Conv2D) layer is not trained, as far as I understand (the bottleneck is trained).
However, if not properly initialized, this last Conv2D layer will have random weights, and therefore we'll get a random visualization/encoded image.
So is the proposed solution in my last message correct to get the exact encoded image?
I.e., is it correct to use Conv2D with:
initializer = tf.keras.initializers.Ones()
encoder_visualization = tf.keras.layers.Conv2D(filters=1, kernel_size=1, kernel_initializer=initializer, activation=None, padding='same')(bottle_neck) ?

Thank you so much!

I think you are correct about which layers are trained and which are not, as far as I looked into it. About the other part, I don't know; you should try and experiment with it! Have you thought about how this encoder visualization is going to be fed to the decoder, and also why there is no activation on the Conv2D layer? Just some thoughts of mine.

Hi @gent.spah,

Thank you for your answer!
I don't put any activation on this Conv2D layer because I don't want any non-linearities (sigmoid or whatever), so that each value of the bottleneck encoding is preserved when visualized. So:

  1. no activation
  2. filter of (1,1)
  3. initialization with Ones (considering this (1,1)-filter)
  4. ensure that this conv2D layer is not trainable

As a matter of fact, I wonder why in the lab this Conv2D layer has a (sigmoid) activation, when its purpose is to visualize something. And with a standard initialization the weights would be random, so point 3 should also be taken into account.
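The four points above can be put together like this (a sketch only, assuming TensorFlow 2.x; the bottleneck shape is illustrative):

```python
import tensorflow as tf

initializer = tf.keras.initializers.Ones()
vis_layer = tf.keras.layers.Conv2D(filters=1,
                                   kernel_size=(1, 1),              # point 2
                                   kernel_initializer=initializer,  # point 3
                                   activation=None,                 # point 1
                                   padding='same')
vis_layer.trainable = False  # point 4: keep the ones-kernel fixed during training

bottle_neck = tf.random.normal((1, 7, 7, 256))  # illustrative shape
vis = vis_layer(bottle_neck)
# the layer now exposes no trainable weights; kernel and bias are frozen
```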

I use Conv2D as a reshape of the bottleneck.
Regarding the decoder that “my friend” should get: we should add an extra initial layer to the decoder that transforms the encoded image into the encoding of the bottleneck.
So “my friend” would use “initial layer + decoder” to decode the encoded image.
The initial layer would have to be:
tf.keras.layers.Conv2DTranspose(filters=256, activation=None, kernel_size=1, kernel_initializer=initializer)
with initializer = tf.keras.initializers.Ones()
This extra initial layer needs no training, it’s just a transformation.
Is this transformation correct?
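One way to check what this proposed initial layer actually computes (a sketch, assuming TensorFlow 2.x; shapes are illustrative): a 1×1 Conv2DTranspose with an all-ones kernel copies the single encoded channel into each of the 256 output channels.

```python
import numpy as np
import tensorflow as tf

initializer = tf.keras.initializers.Ones()
initial_layer = tf.keras.layers.Conv2DTranspose(filters=256,
                                                activation=None,
                                                kernel_size=1,
                                                kernel_initializer=initializer)

encoded_image = tf.random.normal((1, 7, 7, 1))  # single-channel visualization
restored = initial_layer(encoded_image)
# every output channel is an identical copy of the encoded single channel
```

So the decoder would receive 256 identical copies of the summed map, not the 256 distinct channels the bottleneck originally produced; whether that is acceptable depends on what the decoder expects.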

OK, if we want it to be used as an encoder/decoder, the easier way is to send my friend the output of the bottleneck, and that's it. But then we wouldn't have the visualization.

Once again, thank you!

The purpose of the activation is to provide a measure/indication of which pixels in the visualization are having an impact: the bright ones have more impact, as far as I remember from the lab. The pixel values have a certain range, and they might be normalized in the lab, so be careful with your output's range.
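As a small illustration of the range point (a sketch, not taken from the lab): the sigmoid squashes any activation into (0, 1), which is a directly displayable pixel range, whereas the raw linear output can be arbitrarily large or negative.

```python
import tensorflow as tf

activations = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])  # illustrative values
pixels = tf.sigmoid(activations)
# all values now lie strictly between 0 and 1, so they can be shown as an image
```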

This extra initial layer needs no training, it’s just a transformation.
Is this transformation correct?

It depends: does it fit the input of the decoder, does the decoder expect this input, and can it be fed as part of the training/transformation system?

About the rest, as I said, you guys have to try it out, and please do report back on the forum too if you wish; I have not tried doing what you want to do.

As far as I remember, if you don't introduce noise in a VAE, the encoded mapping can be reproduced the same as the input (but don't take my word for it, because I don't have time to check that week thoroughly; you should go and check it).