C4W2 Lab4: Convolutional Encoder Model - Training?

Hey
In the code, the “encoder visualization” has a unique Conv2D layer that doesn’t get trained, but is called with “predict” later.
What’s the idea behind that? Is the initialization of the weights in that layer enough? What is the meaning of a model output without any training?

thanks!

Hi,
Could you specify in which part this is found so I can have a look at it?

Well, it’s not a specific line…
in the “bottle_neck” function, “encoder_visualization” is defined as the output of a Conv2D layer that follows “bottle_neck”, which is itself the output of the main model’s Conv2D. The aim of this encoder_visualization is to reduce the number of channels in order to display the encoded representation.
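
Roughly, that part of the code looks like this (paraphrasing from the notebook; the exact arguments may differ):

```python
import tensorflow as tf

def bottle_neck(inputs):
    '''Bottleneck plus a side branch that is only used for visualization.'''
    bottle_neck = tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3),
                                         activation='relu', padding='same')(inputs)
    # single-filter Conv2D that squeezes the 256 channels down to one,
    # so the encoding can be shown as a grayscale image
    encoder_visualization = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3),
                                                   activation='sigmoid', padding='same')(bottle_neck)
    return bottle_neck, encoder_visualization
```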

In the “convolutional_auto_encoder” function the models get built, and the “encoder_model” is defined with the prior “encoder_visualization” as its output. This model, “encoder_model”, which is later called “convolutional_encoder_model”, never gets trained. The branched Conv2D layer I mentioned earlier is not trained either, but it nevertheless gets called with “predict”.
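
And the model-building part, again paraphrased (the real encoder/decoder live elsewhere in the notebook; the small stand-ins below just make the sketch self-contained together with the bottle_neck sketch above):

```python
import tensorflow as tf

# Small stand-in encoder/decoder so this sketch runs on its own;
# the lab's actual functions differ in detail.
def encoder(inputs):
    x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    x = tf.keras.layers.MaxPooling2D((2, 2))(x)
    x = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    return tf.keras.layers.MaxPooling2D((2, 2))(x)        # 28x28 -> 7x7 feature map

def decoder(inputs):
    x = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same')(inputs)
    x = tf.keras.layers.UpSampling2D((2, 2))(x)
    x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = tf.keras.layers.UpSampling2D((2, 2))(x)
    return tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # back to 28x28x1

def convolutional_auto_encoder():
    '''Builds the trained autoencoder and the separate visualization model.'''
    inputs = tf.keras.layers.Input(shape=(28, 28, 1,))
    encoder_output = encoder(inputs)
    bottleneck_output, encoder_visualization = bottle_neck(encoder_output)
    decoder_output = decoder(bottleneck_output)

    model = tf.keras.Model(inputs=inputs, outputs=decoder_output)                  # compiled and fit()
    encoder_model = tf.keras.Model(inputs=inputs, outputs=encoder_visualization)   # only ever used with predict()
    return model, encoder_model

convolutional_model, convolutional_encoder_model = convolutional_auto_encoder()
```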

You should be precise about which lab or assignment you are referring to when you post here; it took me a while to find the lab you mean. If you are referring to “C4_W2_Lab_4_FashionMNIST_CNNAutoEncoder”, then the line convolutional_model, convolutional_encoder_model = convolutional_auto_encoder() runs through bottleneck_output, encoder_visualization = bottle_neck(encoder_output), which is part of the convolutional_auto_encoder() function. So yes, the bottle_neck part is trained as part of that function.

Hello, I think I have the same question about C4_W2_Lab_4:
As in the labs before, an auto-encoder is built here and additionally a model for visualizing the encoding.
Only the main model, the auto-encoder, is then trained.
In the previous labs, both models shared all layers, so the models are not independent of each other. As a result, the model for visualizing the encodings is implicitly trained when training the auto-encoder, as it references the layers trained there; the last encoder layer is then simply used as the output.
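
Roughly this pattern, with shared layers (a simplified dense example just to illustrate the sharing, not the actual code from the earlier labs):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(32, activation='relu')(inputs)       # shared layer
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)

autoencoder = tf.keras.Model(inputs, decoded)   # this one gets trained
encoder = tf.keras.Model(inputs, encoded)       # no extra layer: it reuses the trained weights
```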

This lab is different: The encoder_visualization-layer is branched off from the bottleneck layer. This encoder_visualization layer has one filter, 3x3.

The parameters of this filter are not in the back-propagation path of the auto-encoder, so they are not trained in my understanding.
Training would not make sense either, as we just want to see what the embedding looks like.
My question is: Why is this layer not assigned defined values (e.g. “1.0”) to ensure consistent visualization?
The randomly pre-assigned parameters of this filter also distort the result, don’t they?
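
A quick toy check of the “not trained” part (my own sketch with a tiny stand-in model, not code from the lab): the branch layer’s weights come out of training exactly as they were initialized.

```python
import numpy as np
import tensorflow as tf

# Main path (trained) plus a side branch like encoder_visualization; the branch
# is not an output of the trained model, so it receives no gradients.
inputs = tf.keras.Input(shape=(8, 8, 4))
bottleneck = tf.keras.layers.Conv2D(4, (3, 3), padding='same', activation='relu')(inputs)
viz = tf.keras.layers.Conv2D(1, (3, 3), padding='same', activation='sigmoid', name='viz')(bottleneck)
decoded = tf.keras.layers.Conv2D(4, (3, 3), padding='same', activation='sigmoid')(bottleneck)

autoencoder = tf.keras.Model(inputs, decoded)   # gets trained
viz_model = tf.keras.Model(inputs, viz)         # never trained directly

before = viz_model.get_layer('viz').get_weights()

autoencoder.compile(optimizer='adam', loss='mse')
x = np.random.rand(32, 8, 8, 4).astype('float32')
autoencoder.fit(x, x, epochs=1, verbose=0)

after = viz_model.get_layer('viz').get_weights()
print(all(np.allclose(b, a) for b, a in zip(before, after)))   # True: the branch stays at its init
```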

Hi @Bernhard_Wieczorek

Although I don’t fully understand your question, I gather you are raising an issue about the 3×3 bottleneck filter not being used in the encoder_visualization layer. I am sharing a link to a post comment; please go through it and let me know if your doubt still persists.

Regards
DP

Hello Deepti_Prasad,
thank you for your quick reply.
I understand why only one filter is used: To display a grayscale 2D “image” of the embedding, so far that is clear.
However, this filter has weights. These are initialized randomly during creation and are not trained during training, so they remain random values.
So when the embedding is sent through this layer (so that we get a visualization) they have an influence on the output, right?
My question is: Why are these weights not set to fixed values so that the visualization process produces deterministic results?
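
For example, this is the kind of thing I mean by fixed values (just an illustration, not what the lab does): a constant, non-trainable averaging kernel would make the mapping from embedding to image deterministic.

```python
import tensorflow as tf

n_channels = 256   # channels of the bottleneck output in the lab
# Fixed, non-trainable 3x3x256 -> 1 convolution: every kernel weight is 1/(9*256),
# so the pre-activation output is the mean over each 3x3x256 window.
viz_layer = tf.keras.layers.Conv2D(
    filters=1, kernel_size=(3, 3), padding='same', activation='sigmoid',
    kernel_initializer=tf.keras.initializers.Constant(1.0 / (9 * n_channels)),
    bias_initializer='zeros',
    trainable=False)

dummy_embedding = tf.random.uniform((1, 7, 7, n_channels))
image = viz_layer(dummy_embedding)   # (1, 7, 7, 1), same result every time for the same embedding
```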

hi @Bernhard_Wieczorek

Just to be clearer about your question, where you mentioned the randomness of the weight initialization: can you share a screenshot of the place that raised this doubt, so I can understand why it came up?

But as a general rule, weights are never assigned fixed values, to avoid any overfitting and to give the model, through randomness, a broader way of choosing its own path to the best accuracy and lowest loss.

But please do share the image from this assignment so your doubts get cleared.

Regards
DP

Hello Deepti_Prasad,
the encoder_visualization-layer is created here and the weights are initialized randomly (as always):

This layer is “fed” from the output of bottle_neck, in fact it is a fork of the data flow as the output of the bottle_neck layer is also fed into the decoder in parallel (last line):
[screenshot from the lab notebook]

Thus the encoder_visualization-layer is not part of the backward-propagation path of the auto-encoder.

Now the auto_encoder and the embedding-visualization model are formed, but only the auto-encoder is trained:
[screenshot from the lab notebook]

As I understand it, this means that the encoder_visualization layer is not trained, so its weights remain in the random initial state.

This layer is not used in the sense of an ML model, but in my opinion is only a kind of auxiliary function to visualize the embedding, i.e. to convert the 7x7x256 embedding matrix into a 7x7x1 image.
So there is no need for training here, there is nothing to learn.

My point now is: If I create the encoder_visualization-layer a second time, it will have different values in the weights and thus my encoding visualization will look different in both cases, even if the underlying embedding is the same.
So my question is whether it would be better to use fixed weights.

I agree with you when it comes to the ML context: the model should find the best weight values itself and nothing should be specified, but here we actually only have the layer as an auxiliary function - there is nothing to learn.

You could surely say that all of this is not so relevant because it is only about the principle of displaying embeddings, so it doesn’t matter that they always look different or how they look.
I think the approach of displaying the embeddings makes sense and then, in my opinion, it is quite relevant in practice that the results are not random.
I’m asking here because I want to make sure that my conclusions are correct and to avoid overlooking or misunderstanding something.
Thanks for your help with this!

hi @Bernhard_Wieczorek

Are you referring to the filters argument in encoder_visualization as the weights? The filters argument actually represents the number of output channels.

The reason encoder_visualization is called separately here is that it uses an input of a different shape than the one used in the bottleneck.

Also, the idea of not using the encoder visualization from the bottleneck is probably to make sure that when the new auto_encoder model is created, the input is the original input but the output is the bottleneck output created in the bottleneck region; it is basically trying to learn whether an input whose dimensions have been changed into this most compressed pixel form is able to learn anything about the input used initially.

So in this image, the bottleneck has learned about the original input in its different dimensionality; the question is whether the same bottleneck output can give back the same input that the encoder initially received.

So as you see, choosing a particular fixed weight in the auto_encoder (and I don’t know where exactly you want to put this fixed value) might mean it does not learn anything new, or that the non-linearity of the weight matrix is lost.

If you go through the pinned post, this is all explained.

Regards
DP

Hi Deepti_Prasad,
I don’t think I expressed myself precisely, please excuse me.
I have read the thread.
I would like to try to visualize my question here:

I am aware that the actual embedding is a 7x7x256 matrix, which is hard to visualize and hence the number of channels is reduced to 1.
I am also aware that this is a compression of the true embedding.

Nevertheless, I find the use of random values for this compression an unnecessary distortion and wonder whether defined values (e.g. weights = 1, bias = 0) would not provide a more linear, deterministic and therefore more interpretable result?

Or have I overlooked or not understood something fundamental here?

The weight is not applied this way. If you look at the image you shared, the weight is defined here as 3 × 3 × 256, and I suppose you know that the 3 × 3 is the kernel size of the filter used in the encoder visualisation and the 256 is the number of channels coming from the bottleneck output.

So for the auto_encoder to work, I hope you know that the dimensions of input and output should match, and from what I understand of your idea, using just a weight of 1 will surely not lead to anything being learned by the model.

If you want to, say, use a 1×1×256 filter, that surely can be used.

Also, using a bias of 0 is clearly not going to make any difference, as biases are added to the units to shift the data consistently towards a given end point.

The idea of the autoencoder visualisation is basically to simplify the model and then let the model learn using the bottleneck, which is a compressed representation of the initial input that was fed in.

Regards
DP

Well, if I understand your statement correctly, I would say that this is the job of the auto-encoder: to encode the input data into a smaller representation and decode it from there.
By the way, we are building a 7x7x256 encoding from a 28x28x1 input, so in my opinion this example model here is just inflating the input data.

The job of the encoder-visualization model is to give some insight into the encoding, right?
The more I think about it, the less I think the “encoder visualization layer”, a convolution with random weights followed by sigmoid non-linearity, would be a suitable solution for visualization.
I would prefer to visualize the embedding (bottleneck-output) directly with a standard Python function, e.g. matplotlib.pyplot.
This way we can get rid of the whole encoder-visualisation model and simply build the autoencoder with the embedding as an additional output.
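
Something along these lines (my own sketch; the random array stands in for the bottleneck output so the snippet runs on its own, whereas in the lab it would come from predict() on a model whose output is bottleneck_output):

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the 7x7x256 bottleneck output of one test image.
embedding = np.random.rand(1, 7, 7, 256)

plt.imshow(embedding[0].mean(axis=-1), cmap='gray')   # average over the 256 channels
plt.title('bottleneck output, channel mean')
plt.colorbar()
plt.show()
```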

Thanks for the discussion and your time, it helped me to think through this topic!

I hope you know that for any autoencoder to work, the dimensions of input and output should match, and yes, it should be left to the autoencoder to find what weights to choose; but remember that when these labs were presented to us, they had already been tried out.

That being said, it doesn’t mean others cannot put forth their own thought process. The same goes for an autoencoder with an embedding, be it text or image: whatever input has been fed initially, the autoencoder needs to have the same dimensions as that input in order to decode it, and not any fixed value of weights and biases.

Regards
DP