Getting monochromatic reconstructions - trained for over an hour

Hello,

Can someone tell me what’s wrong with the final layer of my decoder?
x = tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=3, strides=1, padding='same', activation='sigmoid', name="decode_final")(x)

I am using three filters, but I still get monochromatic reconstructed images.

The training is also taking way more than 30 minutes. I am training for 100 epochs; it has been 1.5 hours and the MSE loss is still around 650.
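In case it helps, this is roughly how I am inspecting the reconstructions (a quick sketch; `model` and `x_batch` stand for my full autoencoder and a batch of training images):

import numpy as np

recon = model.predict(x_batch)
print(recon.shape)                    # expect (batch, 64, 64, 3)
print(recon.min(), recon.max())       # sigmoid output should lie in [0, 1]
print(np.std(recon, axis=-1).mean())  # near 0 means the 3 channels collapsed

If the per-pixel standard deviation across the three channels is near zero, the layer is producing three nearly identical channels, so the image looks grayscale even though there are 3 filters.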


Hi there,

Your stride is not right. Also, are you calculating the correct number of units in the decoder function? If I were to take a guess, the problem might be there…

In the ungraded lab, the stride is 1. The output of the last layer has the shape (64, 64, 3). I am using 3 conv layers. What should the stride be?
The units in the decoder have the same shape as batch3, as done in the lab.
Is there any other suggestion?

Apart from the stride, which should not be 1, I can't think of anything else at this moment. Let's see what happens if you increase it a little bit!

Model: "model_13"

Layer (type)                                 Output Shape          Param #
input_14 (InputLayer)                        [(None, 512)]         0
decode_dense1 (Dense)                        (None, 8192)          4202496
batch_normalization_36 (BatchNormalization)  (None, 8192)           32768
decode_reshape (Reshape)                     (None, 8, 8, 128)     0
decode_conv2d_1 (Conv2DTranspose)            (None, 16, 16, 128)   147584
batch_normalization_37 (BatchNormalization)  (None, 16, 16, 128)   512
decode_conv2d_2 (Conv2DTranspose)            (None, 32, 32, 64)    73792
batch_normalization_38 (BatchNormalization)  (None, 32, 32, 64)    256
decode_conv2d_3 (Conv2DTranspose)            (None, 64, 64, 32)    18464
batch_normalization_39 (BatchNormalization)  (None, 64, 64, 32)    128
decode_final (Conv2DTranspose)               (None, 64, 64, 3)     867
=================================================================
Total params: 4,476,867
Trainable params: 4,460,035
Non-trainable params: 16,832

This is the summary of the decoder layers. I tried increasing the stride in the last layer, but that gives a larger image: e.g., stride = 2 results in a 128x128x3 reconstruction, and stride = 3 an even bigger one. I did some research and found a formula that calculates the output size based on the input size, kernel size, stride, and padding, and a stride of 1 is the only option in this case (see the sketch below).
Do you think that I should go with the lab architecture instead (the same architecture with only 2 conv layers)?
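For reference, the rule for Conv2DTranspose is: with padding='same' the output spatial size is input_size * stride, and with padding='valid' it is (input_size - 1) * stride + kernel_size. A quick sketch that checks this on the same 64x64x32 input my final layer receives:

import tensorflow as tf

# Sketch: with padding='same', Conv2DTranspose scales the spatial
# size by the stride, so only stride=1 keeps 64x64 at 64x64.
x = tf.keras.Input(shape=(64, 64, 32))  # input to the final decoder layer
for s in (1, 2, 3):
    y = tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=3,
                                        strides=s, padding='same')(x)
    print(f"stride={s} -> {y.shape}")
# stride=1 -> (None, 64, 64, 3)
# stride=2 -> (None, 128, 128, 3)
# stride=3 -> (None, 192, 192, 3)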

For what it’s worth, I got fairly reasonable results, and my decoder looks just like yours:

input_5 (InputLayer)                         [(None, 512)]         0
decode_dense1 (Dense)                        (None, 8192)          4202496
batch_normalization_12 (BatchNormalization)  (None, 8192)           32768
decode_reshape (Reshape)                     (None, 8, 8, 128)     0
decode_conv2d_2 (Conv2DTranspose)            (None, 16, 16, 128)   147584
batch_normalization_13 (BatchNormalization)  (None, 16, 16, 128)   512
decode_conv2d_3 (Conv2DTranspose)            (None, 32, 32, 64)    73792
batch_normalization_14 (BatchNormalization)  (None, 32, 32, 64)    256
decode_conv2d_4 (Conv2DTranspose)            (None, 64, 64, 32)    18464
batch_normalization_15 (BatchNormalization)  (None, 64, 64, 32)    128
decode_final (Conv2DTranspose)               (None, 64, 64, 3)     867
=================================================================
Total params: 4,476,867
Trainable params: 4,460,035
Non-trainable params: 16,832

What does your encoder look like? And what loss function(s) do you use?
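One more thing worth double-checking, though this is just a guess on my part: with a sigmoid output, the training targets have to be scaled to [0, 1]. Raw pixels in [0, 255] would keep the MSE in the hundreds, like the 650 you mention, and wash the reconstructions out. A toy sketch of the compile/normalization pattern (not the course architecture, just the pattern):

import numpy as np
import tensorflow as tf

# Minimal toy autoencoder for 64x64 RGB images, showing only the
# sigmoid-output + [0, 1]-targets pattern.
inputs = tf.keras.Input(shape=(64, 64, 3))
x = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inputs)
x = tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
outputs = tf.keras.layers.Conv2DTranspose(3, 3, strides=1, padding='same', activation='sigmoid')(x)
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')

# Targets scaled to [0, 1] to match the sigmoid output range
# (dummy random images here, standing in for the real dataset).
x_train = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype('float32') / 255.0
autoencoder.fit(x_train, x_train, epochs=1, batch_size=4)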