For what it’s worth, I got fairly reasonable results, and my decoder looks just like yours:
Layer (type)                                 Output Shape         Param #
=================================================================
input_5 (InputLayer)                         [(None, 512)]        0
decode_dense1 (Dense)                        (None, 8192)         4202496
batch_normalization_12 (BatchNormalization)  (None, 8192)         32768
decode_reshape (Reshape)                     (None, 8, 8, 128)    0
decode_conv2d_2 (Conv2DTranspose)            (None, 16, 16, 128)  147584
batch_normalization_13 (BatchNormalization)  (None, 16, 16, 128)  512
decode_conv2d_3 (Conv2DTranspose)            (None, 32, 32, 64)   73792
batch_normalization_14 (BatchNormalization)  (None, 32, 32, 64)   256
decode_conv2d_4 (Conv2DTranspose)            (None, 64, 64, 32)   18464
batch_normalization_15 (BatchNormalization)  (None, 64, 64, 32)   128
decode_final (Conv2DTranspose)               (None, 64, 64, 3)    867
=================================================================
Total params: 4,476,867
Trainable params: 4,460,035
Non-trainable params: 16,832
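As a sanity check, the parameter counts in that summary can be reproduced by hand with the usual Keras formulas (Dense: `in*out + out`; BatchNormalization: `4*channels`, half of which — the moving mean and variance — are non-trainable; Conv2DTranspose: `k*k*in*out + out`). The 3×3 kernel size is an assumption, but it's the only size consistent with every conv layer's count:

```python
# Hedged sketch: reproduce the summary's parameter counts.
# Assumes 3x3 kernels for all Conv2DTranspose layers (inferred, not stated).

def dense(n_in, n_out):
    return n_in * n_out + n_out          # weights + bias

def batchnorm(channels):
    return 4 * channels                  # gamma, beta, moving mean, moving var

def conv2d_transpose(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out  # kernel + bias

layers = [
    dense(512, 8192),            # decode_dense1      -> 4,202,496
    batchnorm(8192),             # bn_12              ->    32,768
    conv2d_transpose(128, 128),  # decode_conv2d_2    ->   147,584
    batchnorm(128),              # bn_13              ->       512
    conv2d_transpose(128, 64),   # decode_conv2d_3    ->    73,792
    batchnorm(64),               # bn_14              ->       256
    conv2d_transpose(64, 32),    # decode_conv2d_4    ->    18,464
    batchnorm(32),               # bn_15              ->       128
    conv2d_transpose(32, 3),     # decode_final       ->       867
]
total = sum(layers)
# Only the moving mean/variance of each BatchNorm (half its params)
# are non-trainable.
non_trainable = sum(batchnorm(c) for c in (8192, 128, 64, 32)) // 2
print(total, total - non_trainable, non_trainable)
# -> 4476867 4460035 16832
```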
What does your encoder look like, and what loss function(s) are you using?