VAE model training from multiple input tensors

Hello Everyone,

I will try my best to discuss my work and what I want to achieve.

Currently, I am leveraging Jupyter Notebook to develop a Variational Autoencoder (VAE) model aimed at predicting the Signal-to-Interference-plus-Noise Ratio (SINR) heatmap within an indoor setting while considering multiple inputs. To accurately capture the indoor geometry, my input comprises several images:

  1. Image-1: Represents the Euclidean distance of each point within the house from the Router/Access Point.
  2. Image-2: Illustrates the 3D distance of each point within the house from the Router/Access Point.
  3. Image-3: Depicts the permittivity of materials surrounding the Router/Access Point within the house.
  4. Image-4: Reflects the conductivity of materials surrounding the Router/Access Point within the house.
  5. Image-5: Indicates the precise location of the Router/Access Point within the house.

These inputs collectively inform the prediction of the SINR heatmap within the indoor environment. While exploring existing VAE examples trained on MNIST and Fashion MNIST datasets, I noticed they typically involve training on a single input channel. In contrast, my project demands training on five distinct channels.

Given the nature of my image dataset, I have opted for a CNN architecture for my VAE model. I am reaching out to seek advice or insights from individuals with experience in training models with multiple channels. Any suggestions or guidance on this front would be immensely valuable to me.

Thank you for your attention and support.

Best regards,
Rahul


Hi,

As far as I understand, Image-1 is just a reduction of Image-2, and each image is represented by a single-channel n*m matrix. If not, they can be transformed so that they all have the same dimensions and are aligned with each other. You can then simply stack them into one 5-channel image (n * m * 5). You will of course need to adjust the input layer of your model so that it accepts 5 channels instead of 1.
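
As a rough illustration of the stacking step, here is a minimal NumPy sketch. The function name `stack_channels`, the 64 x 48 size, and the random placeholder maps are all made up for the example, and the per-channel scaling to [0, 1] is my own assumption (distance, permittivity, and conductivity maps live on very different scales), not something your data necessarily needs.

```python
import numpy as np

def stack_channels(images):
    """Stack equally sized 2-D maps into one (n, m, C) array, scaling each channel to [0, 1]."""
    shapes = {img.shape for img in images}
    if len(shapes) != 1:
        raise ValueError(f"all maps must share one shape, got {shapes}")
    channels = []
    for img in images:
        img = img.astype(np.float32)
        lo, hi = img.min(), img.max()
        # Constant maps become all zeros instead of dividing by zero.
        scaled = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
        channels.append(scaled)
    # New last axis holds the channels: (n, m) x C -> (n, m, C).
    return np.stack(channels, axis=-1)

# Hypothetical 64 x 48 placeholder maps standing in for Image-1 ... Image-5.
rng = np.random.default_rng(0)
maps = [rng.random((64, 48)) * scale for scale in (10.0, 12.0, 5.0, 0.1, 1.0)]
x = stack_channels(maps)
print(x.shape)  # (64, 48, 5)
```

The (n, m, 5) layout is channels-last, which is what Keras expects by default. PyTorch convolution layers expect channels-first, so there you would move the axis with `np.moveaxis(x, -1, 0)` and set the first convolution to 5 input channels.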

Another way would be to train 5 different models, one per input image, and then combine their outputs. I reckon this is the more naive approach, but try both and compare them, so that you get more practice and can draw your own conclusions.
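
For the second approach, "combine" can mean several things. One simple option, which is purely my assumption here, is a (weighted) average of the five predicted heatmaps. The helper `combine_predictions` and the constant placeholder predictions below are hypothetical and only show the mechanics.

```python
import numpy as np

def combine_predictions(predictions, weights=None):
    """Return the (optionally weighted) mean of per-model heatmaps of identical shape."""
    stacked = np.stack(predictions, axis=0)  # (num_models, n, m)
    return np.average(stacked, axis=0, weights=weights)

# Hypothetical outputs of five single-input models (constant 4 x 3 heatmaps).
preds = [np.full((4, 3), v) for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
combined = combine_predictions(preds)
print(combined[0, 0])  # 3.0
```

Note that averaging treats every input as equally informative, which is one reason the single 5-channel model is usually worth trying first.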

Feel free to include more details of your implementation so that we can give you better advice.