Setting up trainable weights that are feature/sample specific but do not explicitly depend on any input

Hello, I am building a custom VAE for my single-cell transcriptomics analysis. In short, my goal is to feed a number_of_genes by number_of_cells (number of samples) matrix to the VAE and encode single-cell gene expression patterns into a latent space. Instead of having a decoder that reconstructs the gene expression level for gene i and cell j, I want the decoder output to be a "mean parameter" for gene i and cell j. I would then compute the negative log likelihood of the observed gene expression given the decoded mean parameter and use it to train the model.
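For concreteness, the likelihood term I have in mind looks roughly like this (a sketch assuming a negative binomial observation model, with mu the decoded mean and theta the inverse dispersion discussed further below; the function name is mine):

import tensorflow as tf

def nb_nll(x, mu, theta, eps=1e-8):
    # Negative log likelihood of counts x under a negative binomial
    # distribution with mean mu and inverse dispersion theta.
    log_theta_mu = tf.math.log(theta + mu + eps)
    ll = (tf.math.lgamma(x + theta)
          - tf.math.lgamma(theta)
          - tf.math.lgamma(x + 1.0)
          + theta * (tf.math.log(theta + eps) - log_theta_mu)
          + x * (tf.math.log(mu + eps) - log_theta_mu))
    return -tf.reduce_sum(ll, axis=-1)  # sum over genes, one value per cell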

In addition to the gene- and cell-specific mean parameter, I would also like to learn a sample (cell) specific factor (also an input to the likelihood function) to account for technical variability. I want this cell-specific factor to not be explicitly connected to any input or other layer, and instead be updated only through its gradient with respect to the loss. Is it generally good practice to have standalone trainable weights whose updates depend only on their gradients with respect to the loss?

Implementation-wise, I built a custom simple dense layer with non-trainable weights initialized to zero and trainable biases of the desired dimension. I feed it arbitrary input from some other layer, knowing that the input will be multiplied by the non-trainable zero weights, so only the trainable biases remain and contribute to the loss function. I find this approach rather awkward, so I wonder if there are better ways of creating feature/sample-specific trainable layers that take no inputs.
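Here is roughly what that workaround looks like (a sketch; the class name is mine):

import tensorflow as tf
from tensorflow.keras.layers import Layer

class BiasOnly(Layer):
    # Dense-like layer whose kernel is frozen at zero, so the output is
    # just the trainable bias broadcast across the batch.
    def __init__(self, units, **kwargs):
        super(BiasOnly, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # non-trainable all-zero kernel: the input is multiplied away
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.units),
                                      initializer='zeros', trainable=False)
        # the only trainable parameters: one bias per unit (gene)
        self.bias = self.add_weight(name='bias', shape=(self.units,),
                                    initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel) + self.bias

# used as: dispersion = BiasOnly(units=output_dim)(some_other_layer_output)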

For reference, the other approaches I tried and failed with are below (both gave me ValueError: Output tensors of a Functional model must be the output of a TensorFlow Layer when I attempted to include their output as the decoder model output).

Making a custom layer class with a call function that returns its kernel:

class Simple(Layer):
    def __init__(self, output_dim, activation='relu', **kwargs):
        super(Simple, self).__init__(**kwargs)
        self.output_dim = output_dim
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shapes):
        # one trainable parameter per output unit (gene)
        self.kernel = self.add_weight(name='kernel', shape=(self.output_dim,),
                                      initializer='uniform', trainable=True)
        super(Simple, self).build(input_shapes)

    def call(self, inputs):
        return self.activation(self.kernel + 1e-8)  # force dispersion to be positive

and calling it by

dispersion = Simple(output_dim=output_dim)([])

Using a standard dense layer but feeding it a dummy zero input:

dispersion = tf.keras.layers.Dense(units=output_dim, activation='relu', bias_constraint=tf.keras.constraints.NonNeg())(tf.zeros(shape=x.shape))
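If I understand the error correctly, both attempts fail for the same reason: neither the empty list nor the eager tf.zeros tensor is a symbolic tensor produced by a Keras layer, so the layer's output never becomes part of the functional graph. A variant that should build, by taking a real layer output and ignoring its values, might look like this (a sketch, untested; the class name is mine):

import tensorflow as tf
from tensorflow.keras.layers import Layer

class PerGeneParam(Layer):
    # One trainable parameter per gene, independent of the input values;
    # the input only ties the layer into the graph and provides the batch size.
    def __init__(self, units, **kwargs):
        super(PerGeneParam, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', shape=(self.units,),
                                      initializer='uniform', trainable=True)

    def call(self, inputs):
        # broadcast the (units,) vector to (batch_size, units)
        return tf.tile(self.kernel[tf.newaxis, :], [tf.shape(inputs)[0], 1])

# used as: dispersion = PerGeneParam(units=output_dim)(some_real_layer_output)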

I know nothing about the dataset or technique you are using. So this is just a general comment.

I am not sure why you have a Dense layer with all-zeros weights and then don’t allow them to be trained. This seems to do nothing useful. Perhaps I am mistaken.

Thanks for your response. Sorry for not being clear about what I want to achieve.

I would like my model to learn the inverse dispersion parameters for expression of different genes. Specifically, I want to learn 1 inverse dispersion parameter per gene and I want exactly 1 set of these inverse dispersion parameters learned and returned regardless of how many samples were used to train the model.

I am looking for a way to allow my model to learn a common set of parameters across all input samples.

If my understanding is right, a dense layer with all-zero non-trainable weights and a trainable bias is just going to return the value of the trainable bias. So I was thinking I could just create this layer with as many units as the number of genes in my input and use it to store and learn gene-specific parameters that do not vary from cell to cell (sample to sample).

You can still get that by training all of the weights and biases, and then just look at the biases.
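For example, after training, something like this (the layer name here is just illustrative):

# read out the per-gene biases of a trained Dense layer named 'dispersion';
# for a Dense layer, get_weights() returns [kernel, bias]
per_gene_bias = model.get_layer('dispersion').get_weights()[1]  # shape: (n_genes,)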

Opinion: If you’re going to use machine learning for this, you’re going to have to train it the way any typical ML system is trained, then figure out how to interpret the weights that are learned. I don’t think you can constrain the type of solution you want to get so that it matches some other analysis method.

Maybe I am just missing your point here, but that’s the way all training works in every Neural Network I’ve ever seen: the parameters are learned based on all the samples in the training data and are shared across (common to) all data. There is one set of parameters that are used to make predictions on any data.

You are not missing my point at all. It seems like I have confused the idea of latent embedding and model weights as I code up the VAE, and all I need for my purpose is probably several regular dense layers and setting some weight constraints. Thank you all for addressing my issue.

Which Machine Learning or Deep Learning courses have you completed? This may help us guide you with further explanations.

FYI I have completed Neural Networks and Deep Learning; Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization; Convolutional Neural Networks; and Custom Models, Layers, and Loss Functions with TensorFlow. I am taking Generative Deep Learning with TensorFlow and have completed everything before the GAN lessons.

Note that embedding layers have learned weights as well. I’m not specifically familiar with VAEs, but the face embeddings we learned about in DLS C4 and the word embeddings we learned about in DLS C5 all have learned weights. From the few things I’ve heard about them, I believe VAEs are more in the direction of GANs, meaning that they are basically “making up” synthetic outputs that resemble the real samples, right? If you’re trying to do genomic analysis, is synthetic output really what you want? I am not a biologist or geneticist, but I would think you would be trying to deduce patterns and recognize things in the input, as opposed to creating synthetic outputs.

Maybe it would help if you could say more about what your goals are here. How did you arrive at the idea that VAEs are the solution to your problem?

Hello. Let me elaborate on what I am trying to achieve. I want an ML algorithm that, given sets of gene expression levels interrogated from numerous single cells, will learn the differences and similarities between cells generated within the same experiment and cells from different experiments (e.g. from two different patients) in an unsupervised manner.

Please correct me if I am wrong: from the VAE course I learned that VAEs can be used to generate synthetic data, but they are also capable of doing feature extraction. So I was thinking I could use a VAE to learn from different single-cell samples, project them into a common latent space, and use the latent representations for downstream analysis.
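For example, after training I would keep just the encoder half and use it for the projection, along the lines of (all names are placeholders, following the usual Keras VAE pattern where the encoder returns the latent mean, log variance, and a sample):

# project cells into the shared latent space with the trained encoder;
# X is a cells-by-genes expression matrix
z_mean, z_log_var, z = encoder.predict(X)
# use z_mean (the deterministic embedding) for downstream analysis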

I have seen several applications that use auto-encoding variational Bayes and VAEs to learn latent representations of single-cell data: scVI (scVI — scvi-tools) and scVAE (scVAE: variational auto-encoders for single-cell gene expression data | Bioinformatics | Oxford Academic), and I have actually used the former. It appears that VAEs are suitable for my purpose, and I am looking to create a VAE of my own with some modified functionality to better suit my needs.

Thanks for the additional details. That’s a really interesting point about using the encoder portion of a VAE as a way to extract information from the data. But training a VAE is done by measuring the distance between the output of the decoder and the training data, isn’t it? Meaning that you still have to deal with the decoder, even though you don’t really care about the decoder output for your real purpose.

It does sound like what you are doing is very similar to what they describe in both of the links. I took a quick look at the paper in the second link and they point to a GitHub repo that they seem to say is their complete implementation. So you should be able to find some pretty concrete suggestions there, although I have not looked at the actual code. Of course as someone who spent a career doing software engineering, I know that reading someone else’s complete code base is not always an easy way to go. It may well turn out to be an exercise in finding needles in haystacks.