Hello, I am building a custom VAE for my single-cell transcriptomics analysis. In short, my goal is to feed a number_of_genes by number_of_cells (number of samples) matrix to the VAE and encode single-cell gene expression patterns into a latent space. Instead of having a decoder that reconstructs the gene expression level for gene i and cell j, I want the decoder output to be a "mean parameter" for gene i and cell j. I would then compute the negative log likelihood of the observed gene expression given the decoded mean parameter and use that as the training loss.
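For concreteness, this is roughly the likelihood term I have in mind (a minimal sketch assuming a Poisson likelihood purely for illustration; the distribution I actually use may differ):

import tensorflow as tf

def poisson_nll(x, mean_param):
    # Poisson negative log likelihood, dropping the constant log(x!) term:
    # NLL = mean - x * log(mean)
    eps = 1e-8  # guard against log(0)
    return tf.reduce_sum(mean_param - x * tf.math.log(mean_param + eps))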
In addition to the gene- and cell-specific mean parameter, I would also like to learn a sample (cell) specific factor (also an input to the likelihood function) to account for technical variability. For my own reasons, I want this cell-specific factor to not be explicitly connected to any input or other layer, and instead be updated only through its gradients with respect to the loss. Is it generally good practice to have standalone trainable weights whose updates depend only on their gradients with respect to the loss?
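To illustrate what I mean by a standalone trainable weight, here is a minimal sketch using model subclassing (the names and the softplus transform are placeholders, not my actual code):

import tensorflow as tf

class Decoder(tf.keras.Model):
    def __init__(self, n_genes, n_cells, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(n_genes, activation='softplus')
        # standalone trainable variable: one factor per cell, connected to
        # nothing upstream, updated only via its gradient w.r.t. the loss
        self.cell_factor = tf.Variable(tf.zeros(n_cells), trainable=True,
                                       name='cell_factor')

    def call(self, z):
        mean_param = self.dense(z)                 # per-gene mean parameters
        factor = tf.nn.softplus(self.cell_factor)  # keep the factor positive
        return mean_param, factor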
Implementation-wise, I built a custom simple dense layer with non-trainable weights initialized to zero and trainable biases of the desired dimension. I then feed it arbitrary input from some other layer, knowing that the input will be multiplied by the non-trainable zero weights, so only the trainable biases remain and contribute to the loss function. I find this approach rather awkward, so I wonder: are there better ways to create feature/sample-specific trainable layers that take no input?
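This is roughly what that workaround looks like (a sketch; BiasOnly is just a placeholder name):

import tensorflow as tf
from tensorflow.keras.layers import Layer

class BiasOnly(Layer):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.output_dim = output_dim

    def build(self, input_shape):
        # frozen zero kernel: annihilates whatever input is fed in
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1], self.output_dim),
                                      initializer='zeros', trainable=False)
        # trainable bias: the only term that survives in the output
        self.bias = self.add_weight(name='bias', shape=(self.output_dim,),
                                    initializer='zeros', trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel) + self.bias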
For reference, the other approaches I tried and that failed are below. Both gave me

ValueError: Output tensors of a Functional model must be the output of a TensorFlow Layer

when I attempted to include their output as a decoder model output.
Making a custom layer class with a call function that returns its kernel:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class Simple(Layer):
    def __init__(self, output_dim, activation='relu', **kwargs):
        self.output_dim = output_dim
        self.activation = tf.keras.activations.get(activation)
        super(Simple, self).__init__(**kwargs)

    def build(self, input_shapes):
        # note: shape must be a tuple, not a bare int
        self.kernel = self.add_weight(name='kernel', shape=(self.output_dim,),
                                      initializer='uniform', trainable=True)
        super(Simple, self).build(input_shapes)

    def call(self, inputs):
        # ignore the inputs entirely and return the activated kernel
        return self.activation(self.kernel + 1e-8)  # force dispersion to be positive
and calling it with an empty input list:

dispersion = Simple(output_dim=output_dim)([])
Using a standard Dense layer but feeding it a dummy zero input:
dispersion = tf.keras.layers.Dense(
    units=output_dim,
    activation='relu',
    # the constraint must be instantiated: NonNeg(), not the class itself
    bias_constraint=tf.keras.constraints.NonNeg(),
)(tf.zeros(shape=x.shape))
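My guess is that both fail because the tensor involved is not produced by a Keras layer, so the Functional graph cannot trace it back to an Input. Wrapping the dummy input in a Lambda layer seems to keep it inside the Keras graph (a sketch; I am not sure this is the idiomatic fix):

zeros = tf.keras.layers.Lambda(tf.zeros_like)(x)  # x is a Keras tensor
dispersion = tf.keras.layers.Dense(
    units=output_dim, activation='relu',
    bias_constraint=tf.keras.constraints.NonNeg(),
)(zeros)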