Hello Steven,
The answer to your first question is yes: you can use either mu or sigma and you will get the same result.
As for your second question:
The encoder-decoder model is a way of organizing recurrent neural networks for sequence-to-sequence prediction problems.
The approach involves two recurrent neural networks, one to encode the source sequence, called the encoder, and a second to decode the encoded source sequence into the target sequence, called the decoder.
In the encoder model, you select n_input, which is the cardinality of the input sequence, i.e. the number of features, words, or characters for each time step.
You then create a decoder model from this, which takes n_output, the cardinality of the output sequence, i.e. the number of features, words, or characters for each time step.
These are combined into the encoder-decoder model, which is also passed n_units, the number of units to create in the encoder and decoder models.
The function then creates and returns 3 models, as follows:
train: Model that can be trained given source, target, and shifted target sequences.
inference_encoder: Encoder model used when making a prediction for a new source sequence.
inference_decoder: Decoder model used when making a prediction for a new source sequence.
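
To make this concrete, here is a rough sketch of what such a function can look like with Keras LSTM layers. The function name define_models and the exact layer choices are assumptions for illustration, so your actual code may differ; the point is the structure, one trainable model plus two inference models that share the same layers.

    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, LSTM, Dense

    def define_models(n_input, n_output, n_units):
        # training encoder: reads the source sequence, keeps only its final states
        encoder_inputs = Input(shape=(None, n_input))
        encoder = LSTM(n_units, return_state=True)
        encoder_outputs, state_h, state_c = encoder(encoder_inputs)
        encoder_states = [state_h, state_c]

        # training decoder: reads the shifted target, starts from the encoder states
        decoder_inputs = Input(shape=(None, n_output))
        decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
        decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
        decoder_dense = Dense(n_output, activation='softmax')
        decoder_outputs = decoder_dense(decoder_outputs)
        train = Model([encoder_inputs, decoder_inputs], decoder_outputs)

        # inference encoder: maps a new source sequence to its state vectors
        inference_encoder = Model(encoder_inputs, encoder_states)

        # inference decoder: predicts one step given the previous output and states
        decoder_state_input_h = Input(shape=(n_units,))
        decoder_state_input_c = Input(shape=(n_units,))
        decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
        decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
        decoder_outputs = decoder_dense(decoder_outputs)
        inference_decoder = Model([decoder_inputs] + decoder_states_inputs,
                                  [decoder_outputs] + [state_h, state_c])
        return train, inference_encoder, inference_decoder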
The model is trained on source and target sequences, where it takes both the source and a shifted version of the target sequence as input and predicts the whole target sequence.
During prediction, the inference_encoder model is used to encode the input sequence once which returns states that are used to initialize the inference_decoder model. From that point, the inference_decoder model is used to generate predictions step by step.
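
As a hedged usage sketch (the toy shapes and the random one-hot data below are assumptions, not from your course material), training the first model could look like this, reusing the define_models sketch above:

    import numpy as np

    # toy shapes and random one-hot data, purely for illustration
    n_samples, src_len, tgt_len, n_features = 100, 6, 3, 51
    X1 = np.eye(n_features)[np.random.randint(n_features, size=(n_samples, src_len))]   # source
    X2 = np.eye(n_features)[np.random.randint(n_features, size=(n_samples, tgt_len))]   # shifted target
    y = np.eye(n_features)[np.random.randint(n_features, size=(n_samples, tgt_len))]    # target

    train, infenc, infdec = define_models(n_features, n_features, 128)
    train.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    train.fit([X1, X2], y, epochs=1)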
The predicted sequence is then generated by a helper function that takes the following arguments:
infenc: Encoder model used when making a prediction for a new source sequence.
infdec: Decoder model used when making a prediction for a new source sequence.
source: Encoded source sequence.
n_steps: Number of time steps in the target sequence.
cardinality: The cardinality of the output sequence, e.g. the number of features, words, or characters for each time step.
The function then returns a list containing the target sequence.
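
Again as a sketch (the name predict_sequence and the convention of an all-zeros vector as the start-of-sequence input are assumptions for illustration), the step-by-step decoding could look like this:

    import numpy as np

    def predict_sequence(infenc, infdec, source, n_steps, cardinality):
        # encode the source sequence once to get the initial decoder states
        state = infenc.predict(source)
        # start-of-sequence input: an all-zeros "word" (an assumed convention here)
        target_seq = np.zeros((1, 1, cardinality))
        output = []
        for _ in range(n_steps):
            # predict the next step and the updated states
            yhat, h, c = infdec.predict([target_seq] + state)
            output.append(yhat[0, 0, :])
            # the current prediction becomes the next decoder input
            state = [h, c]
            target_seq = yhat
        return np.array(output)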
- Regarding when we use sigma:
In the variational autoencoder, the bottleneck vector is replaced by two separate vectors: the mean of the distribution and the standard deviation of the distribution. So whenever data is fed into the decoder, samples from that distribution are passed through the decoder. The loss function of the variational autoencoder consists of two terms. The first is the reconstruction loss; it is the same as in an autoencoder, except we have an expectation term because we are sampling from the distribution.
The second term is the KL divergence term, which ensures the learned distribution stays close to the normal distribution. We basically train the latent space to stay close to a mean of 0 and a standard deviation of 1, i.e. the standard normal distribution.
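
Written as a rough sketch (assuming the encoder outputs the mean mu and the log-variance log_var, and using squared error for the reconstruction term; your implementation may use binary cross-entropy instead), the loss could look like this:

    import tensorflow as tf

    def vae_loss(x, x_reconstructed, mu, log_var):
        # reconstruction term: how well the decoder rebuilds the input
        reconstruction = tf.reduce_sum(tf.square(x - x_reconstructed), axis=-1)
        # KL term: pushes N(mu, sigma^2) towards the standard normal N(0, 1)
        kl = -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1)
        return tf.reduce_mean(reconstruction + kl)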
A latent vector is sampled from the mean and standard deviation representation, and these samples are fed to the decoder. The problem is that we cannot backpropagate through the sampling operation, i.e. we cannot push gradients into the sampled vector. In order to run the gradients through the entire network and train it, we use the reparameterization trick.
The latent vector can be written as the sum of mu, which is a parameter we are learning, and sigma, which is also a parameter we are learning, multiplied by epsilon; epsilon is where we put the stochastic part. Epsilon is always Gaussian with zero mean and a standard deviation of 1. So the process is: sample epsilon, multiply it by sigma, and add mu to obtain the latent vector (z = mu + sigma * epsilon). Mu and sigma are the only things we have to train, so it becomes possible to push the gradients through them to decrease the error and train the network. Epsilon does not need to be trained; it provides the stochasticity that helps us generate new images.
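
In code, the reparameterization trick is just a couple of lines; here is a minimal sketch, again assuming the encoder predicts mu and log_var:

    import tensorflow as tf

    def sample_latent(mu, log_var):
        # epsilon carries all the randomness and is never trained
        epsilon = tf.random.normal(shape=tf.shape(mu))
        sigma = tf.exp(0.5 * log_var)
        # z = mu + sigma * epsilon: gradients flow through mu and sigma, not epsilon
        return mu + sigma * epsilon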

It is a long read, but I hope it clears up your doubts!
Happy learning!!!
Regards
DP