I don’t understand why we need to scale the last layer, since the model is linear and it is trained with the actual Y values. As far as I remember, we did not change the scale of the x, y input data, so why should the output be different? I know it has been explained that in an RNN the output of SimpleRNN is between -1 and 1 and we scale up the output (although that also does not make sense to me, because we train the model on the actual variables and Y values), but in the C4W4_L3 lab we used convolution and LSTM. Why do we multiply the final Dense result by 400?
Inputs to the model are not preprocessed to a small scale. Please see this topic on the effect of learning rate on model convergence based on the scale of data.
Since we want to compare the model's predicted and actual values in the original scale, the Lambda layer helps keep the model weights low.
I understand normalization and its benefits, but in our case we did not normalize the input layer; we just added (lambda x: x*400) after the last Dense layer. My assumption is that whatever output we get from the last Dense layer will be multiplied by 400. Am I right?
Your understanding is correct.
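To make it concrete, here is a minimal sketch of the idea (not the lab's actual Conv/LSTM model; the layer sizes and window_size are made up for illustration, only the final Lambda multiplier matches the lab):

import tensorflow as tf

# The network itself can learn small weights and outputs; the final
# Lambda layer rescales whatever the last Dense layer produces back
# to the range of the original series (roughly 0-400 in the lab data).
window_size = 20

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=[window_size]),
    tf.keras.layers.Dense(1),
    # Every output of the Dense layer above is simply multiplied by 400.
    tf.keras.layers.Lambda(lambda x: x * 400.0),
])

model.compile(loss=tf.keras.losses.Huber(), optimizer="adam")
model.summary()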
I think I got the idea, thanks! I have one more question, not related to this topic. I noticed that in some courses (I took a couple of Andrew’s courses) we add a bias parameter (x0 = 1) to the input variable X, but we did not do that in the past courses. Would you please explain why we do not add it to the input data?
Thanks,
Pouria
Bias is a parameter we add to a processing unit. For instance, each node in a Dense layer has a bias term. Here’s an example:
In [2]: model = tf.keras.Sequential([
   ...:     tf.keras.layers.Dense(2, input_shape=[10])])

In [3]: model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 2)                 22
=================================================================
Total params: 22
Trainable params: 22
Non-trainable params: 0
_________________________________________________________________
There are 2 units in the Dense layer. Each unit has 10 learnable parameters based on the input shape, plus 1 additional parameter for the bias term. So, the total number of parameters is 2 * (10 + 1) = 22.
We want to compute w^T \cdot X + b, where b is the bias term. How this is done is up to the creator of the framework.
If there were just one array, the bias and the weights would live in the same array, which would also require the dataset to include an additional constant-1 entry for the bias term. It’s a lot easier to keep the bias term out of the weights matrix, since data preparation becomes much simpler.
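As a rough illustration of the two conventions (a sketch with made-up numbers, not anything from the course code):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))   # 5 samples, 10 features
w = rng.normal(size=(10,))     # weights
b = 0.5                        # bias kept as a separate parameter

# Convention 1: bias stored separately from the weights.
out_separate = X @ w + b

# Convention 2: bias folded into the weights; the data must then carry
# an extra constant-1 column for the bias term.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w_aug = np.append(w, b)
out_folded = X_aug @ w_aug

print(np.allclose(out_separate, out_folded))  # True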
So, if I am not using a NN in TensorFlow, for example if I use logistic regression in sklearn, then do I have to add the bias to the input X parameter (x0 = 1)?
You don’t have to add a bias term to your input data when using an sklearn model; the estimator fits the intercept for you (for example, LogisticRegression has fit_intercept=True by default).
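A quick sketch with toy data (the features and labels here are invented, just to show that no constant column is added by hand):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # raw features, no column of ones
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy labels

# fit_intercept=True is the default, so the bias is learned internally.
clf = LogisticRegression().fit(X, y)

print(clf.coef_)       # learned weights, shape (1, 3)
print(clf.intercept_)  # learned bias term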