# Lambda layers - C4_W3_Lab1

Hi,

I’d like a detailed explanation of the two Lambda layers in C4_W3_Lab_1:

`tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[G.WINDOW_SIZE])`

and

`tf.keras.layers.Lambda(lambda x: x * 100.0)`

I understand that they serve as pre-processing and post-processing steps and add flexibility to the network design, but I need to understand precisely what each line does.

Thank you,
Ed


`tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[G.WINDOW_SIZE])`

Takes a tensor of shape `(G.WINDOW_SIZE,)` and outputs a tensor of shape `(G.WINDOW_SIZE, 1)`. I’ve left out the batch dimension here. If you include it, the layer takes in `(BATCH_SIZE, G.WINDOW_SIZE)` and outputs `(BATCH_SIZE, G.WINDOW_SIZE, 1)`. You can read this link to understand what the 3 dimensions stand for; see the `inputs` entry in the call arguments section.
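To make the shape change concrete, here is a minimal sketch. It uses `np.expand_dims` as a stand-in for `tf.expand_dims` (with `axis=-1` both append a trailing axis of size 1), and the values of `BATCH_SIZE` and `WINDOW_SIZE` are made up for illustration; they are not taken from the lab.

```python
import numpy as np

BATCH_SIZE = 32   # hypothetical value, for illustration only
WINDOW_SIZE = 20  # hypothetical stand-in for G.WINDOW_SIZE

# A batch of windows, as the first Lambda layer would receive it.
batch = np.zeros((BATCH_SIZE, WINDOW_SIZE))

# Append a trailing axis of size 1, as the lambda does with axis=-1.
expanded = np.expand_dims(batch, axis=-1)

print(batch.shape)     # (32, 20)
print(expanded.shape)  # (32, 20, 1)
```

The values are unchanged; only a feature dimension of size 1 is added so that each time step carries one value.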

`tf.keras.layers.Lambda(lambda x: x * 100.0)`

Takes a tensor as input and multiplies it by 100 to produce the output. One reason to do this: when the inputs are not normalized, scaling the output lets the network keep its weights small.
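As a rough illustration of why this keeps the weights small, here is a NumPy sketch. It assumes the layers before the final `Dense` layer use `tanh` (the Keras default for `SimpleRNN` and `LSTM`), which the thread itself does not state. The unit count, batch size, and weight scale are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the output of a tanh-activated recurrent layer
# (4 samples, 40 units; both numbers are hypothetical).
hidden = np.tanh(rng.normal(size=(4, 40)))  # every value lies in [-1, 1]

# Hypothetical small weights for a final Dense(1) layer.
w = rng.normal(scale=0.05, size=(40, 1))
dense_out = hidden @ w          # small-magnitude values

# What the last Lambda layer does: rescale to the range of the series.
scaled_out = dense_out * 100.0

print(dense_out.ravel())
print(scaled_out.ravel())
```

Because the hidden activations are bounded, reaching values near 100 without the final multiplication would require much larger `Dense` weights.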

Hi Balaji,

Thank you. Could you explain a bit more about:

`tf.keras.layers.Lambda(lambda x: x * 100.0)`

Takes a tensor as input and multiplies it by 100 to produce the output. One reason to do this: when the inputs are not normalized, scaling the output lets the network keep its weights small.

1. In general, is it typical for the last layer to be a Lambda layer when working with RNNs?
2. How was the scale factor of 100 determined for this problem?
3. In general, how do I determine the ‘scale factor’ for other problems?

Thanks,
Ed

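Looking at `x_train`, the minimum is -21.603771 and the maximum is 97.26462.
Dividing by 100 brings these values close to the 0 to 1 range, so multiplying the model output by 100 lets the layers before it work at roughly that smaller scale.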
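The arithmetic can be checked quickly. This sketch only uses the two extremes quoted above; `SCALE` is just a name chosen here for the factor of 100.

```python
# Extremes of x_train as quoted in this thread.
x_min, x_max = -21.603771, 97.26462
SCALE = 100.0  # the factor used by the last Lambda layer

# Values the network would need to produce *before* the x * 100.0 layer
# in order to cover the full range of the series.
low, high = x_min / SCALE, x_max / SCALE
print(low, high)  # roughly -0.216 and 0.973
```

So the range is not exactly 0 to 1, but it is close enough for the factor of 100 to be a convenient round number.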

The scaling factor is just like any other hyperparameter. You choose it by trying different values, guided by statistics of the training data, and observing metrics on both the training and validation sets.

There are 2 ways of developing a model:

1. Scale inputs and don’t use a lambda layer in the output.
2. Scale outputs and don’t touch the inputs.

Most of the NN architectures I’ve seen so far follow the 1st style. It’s good to know the 2nd way of doing it as well.

Dear Balaji,

Thank you SO much for your help, and thanks for explaining the scaling. I suppose that, in general, we should scale all values based on max(dataset). So, in our case, that would be `x_train / 97.26462`, but 100 is close enough.

I am confused, however.

Shouldn’t the process be:

1. Scale all inputs so that they are all on an equal playing field (in our case, divide all inputs by 100). This produces a balanced set of inputs that range from 0…1.
2. Create your model using these scaled inputs.
3. Unscale the outputs (in our case, multiply all predictions by 100). This unscaling puts everything back on the original scale.

Thanks,
Ed

You’re welcome.

I’d like to point out that both inputs and outputs should be scaled to a smaller range like [0, 1] when required.

Here’s more on feature scaling if you’re interested.

To reiterate the steps:

1. Scale features
2. Train model
3. When making a prediction, scale the inputs using the same rules you used during training.
4. Rescale model prediction back to the original scale.
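The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the data is made up, min-max scaling is one choice among several, and a least-squares line from `np.polyfit` stands in for the neural network. The helper names `scale` and `unscale` are hypothetical.

```python
import numpy as np

# Hypothetical training data on an unscaled range: y = 3x + 5.
x_train = np.array([10.0, 20.0, 40.0, 80.0, 100.0])
y_train = 3.0 * x_train + 5.0

def scale(v, lo, hi):
    """Min-max scale v into [0, 1] using the given statistics."""
    return (v - lo) / (hi - lo)

def unscale(v, lo, hi):
    """Invert scale(): map values in [0, 1] back to the original range."""
    return v * (hi - lo) + lo

# 1. Scale features (and targets) using training-set statistics only.
x_min, x_max = x_train.min(), x_train.max()
y_min, y_max = y_train.min(), y_train.max()
x_scaled = scale(x_train, x_min, x_max)
y_scaled = scale(y_train, y_min, y_max)

# 2. Train the model on the scaled data (a fitted line as a stand-in).
slope, intercept = np.polyfit(x_scaled, y_scaled, deg=1)

# 3. At prediction time, scale new inputs with the SAME training statistics.
x_new = np.array([50.0])
x_new_scaled = scale(x_new, x_min, x_max)

# 4. Map the model's prediction back to the original scale.
y_pred = unscale(slope * x_new_scaled + intercept, y_min, y_max)
print(y_pred)  # close to 155, i.e. 3 * 50 + 5
```

Note that step 3 applies the forward transformation to the new inputs; only the prediction in step 4 goes through the inverse transformation.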

The steps you wrote are followed in the 1st assignment of Course 1:
Housing Prices

The prediction was done using the reduced scale of 50K. When predicting, you can translate this predicted output back to the original scale.

The way it’s done in the time series lab is OK too, because the inputs and outputs are on the same scale.

Thanks

Would you be able to point me to another example that clearly outlines the steps:

1. Scale features (I assume all features should be scaled to [0…1]?)
2. Train model
3. When making prediction, rescale inputs as per the same rules you used during training. (I assume you mean unscale the inputs, i.e., the exact opposite transformation to get back to the original scale?)
4. Rescale model prediction back to the original scale.