Help understanding Lambda layers

Hi,

I watched the video, but I'm still not able to fully understand the Lambda layers in the LSTM model.

tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=[None]),
...
tf.keras.layers.Lambda(lambda x: x * 100.0)

Would you mind explaining them to me, please? I would be happy to get more insight into those two layers.

Thank you

Have a look at this recent post from yesterday:

Got it, thanks.
But in this example, why do we need to multiply at the end and expand the dimensions at the beginning?

Hello @lirone,

In the video on Lambda layers in week 3, Laurence Moroney says:

The first Lambda layer will be used to help us with our dimensionality.
If you recall, when we wrote the window dataset helper function, it returned two-dimensional batches of windows on the data, with the first dimension being the batch size and the second the number of timesteps.
But an RNN expects three dimensions: the batch size, the number of timesteps, and the series dimensionality.
With the Lambda layer, we can fix this without rewriting our window dataset helper function.
Using the Lambda, we just expand the array by one dimension.
By setting the input shape to None, we're saying that the model can take sequences of any length.

Similarly, if we scale up the outputs by 100, we can help training.

In other words, we needed the input dimensionality to be different from the one we get from the helper function we wrote to prepare the data. Thus, instead of rewriting the whole helper function, we can create a layer that expands the dimensionality. None of the "standard", already present layers could do the job, so we built a Lambda layer.
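To make this concrete, here is a minimal sketch of what that first Lambda layer does to the shapes (the batch and window sizes are just illustrative, not from the course):

import tensorflow as tf

# The windowed dataset yields 2-D batches of shape (batch_size, window_size),
# but an RNN layer expects 3-D input: (batch_size, timesteps, features).
batch = tf.random.uniform((32, 20))        # (batch_size=32, window_size=20)
expanded = tf.expand_dims(batch, axis=-1)  # one feature per timestep

print(batch.shape)     # (32, 20)
print(expanded.shape)  # (32, 20, 1)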

The same idea applies to the last layer. We want to multiply the output by 100, and we can use a Lambda layer to do that.
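Put together, the two Lambda layers just wrap the recurrent stack: one expands the dimensions on the way in, the other rescales on the way out. A rough sketch (the LSTM sizes here are illustrative, not the exact course model):

import tensorflow as tf

model = tf.keras.models.Sequential([
    # Expand (batch, timesteps) -> (batch, timesteps, 1) on the way in
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1),
                           input_shape=[None]),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
    # Rescale the prediction to the magnitude of the series on the way out
    tf.keras.layers.Lambda(lambda x: x * 100.0),
])
model.compile(loss="mse", optimizer="adam")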

I hope it is clear (or at least clearer :sweat_smile:) now,
Best,
Davide


I got it! Thank you @dtisi9!
Would you mind explaining again why we need to multiply the output by 100?

Thank you

Hello,

The main idea is that the default activation function in the RNN is tanh (the hyperbolic tangent, see Wikipedia). The tanh function has a codomain in the [-1, 1] range, but the values of the time series we want to predict are normally in the tens, like the 40s, 50s, 60s, and 70s, so multiplying by 100 should help the training. It is a way to bridge the gap between the predictions you would get from the Dense layer and the real data you want to approximate.
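Here is a quick numeric sketch of that range mismatch (the inputs are made up):

import numpy as np

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(np.tanh(x))          # stays in (-1, 1): [-0.995 -0.762  0.     0.762  0.995]
print(np.tanh(x) * 100.0)  # rescaled to roughly (-100, 100), the scale of the data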

Best,
Davide


Hey @dtisi9, just to clarify: we multiply by 100 to align the output of the model with the actual data, right? So that model.predict will give accurate results and the loss will be adequate. So if the units of 'y' are in the 1000s, we should multiply by 1000, and so on. Did I get it right?

Yes, you got the main idea. It is not a rule, just something that may help the training.
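So, hypothetically, for a series whose values are in the thousands you could scale accordingly in the same way:

# Hypothetical: series values in the thousands, so scale by 1000 instead
tf.keras.layers.Lambda(lambda x: x * 1000.0)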
